Page tree

The cloud-based product Trifacta Free is available now! Register for your free account.

 

Contents:


Replaces found string literal or pattern or column with a string, column, or function returning strings.

Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.

  • A column reference can refer to a column of String type.
  • If no match is found, the function returns the source string.
  • If multiple matches are found in a single string, all replacements are made.

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

Column reference example:

substitute(myURL,`{ip-address}`,myDomain)

Output: Searches the myURL column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain column. 

Function reference example:

substitute(upper(companyName),'ACME','New ACME')

Output: Searches the uppercase version of values from the companyName column for the string literal ACME. When found, these matches are replaced by New ACME in the companyName column.

Syntax and Arguments

substitute(string_source,string_pattern,string_replacement[,ignore_case, pattern_before, pattern_after])


ArgumentRequired?Data TypeDescription
string_sourceYstringName of the column, a function returning a string, or string literal to be applied to the function
string_patternYstringString literal or pattern or a column or a function returning strings to find
string_replacementYstringString literal, column or function returning a string to use as replacement
ignore_caseNstringWhen true, matching is case-insensitive. Default is false.
pattern_beforeNstringString literal or pattern to find before finding the string_pattern value.
pattern_afterNstringString literal or pattern to find after finding the string_pattern value.

For more information on syntax standards, see Language Documentation Syntax Notes.

string_source

Name of the item to be searched. Valid values can be:

  • String literals must be quoted ( 'Hello, World' ).
  • Column reference to any type that can be inferred as a string, which encompasses all values
  • Functions that return string values

Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or column reference (String, Array, or Object)myColumn

string_pattern

String to find. This value can be a string literal, a Trifacta® pattern, a regular expression, a column, or a function returning a String value.

  • String literals must be quoted ('Hello, World').
  • Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString value or pattern or column reference (String)'Hello'

string_replacement

Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.

  • String literals must be quoted ('Hello, World').
  • Column reference to any type that can be inferred as a string, which encompasses all values. 
  • Multiple values and wildcards are not supported.

Usage Notes:

 

Required?Data TypeExample Value
YesString literal, column reference (String, Array, or Object), or function returning String value'##REDACTED##'

ignore_case

When true, matches are case-insensitive. Default is false

NOTE: This argument is not required. By default, matches are case-sensitive.


Usage Notes:

 

Required?Data TypeExample Value
NoString value'false'

pattern_before

String literal or pattern to find in a position before the pattern to match. 

NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type.

Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source.

Usage Notes:

 

Required?Data TypeExample Value
NoString literal or pattern`{digit}{3}`

pattern_after

String literal or pattern to find in a position after the pattern to match. 

NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type.


Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source.

Usage Notes:

Required?Data TypeExample Value
NoString literal or pattern' '

Examples

Tip: For additional examples, see Common Tasks.

Example - Partial obfuscation of credit card numbers

Source:

Suppose you have the following transactional data, which contains customer credit card numbers. 

TransactionIdCreditCardNumAmtDollars
T0014111-1111-1111-1111100.29
T0025500-0000-0000-0004510.21
T0033400-0000-0000-009162.13
T0043000-0000-0000-04294.12

For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears. 

Transformation:

To make the substitution, you must first change the type of the column to be a string:

Transformation Name Change column data type
Parameter: Columns CreditCardNum
Parameter: New type 'String'

You can then use the following transformation to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX:

Transformation Name Edit column with formula
Parameter: Columns CreditCardNum
Parameter: Formula substitute(CreditCardNum, `{digit}+\-`, 'XXXX-')

To indicate that the column no longer contains valid information, you might choose to rename it like in the following:

Transformation Name Rename columns
Parameter: Option Manual rename
Parameter: Column CreditCardNum
Parameter: New column name 'CreditCardNumOBSCURED'

Results:

TransactionIdCreditCardNumOBSCUREDAmtDollars
T001XXXX-XXXX-XXXX-1111100.29
T002XXXX-XXXX-XXXX-0004510.21
T003XXXX-XXXX-XXXX-009162.13
T004XXXX-XXXX-XXXX-04294.12

This page has no comments.