Excerpt |
---|
Replaces found string literal or pattern or column with a string, column, or function returning strings. |
Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.
- A column reference can refer to a column of String type.
- If no match is found, the function returns the source string.
- If multiple matches are found in a single string, all replacements are made.
Column reference example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | set col:myURL value:substitute(myURL,`{ip-address}`,myDomain) |
---|
|
substitute(myURL,`{ip-address}`,myDomain) |
Output: Searches the myURL
column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain
column.
Function reference example:
D lang syntax |
---|
RawWrangle | true |
---|
Type | ref |
---|
showNote | true |
---|
WrangleText | set col:companyName value:substitute(upper(companyName),'ACME','New ACME') |
---|
|
substitute(upper(companyName),'ACME','New ACME') |
Output: Searches the uppercase version of values from the companyName
column for the string literal ACME
. When found, these matches are replaced by New ACME
in the companyName
column.
D lang syntax |
---|
RawWrangle | true |
---|
Type | syntax |
---|
showNote | true |
---|
WrangleText | derive type:single value:substitute(string_source,string_pattern,string_replacement,[pattern_before, pattern_after]) |
---|
|
substitute(string_source,string_pattern,string_replacement[,ignore_case, pattern_before, pattern_after]) |
Argument | Required? | Data Type | Description |
---|
string_source | Y | string | Name of the column, a function returning a string, or string literal to be applied to the function |
string_pattern | Y | string | String literal or pattern or a column or a function returning strings to find |
string_replacement | Y | string | String literal, column or function returning a string to use as replacement |
ignore_case | N | string | When true , matching is case-insensitive. Default is false . |
pattern_before | N | string | String literal or pattern to find before finding the string_pattern value. |
pattern_after | N | string | String literal or pattern to find after finding the string_pattern value. |
string_source
Name of the item to be searched. Valid values can be:
- String literals must be quoted (
'Hello, World'
). - Column reference to any type that can be inferred as a string, which encompasses all values
- Functions that return string values
Multiple values and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String literal or column reference (String, Array, or Object) | myColumn |
string_pattern
String to find. This value can be a string literal, a
, a regular expression, a column, or a function returning a String value.
- String literals must be quoted (
'Hello, World'
). - Multiple values and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String value or pattern or column reference (String) | 'Hello' |
string_replacement
Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.
- String literals must be quoted (
'Hello, World'
). - Column reference to any type that can be inferred as a string, which encompasses all values.
- Multiple values and wildcards are not supported.
Required? | Data Type | Example Value |
---|
Yes | String literal, column reference (String, Array, or Object), or function returning String value | '##REDACTED##' |
ignore_case
When true
, matches are case-insensitive. Default is false
.
Info |
---|
NOTE: This argument is not required. By default, matches are case-sensitive. |
Required? | Data Type | Example Value |
---|
No | String value | 'false' |
pattern_before
String literal or pattern to find in a position before the pattern to match.
Info |
---|
NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type. |
Tip |
---|
Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source. |
Required? | Data Type | Example Value |
---|
No | String literal or pattern | `{digit}{3}` |
pattern_after
String literal or pattern to find in a position after the pattern to match.
Info |
---|
NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type. |
Tip |
---|
Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source. |
Required? | Data Type | Example Value |
---|
No | String literal or pattern | ' ' |
Example - Partial obfuscation of credit card numbers
Source:
Suppose you have the following transactional data, which contains customer credit card numbers.
TransactionId | CreditCardNum | AmtDollars |
---|
T001 | 4111-1111-1111-1111 | 100.29 |
T002 | 5500-0000-0000-0004 | 510.21 |
T003 | 3400-0000-0000-009 | 162.13 |
T004 | 3000-0000-0000-04 | 294.12 |
For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears.
Transformation:
To make the substitution, you must first change the type of the column to be a string:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | settype col: CreditCardNum type: 'String' |
---|
p01Name | Columns |
---|
p01Value | CreditCardNum |
---|
p02Name | New type |
---|
p02Value | 'String' |
---|
SearchTerm | Change column data type |
---|
|
You can then use the following transformation to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX
:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | set col: CreditCardNum value: substitute(CreditCardNum, `{digit}+\-`, 'XXXX-') |
---|
p01Name | Columns |
---|
p01Value | CreditCardNum |
---|
p02Name | Formula |
---|
p02Value | substitute(CreditCardNum, `{digit}+\-`, 'XXXX-') |
---|
SearchTerm | Edit column with formula |
---|
|
To indicate that the column no longer contains valid information, you might choose to rename it like in the following:
D trans |
---|
RawWrangle | true |
---|
p03Value | 'CreditCardNumOBSCURED' |
---|
Type | step |
---|
WrangleText | rename mapping: [CreditCardNum,'CreditCardNumOBSCURED'] |
---|
p01Name | Option |
---|
p01Value | Manual rename |
---|
p02Name | Column |
---|
p02Value | CreditCardNum |
---|
p03Name | New column name |
---|
SearchTerm | Rename columns |
---|
|
Results:
TransactionId | CreditCardNumOBSCURED | AmtDollars |
---|
T001 | XXXX-XXXX-XXXX-1111 | 100.29 |
T002 | XXXX-XXXX-XXXX-0004 | 510.21 |
T003 | XXXX-XXXX-XXXX-009 | 162.13 |
T004 | XXXX-XXXX-XXXX-04 | 294.12 |