Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r095

D toc

Excerpt

Replaces found string literal or pattern or column with a string, column, or function returning strings.

Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.

  • A column reference can refer to a column of String type.
  • If no match is found, the function returns the source string.
  • If multiple matches are found in a single string, all replacements are made.

D s lang vs sql

D s
snippetBasic

Column reference example:

D lang syntax
RawWrangletrue
Typeref
showNotetrue
WrangleTextset col:myURL value:substitute(myURL,`{ip-address}`,myDomain)

substitute(myURL,`{ip-address}`,myDomain)

Output: Searches the myURL column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain column. 

Function reference example:

D lang syntax
RawWrangletrue
Typeref
showNotetrue
WrangleTextset col:companyName value:substitute(upper(companyName),'ACME','New ACME')

substitute(upper(companyName),'ACME','New ACME')

Output: Searches the uppercase version of values from the companyName column for the string literal ACME. When found, these matches are replaced by New ACME in the companyName column.

D s
snippetSyntax

D lang syntax
RawWrangletrue
Typesyntax
showNotetrue
WrangleTextderive type:single value:substitute(string_source,string_pattern,string_replacement,[pattern_before, pattern_after])

substitute(string_source,string_pattern,string_replacement[,ignore_case, pattern_before, pattern_after])


ArgumentRequired?Data TypeDescription
string_sourceYstringName of the column, a function returning a string, or string literal to be applied to the function
string_patternYstringString literal or pattern or a column or a function returning strings to find
string_replacementYstringString literal, column or function returning a string to use as replacement
ignore_caseNstringWhen true, matching is case-insensitive. Default is false.
pattern_beforeNstringString literal or pattern to find before finding the string_pattern value.
pattern_afterNstringString literal or pattern to find after finding the string_pattern value.

D s lang notes

string_source

Name of the item to be searched. Valid values can be:

  • String literals must be quoted ( 'Hello, World' ).
  • Column reference to any type that can be inferred as a string, which encompasses all values
  • Functions that return string values

Multiple values and wildcards are not supported.

D s
snippetusage

Required?Data TypeExample Value
YesString literal or column reference (String, Array, or Object)myColumn

string_pattern

String to find. This value can be a string literal, a 

D s lang
itempattern
rtrue
, a regular expression, a column, or a function returning a String value.

  • String literals must be quoted ('Hello, World').
  • Multiple values and wildcards are not supported.

D s
snippetusage

Required?Data TypeExample Value
YesString value or pattern or column reference (String)'Hello'

string_replacement

Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.

  • String literals must be quoted ('Hello, World').
  • Column reference to any type that can be inferred as a string, which encompasses all values. 
  • Multiple values and wildcards are not supported.

D s
snippetusage

 

Required?Data TypeExample Value
YesString literal, column reference (String, Array, or Object), or function returning String value'##REDACTED##'

ignore_case

When true, matches are case-insensitive. Default is false

Info

NOTE: This argument is not required. By default, matches are case-sensitive.


D s
snippetusage
 

Required?Data TypeExample Value
NoString value'false'

pattern_before

String literal or pattern to find in a position before the pattern to match. 

Info

NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type.

Tip

Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source.

D s
snippetusage

 

Required?Data TypeExample Value
NoString literal or pattern`{digit}{3}`

pattern_after

String literal or pattern to find in a position after the pattern to match. 

Info

NOTE: This argument is not permitted when string_ pattern or string_replacement is of column data type.


Tip

Tip: Use this argument if there are potentially multiple instances of the pattern to match in the source.

D s
snippetusage

Required?Data TypeExample Value
NoString literal or pattern' '

D s
snippetExamples

Example - Partial obfuscation of credit card numbers

Source:

Suppose you have the following transactional data, which contains customer credit card numbers. 

TransactionIdCreditCardNumAmtDollars
T0014111-1111-1111-1111100.29
T0025500-0000-0000-0004510.21
T0033400-0000-0000-009162.13
T0043000-0000-0000-04294.12

For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears. 

Transformation:

To make the substitution, you must first change the type of the column to be a string:

D trans
RawWrangletrue
Typestep
WrangleTextsettype col: CreditCardNum type: 'String'
p01NameColumns
p01ValueCreditCardNum
p02NameNew type
p02Value'String'
SearchTermChange column data type

You can then use the following transformation to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX:

D trans
RawWrangletrue
Typestep
WrangleTextset col: CreditCardNum value: substitute(CreditCardNum, `{digit}+\-`, 'XXXX-')
p01NameColumns
p01ValueCreditCardNum
p02NameFormula
p02Valuesubstitute(CreditCardNum, `{digit}+\-`, 'XXXX-')
SearchTermEdit column with formula

To indicate that the column no longer contains valid information, you might choose to rename it like in the following:

D trans
RawWrangletrue
p03Value'CreditCardNumOBSCURED'
Typestep
WrangleTextrename mapping: [CreditCardNum,'CreditCardNumOBSCURED']
p01NameOption
p01ValueManual rename
p02NameColumn
p02ValueCreditCardNum
p03NameNew column name
SearchTermRename columns

Results:

TransactionIdCreditCardNumOBSCUREDAmtDollars
T001XXXX-XXXX-XXXX-1111100.29
T002XXXX-XXXX-XXXX-0004510.21
T003XXXX-XXXX-XXXX-009162.13
T004XXXX-XXXX-XXXX-04294.12

D s also
labelstring