Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

D toc

D s transforms

Excerpt

Extracts a set of values based on a specified pattern from a source column of any data type. The generated column contains an array of occurrences of the specified pattern.

While the new column contains array data, the data type of the new column is sometimes inferred as String. 

D s
snippetBasic

Your source column (myWidgetInventory) is formatted in the following manner:

Code Block
{ "red":"100","white":"1300","blue":"315","purple":"55"}

The following step extracts the raw inventory contents of each color:

D code
extractlist col: myWidgetInventory on:`{digit}+`

Output: The generated column contains data that looks like the following array:

Code Block
["100","1300","315","55"]

D s
snippetParameters

D code

extractlist: col:column_ref on:string_literal_pattern delimiter:string_literal_pattern [quote:'quoted_string'] [as:'new_column_name']

TokenRequired?Data TypeDescription
extractlistYtransformName of the transform
colYstringSource column name
onYstringString literal or pattern that identifies the values to extract from the source column
delimiterYstringString literal or pattern that identifies the separator between the values to extract
quoteNstringSpecifies a quoted object that is omitted from matching delimiters
asNstringName of the newly generated column

D s lang notes

col

Identifies the column to which to apply the transform. You can specify only one column.

D s
snippetusage

Required?Data Type
YesString (column name)

on

Include Page
on Parameter
on Parameter

Tip

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

For the extractlist tranform, all instances that match this pattern in the source column are extracted into the array list in the new column. Each occurrence in the generated array corresponds to an individual instance in the source; the new column can contain duplicate values.

To create array elements based only on the delimiter parameter, set the following regular expression:

Code Block
on:`/+/`

D s
snippetusage

Required?Data Type
Yes

String (literal, regular expression, or

D s item
itempattern
rtrue
)

delimiter

Include Page
delimiter Parameter
delimiter Parameter

For this transform, this parameter defines the pattern that separates the values that you want to extract into the array.

Tip

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

D s
snippetusage

Required?Data Type
Yes

String (literal, regular expression, or

D s item
pattern
pattern
)

quote

D code

extractlist col: MySourceValues on:`{alpha}+` delimiter:';' quote:'\"'

Output: Extracts from the MySourceValues column each instance of a string value that occurs before the delimiter. Values between double-quotes are considered string literals and are not processed according to the delimiters defined in the transform. 

Include Page
quote Parameter
quote Parameter

D s
snippetusage

Required?Data Type
NoString
  • The quote value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.

as

Include Page
as Parameter
as Parameter

D s
snippetusage

Required?Data Type
NoString (column name)


D s
snippetExamples

Example - extract query parameters from URLs

Source:

In this example, a list of URLs identifies the items in the shopping carts of visitors to your web site. You want to extract the shopping cart information embedded in the query parameters of the URL.

UsernamecartURL
joe.robinsonhttp://example123.com/cart.asp?prodid=1001&qty=2
steph.schmidthttp://example123.com/cart.asp?prodid=1005&qty=4
jack.holmeshttp://example123.com/cart.asp?prodid=2102&qty=1
tina.joneshttp://example123.com/cart.asp?prodid=10412&qty=2

Transformation:

The following transformation extracts the list of query values from the URL. Note that the equals sign is included in the matching pattern so that you don't accidentally pick up numeric values from the non-parameter part of the URL:

D trans
RawWrangletrue
Typestep
WrangleTextextractlist col:cartURL on:`=[digit]+`
p01NameColumn
p01ValuecartURL
p02NamePattern matching elements in list
p02Value`=[digit]+`
SearchTermExtract matches into Array

The two query parameter values have been extracted into an array of values, including the equals sign, which must be removed:

D trans
RawWrangletrue
p03Value''
Typestep
WrangleTextreplace col:cartURL with: '' on: `=` global: true
p01NameColumn
p01ValuecartURL
p02NameFind
p02Value`=`
p03NameReplace with
p04Valuetrue
p04NameMatch all occurrences
SearchTermReplace text or pattern

You can now unnest these values into separate columns:

D trans
RawWrangletrue
Typestep
WrangleTextunnest col:extractlist_cartURL keys:'[0]','[1]'
p01NameColumn
p01Valueextractlist_cartURL
p02NamePaths to elements
p02Value'[0]','[1]'
SearchTermUnnest Objects into columns

After you rename the two columns to prodId and Qty, you can delete the column generated by the first transformation.

Results:

UsernamecartURLprodIdQty
joe.robinson http://example123.com/cart.asp?prodid=1001&qty=2 10012
steph.schmidt http://example123.com/cart.asp?prodid=1005&qty=4 10054
jack.holmes http://example123.com/cart.asp?prodid=2102&qty=1 21021
tina.jones http://example123.com/cart.asp?prodid=10412&qty=2 104122

Example - Extracting counts from a ragged array using extractlist

Include Page
EXAMPLE - Extractlist Transform
EXAMPLE - Extractlist Transform

D s also
labelwrangle_transform_extractlist