Extracts a set of values based on a specified pattern from a source column of any data type. The generated column contains an array of occurrences of the specified pattern.

While the new column contains array data, the data type of the new column is sometimes inferred as String. 

Your source column (myWidgetInventory) is formatted in the following manner:

{ "red":"100","white":"1300","blue":"315","purple":"55"}

The following step extracts the raw inventory contents of each color:

extractlist col: myWidgetInventory on:`{digit}+`

Output: The generated column contains data that looks like the following array:

["100","1300","315","55"]

extractlist: col:column_ref on:string_literal_pattern delimiter:string_literal_pattern [quote:'quoted_string'] [as:'new_column_name']

TokenRequired?Data TypeDescription
extractlistYtransformName of the transform
colYstringSource column name
onYstringString literal or pattern that identifies the values to extract from the source column
delimiterYstringString literal or pattern that identifies the separator between the values to extract
quoteNstringSpecifies a quoted object that is omitted from matching delimiters
asNstringName of the newly generated column

col

Identifies the column to which to apply the transform. You can specify only one column.

Required?Data Type
YesString (column name)

on

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

For the extractlist tranform, all instances that match this pattern in the source column are extracted into the array list in the new column. Each occurrence in the generated array corresponds to an individual instance in the source; the new column can contain duplicate values.

To create array elements based only on the delimiter parameter, set the following regular expression:

on:`/+/`

Required?Data Type
Yes

String (literal, regular expression, or )

delimiter

For this transform, this parameter defines the pattern that separates the values that you want to extract into the array.

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

Required?Data Type
Yes

String (literal, regular expression, or )

quote

extractlist col: MySourceValues on:`{alpha}+` delimiter:';' quote:'\"'

Output: Extracts from the MySourceValues column each instance of a string value that occurs before the delimiter. Values between double-quotes are considered string literals and are not processed according to the delimiters defined in the transform. 

Required?Data Type
NoString

as

Required?Data Type
NoString (column name)


Extract hashtag values

Extract query parameters from URLs

Source:

In this example, a list of URLs identifies the items in the shopping carts of visitors to your web site. You want to extract the shopping cart information embedded in the query parameters of the URL.

UsernamecartURL
joe.robinsonhttp://example123.com/cart.asp?prodid=1001&qty=2
steph.schmidthttp://example123.com/cart.asp?prodid=1005&qty=4
jack.holmeshttp://example123.com/cart.asp?prodid=2102&qty=1
tina.joneshttp://example123.com/cart.asp?prodid=10412&qty=2

Transformation:

The following transformation extracts the list of query values from the URL. Note that the equals sign is included in the matching pattern so that you don't accidentally pick up numeric values from the non-parameter part of the URL:

The two query parameter values have been extracted into an array of values, including the equals sign, which must be removed:

You can now unnest these values into separate columns:

After you rename the two columns to prodId and Qty, you can delete the column generated by the first transformation.

Results:

UsernamecartURLprodIdQty
joe.robinson http://example123.com/cart.asp?prodid=1001&qty=2 10012
steph.schmidt http://example123.com/cart.asp?prodid=1005&qty=4 10054
jack.holmes http://example123.com/cart.asp?prodid=2102&qty=1 21021
tina.jones http://example123.com/cart.asp?prodid=10412&qty=2 104122

Extract counts from a ragged array using extractlist