Extracts a set of values based on a specified pattern from a source column of any data type. The generated column contains an array of occurrences of the specified pattern. |
While the new column contains array data, the data type of the new column is sometimes inferred as String.
Your source column (myWidgetInventory
) is formatted in the following manner:
{ "red":"100","white":"1300","blue":"315","purple":"55"} |
The following step extracts the raw inventory contents of each color:
extractlist col: myWidgetInventory on:`{digit}+` |
Output: The generated column contains data that looks like the following array:
["100","1300","315","55"] |
extractlist: col:column_ref on:string_literal_pattern delimiter:string_literal_pattern [quote:'quoted_string'] [as:'new_column_name'] |
Token | Required? | Data Type | Description |
---|---|---|---|
extractlist | Y | transform | Name of the transform |
col | Y | string | Source column name |
on | Y | string | String literal or pattern that identifies the values to extract from the source column |
delimiter | Y | string | String literal or pattern that identifies the separator between the values to extract |
quote | N | string | Specifies a quoted object that is omitted from matching delimiters |
as | N | string | Name of the newly generated column |
Identifies the column to which to apply the transform. You can specify only one column.
Required? | Data Type |
---|---|
Yes | String (column name) |
Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form |
For the extractlist
tranform, all instances that match this pattern in the source column are extracted into the array list in the new column. Each occurrence in the generated array corresponds to an individual instance in the source; the new column can contain duplicate values.
To create array elements based only on the delimiter
parameter, set the following regular expression:
on:`/+/` |
Required? | Data Type |
---|---|
Yes | String (literal, regular expression, or |
For this transform, this parameter defines the pattern that separates the values that you want to extract into the array.
Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form |
Required? | Data Type |
---|---|
Yes | String (literal, regular expression, or |
extractlist col: MySourceValues on:`{alpha}+` delimiter:';' quote:'\"' |
Output: Extracts from the MySourceValues column each instance of a string value that occurs before the delimiter. Values between double-quotes are considered string literals and are not processed according to the delimiters defined in the transform.
Required? | Data Type |
---|---|
No | String |
The quote
value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.
Required? | Data Type |
---|---|
No | String (column name) |
Source:
In this example, a list of URLs identifies the items in the shopping carts of visitors to your web site. You want to extract the shopping cart information embedded in the query parameters of the URL.
Username | cartURL |
---|---|
joe.robinson | http://example123.com/cart.asp?prodid=1001&qty=2 |
steph.schmidt | http://example123.com/cart.asp?prodid=1005&qty=4 |
jack.holmes | http://example123.com/cart.asp?prodid=2102&qty=1 |
tina.jones | http://example123.com/cart.asp?prodid=10412&qty=2 |
Transformation:
The following transformation extracts the list of query values from the URL. Note that the equals sign is included in the matching pattern so that you don't accidentally pick up numeric values from the non-parameter part of the URL:
The two query parameter values have been extracted into an array of values, including the equals sign, which must be removed:
You can now unnest these values into separate columns:
After you rename the two columns to prodId
and Qty
, you can delete the column generated by the first transformation.
Results:
Username | cartURL | prodId | Qty |
---|---|---|---|
joe.robinson | http://example123.com/cart.asp?prodid=1001&qty=2 | 1001 | 2 |
steph.schmidt | http://example123.com/cart.asp?prodid=1005&qty=4 | 1005 | 4 |
jack.holmes | http://example123.com/cart.asp?prodid=2102&qty=1 | 2102 | 1 |
tina.jones | http://example123.com/cart.asp?prodid=10412&qty=2 | 10412 | 2 |