Extracting one or more values from a column can turn raw data into discrete, meaningful information. This section describes how to extract column data. The method to use varies by data type.
Extract vs. Split
Extract and split transformations do not do the same thing:
...
Tip: If you set the number of patterns to extract to
Extract components of a URL
URL components
Using functions, you can extract specific elements of a valid URL. For example, a transformation that applies a URL function can pull the domain values from the myURL column.
In some cases, the function may not return values. For example, the SUBDOMAIN function returns empty values if there is no sub-domain part of the URL.
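To make the empty-subdomain case concrete, here is a minimal Python sketch of splitting a URL's host into subdomain, domain, and suffix. It is not the product's own implementation: the helper name `split_host` and the sample URLs are hypothetical, and the split assumes a single-label suffix such as `com`, so it would mishandle suffixes like `co.uk`.

```python
from urllib.parse import urlsplit


def split_host(url):
    """Return (subdomain, domain, suffix) for a URL.

    Assumption: the suffix is always the last label of the host.
    This is a simplification for illustration only.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    if len(labels) < 2:
        # Nothing to split, for example a bare host name or a missing scheme.
        return ("", host, "")
    suffix = labels[-1]
    domain = labels[-2]
    # Everything before the domain is treated as the subdomain.
    # It is an empty string when the URL has no subdomain part.
    subdomain = ".".join(labels[:-2])
    return (subdomain, domain, suffix)


parts_with_sub = split_host("http://www.example.com/path")
parts_without_sub = split_host("http://example.com/path")
```

In this sketch the second URL yields an empty string for the subdomain, which mirrors the empty values described above.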
The following functions can be used to extract values from a set of URLs:
Query parameters
You can extract query parameter values from a URL. For example, you can extract the store_id value from the storeURL field value.
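As a rough illustration of query parameter extraction outside the product, the following Python sketch reads one named parameter from a URL with the standard library. The helper `query_param` and the sample `store_url` value are hypothetical; only the `store_id` parameter name comes from the text above.

```python
from urllib.parse import parse_qs, urlsplit


def query_param(url, name):
    """Return the first value of the named query parameter, or None."""
    params = parse_qs(urlsplit(url).query)
    values = params.get(name)
    return values[0] if values else None


# Hypothetical sample value for a storeURL field.
store_url = "http://www.example.com/products?store_id=1001&sort=asc"
store_id = query_param(store_url, "store_id")
missing = query_param(store_url, "region")
```

Returning `None` for an absent parameter is a design choice of this sketch, so that missing values can be told apart from empty ones.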
Extract object values
If your data includes sets of arrays, you can extract array elements into columns for each key, with the values written to each key column.
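The idea of writing each key's value into its own column can be sketched in Python as follows. This is an illustration under assumptions, not the product's transformation: the rows, the `attrs` column, and the helper `extract_keys` are all hypothetical, and the source column is assumed to hold JSON object text.

```python
import json


def extract_keys(rows, column, keys):
    """Copy each row and add one new column per requested key."""
    out = []
    for row in rows:
        raw = row.get(column)
        # Assumption: the column holds JSON text or an already parsed dict.
        obj = json.loads(raw) if isinstance(raw, str) else (raw or {})
        new_row = dict(row)
        for key in keys:
            # Keys that are absent from the object produce None.
            new_row[key] = obj.get(key)
        out.append(new_row)
    return out


rows = [
    {"id": 1, "attrs": '{"color": "red", "size": "M"}'},
    {"id": 2, "attrs": '{"color": "blue"}'},
]
flat = extract_keys(rows, "attrs", ["color", "size"])
```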
...
Extract
...
URL components
...
Values into a List
You can also extract sets of values into an array.
Tip: This transformation is useful for extracting types or patterns of information from a single column.
Extract matches into array
Using the Extract matches into array transformation, you can extract the values of a column to form a new column of arrays. The following example uses the {any} pattern to extract the cell values and form a new array column.
Transformation:
Results:
Before | After |
---|---|
socks, socks, socks | ["socks", "socks", "socks"] |
pants, pants | ["pants", "pants"] |
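The Before and After values above can be approximated in Python with a regular expression. This is only a sketch: the regex `\w+` is a stand-in chosen for illustration and is not equivalent to the {any} pattern, and the helper `matches_to_array` is hypothetical.

```python
import json
import re


def matches_to_array(value, pattern=r"\w+"):
    """Return every non-overlapping match of pattern in value as a list."""
    return re.findall(pattern, value)


before = ["socks, socks, socks", "pants, pants"]
# Serialize each list as JSON text so it resembles an array cell value.
after = [json.dumps(matches_to_array(v)) for v in before]
```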
Extract hashtags
Suppose you need to extract the hashtags from customer tweets to another column. In such cases, you can use the
Source: The following dataset contains customer tweets from different locations.
Transformation: The following transformation extracts the hashtag messages from customer tweets.
...
Results:
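For readers who want to try the hashtag extraction described above outside the product, here is a minimal Python sketch. It assumes a hashtag is a `#` followed by one or more word characters, which may differ from the pattern the product uses. The sample tweets and the helper `extract_hashtags` are hypothetical and are not taken from the source dataset.

```python
import re

# Assumption: a hashtag is "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(tweet):
    """Return all hashtags found in the tweet text, in order."""
    return HASHTAG.findall(tweet)


# Hypothetical sample tweets for illustration.
tweets = [
    "Loving the new store! #grandopening #deals",
    "No tags here",
]
tags = [extract_hashtags(t) for t in tweets]
```

A tweet with no hashtags produces an empty list in this sketch.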