Extracting one or more values from within a column can turn data into meaningful, discrete information. This section describes how to extract column data; the method you use depends on the data type.
Extract and split transformations do not do the same thing:
...
Tip: If you set the number of patterns to extract to 2 for the address column, you might extract apartment or suite information.
Using functions, you can extract specific elements of a valid URL. The following transformation pulls the domain values from the myURL column:
Transformation Name | New formula |
---|---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOMAIN(myURL) |
Parameter: New column name | myDomain |
In some cases, the function may not return values. For example, the SUBDOMAIN function returns empty values if there is no sub-domain part of the URL.
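Outside the product, the same idea can be sketched in Python with the standard library's `urllib.parse`. This is a minimal illustration, not how DOMAIN or SUBDOMAIN are implemented: the helper names `naive_domain` and `naive_subdomain` are hypothetical, and treating the last two labels of the host name as the domain is a simplifying assumption that mishandles suffixes such as `.co.uk`. As with SUBDOMAIN, a URL without a subdomain yields an empty value.

```python
from urllib.parse import urlparse


def _host_labels(url: str) -> list[str]:
    # urlparse only finds a host name when the URL has a scheme such as http://
    host = urlparse(url).hostname or ""
    return host.split(".") if host else []


def naive_domain(url: str) -> str:
    """Hypothetical helper: last two labels of the host, e.g. 'example.com'.

    Assumption: ignores the Public Suffix List, so 'a.example.co.uk'
    returns 'co.uk' rather than 'example.co.uk'.
    """
    return ".".join(_host_labels(url)[-2:])


def naive_subdomain(url: str) -> str:
    """Hypothetical helper: labels before the naive domain, or '' if none."""
    return ".".join(_host_labels(url)[:-2])


if __name__ == "__main__":
    for url in ["http://www.example.com/path", "https://example.com"]:
        print(url, naive_domain(url), repr(naive_subdomain(url)))
```

Because `urlparse` needs a scheme to identify the host, a value such as `www.example.com` with no `http://` prefix returns empty strings from both helpers in this sketch.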
The following functions can be used to extract values from a set of URLs:
You can extract query parameter values from a URL. The following example extracts the store_id value from the storeURL field:
Transformation Name | Extract patterns |
---|---|
Parameter: Column to extract from | storeURL |
Parameter: Option | HTTP Query strings |
Parameter: Fields to extract | store_id |
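For comparison, the standard library can pull a single query parameter out of a URL. This is a hedged sketch of the general technique, not the Extract patterns implementation; the function name `extract_query_param` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_query_param(url: str, name: str) -> str:
    """Return the first value of query parameter `name`, or '' if it is absent.

    parse_qs decodes percent-escapes and, by default, drops parameters
    with blank values (e.g. 'store_id=').
    """
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else ""


if __name__ == "__main__":
    url = "http://shop.example.com/item?store_id=42&ref=ad"  # hypothetical URL
    print(extract_query_param(url, "store_id"))
```

Returning an empty string for a missing parameter mirrors the way the URL functions above return empty values when the requested part is not present.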
If your data includes sets of arrays, you can extract array elements into columns for each key, with the values written to each key column.
...
Transformation Name | Expand arrays into rows |
---|---|
Parameter: Column | Events |
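The row expansion can be sketched in plain Python to show what happens to the other columns. This is a minimal illustration under stated assumptions: the function name `expand_arrays_into_rows` is hypothetical, rows are modeled as dictionaries, and an empty array producing no output rows is an assumption that may not match the product's behavior.

```python
from typing import Any


def expand_arrays_into_rows(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Emit one output row per element of the array in `column`.

    All other columns are copied onto each output row. Assumption for this
    sketch: a row whose array is empty produces no output rows.
    """
    expanded = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)        # shallow copy keeps the other columns
            new_row[column] = element  # replace the array with one element
            expanded.append(new_row)
    return expanded


if __name__ == "__main__":
    data = [{"id": 1, "Events": ["login", "logout"]}, {"id": 2, "Events": ["login"]}]
    for row in expand_arrays_into_rows(data, "Events"):
        print(row)
```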
...
...
You can also extract sets of values into an array.
Tip: This transformation is useful for extracting types or patterns of information from a single column.
Using the Extract matches into Array transformation, you can extract the values of a column to form a new column of arrays. The following example uses the {any} pattern to extract the cell values and form a new array column.

Transformation:
Transformation Name | Extract matches into Array |
---|---|
Parameter: Column | product |
Parameter: Pattern matching elements in the list | `{any}` |
Parameter: Delimiter separating each element | `,` |
Parameter: … | 1 |
Results:
Before | After |
---|---|
socks, socks, socks | ["socks", "socks", "socks"] |
pants, pants | ["pants", "pants"] |
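The before/after rows can be reproduced with a short Python sketch that splits each cell on the delimiter. This only approximates the `{any}` pattern used in the example (any text between delimiters becomes one element); the function name `extract_matches_into_array` is hypothetical and is not the product's API.

```python
import json


def extract_matches_into_array(cell: str, delimiter: str = ",") -> list[str]:
    """Split `cell` on `delimiter`, trim each piece, and drop empty pieces.

    Approximates the `{any}` pattern from the example: any run of text
    between delimiters becomes one array element.
    """
    return [piece.strip() for piece in cell.split(delimiter) if piece.strip()]


if __name__ == "__main__":
    for cell in ["socks, socks, socks", "pants, pants"]:
        print(json.dumps(extract_matches_into_array(cell)))
```

Serializing with `json.dumps` prints the arrays in the same bracketed, double-quoted form as the After column.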
Suppose you need to extract the hashtags from customer tweets into another column. In such cases, you can use the {hashtag} pattern to extract all hashtag values from a customer's tweets into a new column.

Source: The following dataset contains customer tweets from different locations.

User Name | Location | Customer tweets |
---|---|---|
James | U.K | Excited to announce that we’ve transitioned Wrangler from a hybrid desktop application to a completely cloud-based service! #dataprep #businessintelligence #CommitToCleanData # London |
Mark | Berlin | Learnt more about the importance of identifying issues in your data—early and often #CommitToCleanData #predictivetransformations #realbusinessintelligence |
Catherine | Paris | Clean data is the foundation of your analysis. Learn more about what we consider the five tenets of sound #dataprep, starting with #1a prioritizing and setting targets. #startwiththeuser #realbusinessintelligence #Paris |
Dave | New York | Learn how #NewYorklife onboarded as part of their #bigdata #dataprep initiative to unlock hidden insights and make them accessible across departments. |
Christy | San Francisco | How can you quickly determine the number of times a user ID appears in your data?#dataprep #pivot #aggregation#machinelearning initiatives #SFO |
Transformation: The following transformation extracts the hashtags from the customer tweets.
Transformation Name | Extract |
---|---|
Parameter: Column | … |
Parameter: Pattern matching elements in the list | … |
Parameter: New column name | Hashtag tweets |
Results:

User Name | Location | Hashtag tweets |
---|---|---|
James | U.K | ["#dataprep", "#businessintelligence", "#CommitToCleanData", " # London"] |
Mark | Berlin | ["#CommitToCleanData", "#predictivetransformations", "#realbusinessintelligence", "0"] |
Catherine | Paris | ["#dataprep", "#startwiththeuser", "#realbusinessintelligence", "# Paris"] |
Dave | New York | ["#NewYorklife", "dataprep", "bigdata", "0"] |
Christy | San Francisco | ["dataprep", "#pivot", "#aggregation", "#machinelearning"] |
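A rough equivalent of hashtag extraction can be sketched with a regular expression. The definition of a hashtag used here ('#' followed by one or more word characters) is an assumption; the product's {hashtag} pattern may be defined differently. For example, this rule does not match the space-separated "# London", so the sketch is not meant to reproduce the Results table above. The names `HASHTAG_RE` and `extract_hashtags` are hypothetical.

```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters.
# The product's {hashtag} pattern may define hashtags differently.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(tweet: str) -> list[str]:
    """Return every hashtag in `tweet`, in order of appearance."""
    return HASHTAG_RE.findall(tweet)


if __name__ == "__main__":
    tweet = ("How can you quickly determine the number of times a user ID appears "
             "in your data?#dataprep #pivot #aggregation#machinelearning initiatives #SFO")
    print(extract_hashtags(tweet))
```

Because each match stops at the next non-word character, adjacent tags such as `#aggregation#machinelearning` are returned as two separate hashtags.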