...
Tip: If you set the number of patterns to extract to 2 for the address column, you might extract apartment or suite information.
Using functions, you can extract specific elements of a valid URL. The following transformation pulls the domain values from the myURL column:
Transformation Name | New formula |
---|
Parameter: Formula type | Single row formula |
Parameter: Formula | DOMAIN(myURL) |
Parameter: New column name | myDomain |
In some cases, the function may not return values. For example, the SUBDOMAIN function returns empty values if there is no sub-domain part of the URL.
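
To illustrate why a URL function can return an empty value, here is a minimal Python sketch that splits a host name into sub-domain, domain, and suffix. It is only an analogy and not the product's implementation: the helper name `split_host` is hypothetical, and the code assumes a single-label suffix such as `com`, so hosts ending in multi-part suffixes (for example `co.uk`) would be split incorrectly.

```python
from urllib.parse import urlparse


def split_host(url: str) -> dict:
    """Naively split a URL's host into subdomain, domain, and suffix."""
    host = urlparse(url).hostname or ""
    labels = host.split(".") if host else []
    if len(labels) < 2:
        # No recognizable domain/suffix pair (e.g. "localhost" or an invalid URL).
        return {"host": host, "subdomain": "", "domain": "", "suffix": ""}
    return {
        "host": host,
        "subdomain": ".".join(labels[:-2]),  # empty when there is no sub-domain
        "domain": labels[-2],
        "suffix": labels[-1],  # assumption: single-label suffix such as "com"
    }


print(split_host("http://www.example.com/path?x=1"))
print(split_host("http://example.com"))
```

In the second call the host has no sub-domain part, so the `subdomain` entry is an empty string, which mirrors the empty values described above.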
The following functions can be used to extract values from a set of URLs:
You can extract query parameter values from a URL. The following example extracts the store_id value from the storeURL field value:
Transformation Name | Extract patterns |
---|
Parameter: Column to extract from | storeURL |
Parameter: Option | HTTP Query strings |
Parameter: Fields to extract | store_id |
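
For comparison, the same kind of query-parameter extraction can be sketched in Python with the standard library. This is an illustration under assumptions, not the product's implementation: the helper name `query_param` and the sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def query_param(url: str, name: str) -> str:
    """Return the first value of a query parameter, or "" if it is absent."""
    values = parse_qs(urlparse(url).query).get(name, [])
    return values[0] if values else ""


# Hypothetical storeURL value used only for illustration.
store_url = "http://www.example.com/stores?store_id=1234&region=west"
print(query_param(store_url, "store_id"))
print(query_param(store_url, "missing"))
```

Returning an empty string for a missing parameter is a design choice of this sketch; it keeps the output column the same type for every row.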
If your data includes sets of arrays, you can extract array elements into columns for each key, with the values written to each key column.
...
Transformation Name | Expand arrays into rows |
---|
Parameter: Column | Events |
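
As a rough mental model of expanding an array column into rows, the following Python sketch emits one output row per array element. It is an assumption-laden illustration rather than the product's behavior: the function name `expand_array_rows`, the `user` column, and the choice to keep a row with `None` when the array is empty are all hypothetical.

```python
def expand_array_rows(rows: list, column: str) -> list:
    """Emit one row per element of the array stored in `column`."""
    expanded = []
    for row in rows:
        # Assumption: an empty or missing array still yields one row with None.
        values = row.get(column) or [None]
        for value in values:
            new_row = dict(row)  # copy so the input rows are not modified
            new_row[column] = value
            expanded.append(new_row)
    return expanded


data = [
    {"user": "a", "Events": ["login", "click"]},
    {"user": "b", "Events": []},
]
for out_row in expand_array_rows(data, "Events"):
    print(out_row)
```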
...
...
You can also extract sets of values into an array list of values.
Tip: This transformation is useful for extracting types or patterns of information from a single column.
Using the Extract matches into Array transformation, you can extract the values of a column to form a new column of arrays. The following example uses the {any} pattern to extract the cell values and form a new array column.
Transformation:
Transformation Name | Extract matches into Array |
---|
Parameter: Column | product |
Parameter: Pattern matching elements in the list | `{any}` |
Parameter: Delimiter separating each element | `,` |
Parameter: ... | 1 |
Results:
Before | After |
---|
socks, socks, socks | ["socks", "socks", "socks"] |
pants, pants | ["pants", "pants"] |
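
The Before/After rows above can be approximated in Python by splitting each cell on the delimiter. This is a sketch, not the product's implementation: the function name `extract_matches_into_array` is hypothetical, and trimming whitespace and dropping empty elements are assumptions chosen to reproduce the rows shown.

```python
def extract_matches_into_array(cell: str, delimiter: str = ",") -> list:
    """Split a delimited cell into a list of trimmed, non-empty elements."""
    return [part.strip() for part in cell.split(delimiter) if part.strip()]


print(extract_matches_into_array("socks, socks, socks"))
print(extract_matches_into_array("pants, pants"))
```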
Suppose you need to extract the hashtags from customer tweets into another column. In such cases, you can use the {hashtag} pattern to extract all hashtag values from a customer's tweets into a new column.
Source: The following dataset contains customer tweets across different locations.
User Name | Location | Customer tweets |
---|
James | U.K | Excited to announce that we’ve transitioned Wrangler from a hybrid desktop application to a completely cloud-based service! #dataprep #businessintelligence #CommitToCleanData # London |
Mark | Berlin | Learnt more about the importance of identifying issues in your data—early and often #CommitToCleanData #predictivetransformations #realbusinessintelligence |
Catherine | Paris | Clean data is the foundation of your analysis. Learn more about what we consider the five tenets of sound #dataprep, starting with #1a prioritizing and setting targets. #startwiththeuser #realbusinessintelligence #Paris |
Dave | New York | Learn how #NewYorklife onboarded as part of their #bigdata #dataprep initiative to unlock hidden insights and make them accessible across departments. |
Christy | San Francisco | How can you quickly determine the number of times a user ID appears in your data?#dataprep #pivot #aggregation#machinelearning initiatives #SFO |
Transformation: The following transformation extracts the hashtags from customer tweets.
Transformation Name | Extract |
---|
Parameter: Column | ... |
Parameter: Pattern matching elements in the list | ... |
Parameter: New column name | Hashtag tweets |
Results:
User Name | Location | Hashtag tweets |
---|
James | U.K | ["#dataprep", "#businessintelligence", "#CommitToCleanData", " # London"] |
Mark | Berlin | ["#CommitToCleanData", "#predictivetransformations", "#realbusinessintelligence", "0"] |
Catherine | Paris | ["#dataprep", "#startwiththeuser", "#realbusinessintelligence", "# Paris"] |
Dave | New York | ["#NewYorklife", "dataprep", "bigdata", "0"] |
Christy | San Francisco | ["dataprep", "#pivot", "#aggregation", "#machinelearning"] |
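
For readers who want to try hashtag extraction outside the application, here is a minimal Python sketch using a regular expression. It is not the product's {hashtag} pattern: the regex `#\w+` and the function name `extract_hashtags` are assumptions, so the output may differ from the results above (for example, a `#` followed by a space is not matched, and a token such as `#1a` is).

```python
import re

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text: str) -> list:
    """Return all hashtag-like tokens in the text, in order of appearance."""
    return HASHTAG.findall(text)


tweet = "Learn how #NewYorklife onboarded as part of their #bigdata #dataprep initiative"
print(extract_hashtags(tweet))
```

Because the regex does not require whitespace before `#`, adjacent tags such as `#aggregation#machinelearning` are returned as two separate elements.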