Contents:
Tip: Use the extract
transform if you need to retain the source column. Otherwise, you might be able to use the split
transform. See Split Transform.
Basic Usage
extract col: text on: 'honda' limit: 10
Output: Extracts the value honda
from the source column text
up to 10 times and insert into a new column. The source column text
is unmodified.
Syntax and Parameters
extract col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:max_count] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)]
NOTE: At least one of the following parameters must be included to specify the pattern to extract: after
, at
, before
, from
, on
, to
.
Token | Required? | Data Type | Description |
---|---|---|---|
extract | Y | transform | Name of the transform |
col | Y | string | Source column name |
quote | N | string | Specifies a quoted object that is omitted from pattern matching |
ignoreCase | N | boolean | If true , matching is case-insensitive. |
limit | N | integer (positive) | Identifies the number of extractions that can be performed from a single value. Default is 1. |
Matching parameters:
Parameter | Required? | Data Type | Description |
---|---|---|---|
after | N | string | String literal or pattern that precedes the pattern to match |
at | N | Array | Two-integer array identifying the character indexes of start and end characters to match |
before | N | string | String literal or pattern that appears after the pattern to match |
from | N | string | String literal or pattern that identifies the start of the pattern to match |
on | N | string | String literal or pattern that identifies the pattern to match. |
to | N | string | String literal or pattern that identifies the end of the pattern to match |
For more information on syntax standards, see Language Documentation Syntax Notes.
col
Identifies the column to which to apply the transform. You can specify only one column.
extract col: MyCol on: 'MyString'
Output: Extracts value My String
in new column if it is present in MyCol
. Otherwise, new column value is blank.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (column name) |
after
extract col: MyCol after: 'Important:'
Output: Extracts value in MyCol
that appears after the string Important:
. If the after
value does not appear in the column, the output value is blank.
after
parameter value using string literals, regular expressions, or Alteryx® patterns.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
- The
after
andfrom
parameters are very similar.from
includes the matching value as part of the extracted string. after
can be used with eitherto
,on
, orbefore
. See Pattern Clause Position Matching.
at
extract col: MyCol at: 2,6
Output: Extracts contents of MyCol
that starts at the second character in the column and extends to the sixth character of the column.
Parameter inputs are in the form of x,y
, where x
and y
are positive integers indicating the starting character and ending character, respectively, of the pattern of interest.
x
must be less thany
.- If
y
is greater than the length of the value, the pattern is defined to the end of the value, and a match is made.
Usage Notes:
Required? | Data Type |
---|---|
No | Array of two Integers ( X,Y ) |
The at
parameter cannot be combined with any of the following: on
, after
, before
, from
, to
, and quote
. See Pattern Clause Position Matching.
before
extract col: MyCol before: '|'
Output: Extracts contents of MyCol
that occur before the pipe character (|
). If the before
value does not appear in the column, the output value is blank.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
- The
before
andto
parameters are very similar.to
includes the matching value as part of the extracted string. before
can be used with eitherfrom
,on
, orafter
. See Pattern Clause Position Matching .
from
extract col: MyCol from: 'go:'
Output: Extracts contents from MyCol
that occur after go:
, including go:
. If the from
value does not appear in the column, the output value is blank.
Identifies the pattern that marks the beginning of the value to match. Pattern can be a string literal, Alteryx® pattern, or regular expression. The from
value is included in the match.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
- The
after
andfrom
parameters are very similar.from
includes the matching value as part of the extracted string. from
can be used with eitherto
orbefore
. See Pattern Clause Position Matching .
on
extract col: MyCol on: `###ERROR`
Identifies the pattern to match. Pattern can be a string literal, Alteryx® pattern, or regular expression pattern.
Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/
. For example, /\u0013/
represents Unicode character 0013
(carriage return). For more information, see Supported Special Regular Expression Characters.
Usage Notes:
Required? | Data Type |
---|---|
No | String (literal, regular expression, or Alteryx pattern ) |
to
extract col:MyCol from:'note:' to: `{end}`
Output: Extracts from MyCol
column all values that begin with note:
up to the end of the value.
Identifies the pattern that marks the ending of the value to match. Pattern can be a string literal, Alteryx® pattern, or regular expression. The to
value is included in the match.
Usage Notes:
Required? | Data Type |
---|---|
No | String or pattern |
- The
before
andto
parameters are very similar.to
includes the matching value as part of the extracted string. to
can be used with eitherfrom
orafter
. See Pattern Clause Position Matching.
quote
extract col: MyCol quote: 'First' after: `{start}%?`
Output: Extracts each cell value from the MyCol
column, starting at the second character in the cell, as long as the string First
does not appear in the cell.
Can be used to specify a string as a single quoted object. This parameter value can be one or more characters.
Usage Notes:
Required? | Data Type |
---|---|
No | String |
- Parameter value is the quoted object.
- The
quote
value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.
ignoreCase
extract col: MyCol on: 'My String' ignoreCase: true
Output: Extracts the following values if they appear in the MyCol
column: My String
, my string
, My string
, etc.
- Set to
true
to ignore case matching. - (Default) Set to
false
to perform case-sensitive matching.
Usage Notes:
Required? | Data Type |
---|---|
No | Boolean |
limit
extract col: MyCol on: 'z' limit: 3
Output: Extracts each instance of the letter z
in the MyCol
column into a separate column, generating up to 3 new columns.
limit
parameter defines the maximum number of times that a pattern can be matched within a column.NOTE: The limit
parameter cannot be used with the following parameters: at
, positions
, or delimiters
.
A set of new columns is generated, as defined by the limit
parameter. Each matched instance populates a separate column, until there are no more matches or all of the limit
-generated new columns are filled.
Usage Notes:
Required? | Data Type |
---|---|
No | Integer (positive) |
- Defines the maximum number of columns that can be created by the
extract
transform. - If not specified, exactly one column is created.
Tip: For additional examples, see Common Tasks.
Examples
Example - Extract First Name
Source:
Name |
---|
Mr. Mike Smith |
Dr Jane Jones |
Miss Meg Moon |
Transform:
The following transform extracts the second word in the above dataset. Content is extracted after the first space and before the next space.
Tip: If you want to break out salutation, first name, and last name at the same time, you should use the split
transform instead. See Split Transform.
extract col:Name after: ' ' before: ' '
Results:
Name | Name2 |
---|---|
Mr. Mike Smith | Mike |
Dr Jane Jones | Jane |
Miss Meg Moon | Meg |
Example - Extract Log Levels
Source:
The following represents raw log messages extracted from an application. You want to extract the error level for each message: INFO
, WARNING
, or ERROR
.
app_log |
---|
20115-10-30T15:43:37:874Z INFO Client env:started |
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component |
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB |
20115-10-30T15:43:38:515Z INFO Client env:launched application |
Transform:
In the above transform, the text of interest appears after the timestamp and before the message.
- In the
after
clause, a pattern is required. In this case, the selection rule identifies the last segment of the timestamp, with the three pound signs (#
) identifying three digits of unknown value. The "Z " value gives the selection rule an extra bit of specificity. Note the backticks to denote the selection rule. - In the
before
clause, you can use a simple space character string, since it is consistent across all of the data.
extract col:app_log after: `###Z ` before: ' '
Results:
app_log | app_log_2 |
---|---|
20115-10-30T15:43:37:874Z INFO Client env:started | INFO |
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component | INFO |
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB | ERROR |
20115-10-30T15:43:38:515Z INFO Client env:launched application | INFO |
Example - Clean up marketing contact data with replace, set, and extract
set
- defines the values to use in a predefined column. See Set Transform.Tip: Use the
derive
transform to generate a new column containing a defined set of values. See Derive Transform.replace
- replaces a string literal or pattern appearing in the values of a column with a specific string. See Replace Transform.extract
- extracts a pattern-based value from a column and stores it in a new column. See Extract Transform.
Source:
The following dataset contains contact information that has been gathered by your marketing platform from actions taken by visitors on your website. You must clean up this data and prepare it for use in an analytics platform.
LeadId | LastName | FirstName | Title | Phone | Request |
---|---|---|---|---|---|
LE160301001 | Jones | Charles | Chief Technical Officer | 415-555-1212 | reg |
LE160301002 | Lyons | Edward | 415-012-3456 | download whitepaper | |
LE160301003 | Martin | Mary | CEO | 510-555-5555 | delete account |
LE160301004 | Smith | Talia | Engineer | 510-123-4567 | free trial |
Transform:
Title column: For example, you first notice that some data is missing. Your analytics platform recognizes the string value, "#MISSING#
" as an indicator of a missing value. So, you click the missing values bar in the Title column. Then, you select the Replace suggestion card. Note that the default replacement is a null value, so you click Modify and update it:
set col: Title value: IF(ISMISSING([Title]),'#MISSING#',Title)
reg
entry should be cleaned up. Add the following transform, which replaces that value:
replace col:Request with:'Registration' on:`{start}reg{end}`
on:
parameter. This expression indicates to match from the start of the cell value, the string literal reg
, and then the end of the cell value, which matches on complete cell values of reg
only.This transform works great on the sample, but what happens if the value is Reg
with a capital R
? That value might not be replaced. To improve the transform, you can modify the transform with the following Alteryx pattern in the on
parameter, which captures differences in capitalization:
replace col:Request with:'Registration' on:`{start}{[R|r]}eg{end}`
Request
column should be capitalized in title or proper case:
set col:Request value:PROPER(Request)
Phone column: You might have noticed some issues with the values in the Phone
column. In the United States, the prefix 555
is only used for gathering information; these are invalid phone numbers.
In the data grid, you select the first instance of 555
in the column. However, it selects all instances of that pattern, including ones that you don't want to modify. In this case, continue your selection by selecting the similar instance of 555
in the other row. In the suggestion cards, you click the Replace transform.
Notice, however, that the default Replace transform has also highlighted the second 555
pattern in one instance, which could be a problem in other phone numbers not displayed in the sample. You must modify the selection pattern for this transform. In the on:
parameter below, the Alteryx pattern has been modified to match only the instances of 555
that appear in the second segment in the phone number format:
replace col: Phone on: `{start}%{3}-555-%*{end}` with: '#INVALID#' global: true
%*
). While it might be possible to add a pattern that matches on the last four characters exactly (%{4}
), that matching pattern would not capture the possibility of a phone number having an extension at the end of it. The above expression does.NOTE: The above transform creates values that are mismatched with the Phone Number data type. In this example, however, these mismatches are understood to be for the benefit of the system consuming your Alteryx output.
LeadId column: You might have noticed that the lead identifier column (LeadId
) contains some embedded information: a date value and an identifier for the instance within the day. The following steps can be used to break out this information. The first one creates a separate working column with this information, which allows us to preserve the original, unmodified column:
derive type:single value:LeadId as:'LeadIdworking'
replace col:LeadIdworking with:'' on:'LE'
extract col: LeadIdworking on: `{start}%{6}`
LeadIdworking2
column now contains only the date information. Cleaning up this column requires reformatting the data, retyping it as a Datetime type, and then applying the dateformat
function to format it to your satisfaction. These steps are left as a separate exercise.For now, let's just rename the column:
rename col:LeadIdworking1 to:'LeadIdDate'
replace col: LeadIdworking on: `{start}%{6}` with: ''
rename col:LeadIdworking to:'LeadIdDaily'
Results:
LeadId | LeadIdDaily | LeadIdDate | LastName | FirstName | Title | Phone | Request |
---|---|---|---|---|---|---|---|
LE160301001 | 001 | 160301 | Jones | Charles | Chief Technical Officer | #INVALID# | Registration |
LE160301002 | 002 | 160301 | Lyons | Edward | #MISSING# | 415-012-3456 | Download Whitepaper |
LE160301003 | 003 | 160301 | Martin | Mary | CEO | #INVALID# | Delete Account |
LE160301004 | 004 | 160301 | Smith | Talia | Engineer | 510-123-4567 | Free Trial |
This page has no comments.