Extracts a subset of data from one column and inserts it into a new column, based on a specified string or pattern. The source column in unmodified. 

Tip: Use the extract transform if you need to retain the source column. Otherwise, you might be able to use the split transform. See Split Transform.

extract col: text on: 'honda' limit: 10

Output: Extracts the value honda from the source column text up to 10 times and insert into a new column. The source column text is unmodified.

extract col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:max_count] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)]

NOTE: At least one of the following parameters must be included to specify the pattern to extract: after, at, before, from, on, to.

TokenRequired?Data TypeDescription
extractYtransformName of the transform
colYstringSource column name
quoteNstringSpecifies a quoted object that is omitted from pattern matching
ignoreCaseNbooleanIf true, matching is case-insensitive.
limitNinteger (positive)Identifies the number of extractions that can be performed from a single value. Default is 1.

Matching parameters:

ParameterRequired?Data TypeDescription
afterNstringString literal or pattern that precedes the pattern to match
atNArrayTwo-integer array identifying the character indexes of start and end characters to match
beforeNstringString literal or pattern that appears after the pattern to match
fromNstringString literal or pattern that identifies the start of the pattern to match
onNstringString literal or pattern that identifies the pattern to match.
toNstringString literal or pattern that identifies the end of the pattern to match

col

Identifies the column to which to apply the transform. You can specify only one column.

extract col: MyCol on: 'MyString'

Output: Extracts value My String in new column if it is present in MyCol. Otherwise, new column value is blank.

Required?Data Type
YesString (column name)

after

extract col: MyCol after: 'Important:'

Output: Extracts value in MyCol that appears after the string Important:. If the after value does not appear in the column, the output value is blank.

Required?Data Type
NoString or pattern

at

extract col: MyCol at: 2,6

Output: Extracts contents of MyCol that starts at the second character in the column and extends to the sixth character of the column.

Required?Data Type
NoArray of two Integers ( X,Y )

The at parameter cannot be combined with any of the following: onafterbeforefromto, and quote. See  Pattern Clause Position Matching.

before

extract col: MyCol before: '|'

Output: Extracts contents of MyCol that occur before the pipe character (|). If the before value does not appear in the column, the output value is blank.

Required?Data Type
NoString or pattern

from

extract col: MyCol from: 'go:'

Output: Extracts contents from MyCol that occur after go:, including go:. If the from value does not appear in the column, the output value is blank.

Required?Data Type
NoString or pattern

on

extract col: MyCol on: `###ERROR`

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

Required?Data Type
No

String (literal, regular expression, or )

to

extract col:MyCol from:'note:' to: `{end}`

Output: Extracts from MyCol column all values that begin with note: up to the end of the value. 

Required?Data Type
NoString or pattern

quote

extract col: MyCol quote: 'First' after: `{start}%?`

Output: Extracts each cell value from the MyCol column, starting at the second character in the cell, as long as the string First does not appear in the cell.

Required?Data Type
NoString

ignoreCase

extract col: MyCol on: 'My String' ignoreCase: true

Output: Extracts the following values if they appear in the MyCol column: My Stringmy stringMy string, etc.

Required?Data Type
NoBoolean

limit

extract col: MyCol on: 'z' limit: 3

Output: Extracts each instance of the letter z in the MyCol column into a separate column, generating up to 3 new columns.

Required?Data Type
NoInteger (positive)

 

Example - Extract First Name

Source:

Name
Mr. Mike Smith
Dr Jane Jones
Miss Meg Moon

Transformation:

The following transformation extracts the second word in the above dataset. Content is extracted after the first space and before the next space.

Tip: If you want to break out salutation, first name, and last name at the same time, you should use the Split Column transformation instead.

Results:

NameName2
Mr. Mike SmithMike
Dr Jane JonesJane
Miss Meg MoonMeg

Example - Extract Log Levels

Source:

The following represents raw log messages extracted from an application. You want to extract the error level for each message: INFOWARNING, or ERROR.

app_log
20115-10-30T15:43:37:874Z INFO Client env:started
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB
20115-10-30T15:43:38:515Z INFO Client env:launched application

Transformation:

The text of interest appears after the timestamp and before the message.

Results:

app_logapp_log_2
20115-10-30T15:43:37:874Z INFO Client env:startedINFO
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium componentINFO
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DBERROR
20115-10-30T15:43:38:515Z INFO Client env:launched applicationINFO

Example - Clean up marketing contact data with replace, set, and extract