Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r097

D toc

D s transforms

Excerpt

Extracts a subset of data from one column and inserts it into a new column, based on a specified string or pattern. The source column in unmodified. 

Tip

Tip: Use the extract transform if you need to retain the source column. Otherwise, you might be able to use the split transform. See Split Transform.

D s
snippetBasic

D code
extract col: text on: 'honda' limit: 10

Output: Extracts the value honda from the source column text up to 10 times and insert into a new column. The source column text is unmodified.

D s
snippetParameters

D code

extract col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:max_count] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)]

Info

NOTE: At least one of the following parameters must be included to specify the pattern to extract: after, at, before, from, on, to.

TokenRequired?Data TypeDescription
extractYtransformName of the transform
colYstringSource column name
quoteNstringSpecifies a quoted object that is omitted from pattern matching
ignoreCaseNbooleanIf true, matching is case-insensitive.
limitNinteger (positive)Identifies the number of extractions that can be performed from a single value. Default is 1.

Matching parameters:

ParameterRequired?Data TypeDescription
afterNstringString literal or pattern that precedes the pattern to match
atNArrayTwo-integer array identifying the character indexes of start and end characters to match
beforeNstringString literal or pattern that appears after the pattern to match
fromNstringString literal or pattern that identifies the start of the pattern to match
onNstringString literal or pattern that identifies the pattern to match.
toNstringString literal or pattern that identifies the end of the pattern to match

D s lang notes

col

Identifies the column to which to apply the transform. You can specify only one column.

D code

extract col: MyCol on: 'MyString'

Output: Extracts value My String in new column if it is present in MyCol. Otherwise, new column value is blank.

D s
snippetusage

Required?Data Type
YesString (column name)

after

D code

extract col: MyCol after: 'Important:'

Output: Extracts value in MyCol that appears after the string Important:. If the after value does not appear in the column, the output value is blank.

Include Page
after Parameter
after Parameter

D s
snippetusage

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar. from includes the matching value as part of the extracted string.
  • after can be used with either toon, or before. See  Pattern Clause Position Matching.

at

D code

extract col: MyCol at: 2,6

Output: Extracts contents of MyCol that starts at the second character in the column and extends to the sixth character of the column.

Include Page
at Parameter
at Parameter

D s
snippetusage

Required?Data Type
NoArray of two Integers ( X,Y )

The at parameter cannot be combined with any of the following: onafterbeforefromto, and quote. See  Pattern Clause Position Matching.

before

D code

extract col: MyCol before: '|'

Output: Extracts contents of MyCol that occur before the pipe character (|). If the before value does not appear in the column, the output value is blank.

Include Page
before Parameter
before Parameter

D s
snippetusage

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the extracted string.
  • before can be used with either fromon, or afterSee  Pattern Clause Position Matching .

from

D code

extract col: MyCol from: 'go:'

Output: Extracts contents from MyCol that occur after go:, including go:. If the from value does not appear in the column, the output value is blank.

Include Page
from Parameter
from Parameter

D s
snippetusage

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar.  from includes the matching value as part of the extracted string.
  • from can be used with either to or beforeSee  Pattern Clause Position Matching .

on

D code

extract col: MyCol on: `###ERROR`

Include Page
on Parameter
on Parameter

Tip

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

D s
snippetusage

Required?Data Type
No

String (literal, regular expression, or

D s item
itempattern
)

to

D code

extract col:MyCol from:'note:' to: `{end}`

Output: Extracts from MyCol column all values that begin with note: up to the end of the value. 

Include Page
to Parameter
to Parameter

D s
snippetusage

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the extracted string.
  • to can be used with either from or after. See Pattern Clause Position Matching.

quote

D code

extract col: MyCol quote: 'First' after: `{start}%?`

Output: Extracts each cell value from the MyCol column, starting at the second character in the cell, as long as the string First does not appear in the cell.

Include Page
quote Parameter
quote Parameter

D s
snippetusage

Required?Data Type
NoString
  • Parameter value is the quoted object.
  • The quote value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.

ignoreCase

D code

extract col: MyCol on: 'My String' ignoreCase: true

Output: Extracts the following values if they appear in the MyCol column: My Stringmy stringMy string, etc.

Include Page
ignoreCase Parameter
ignoreCase Parameter

D s
snippetusage

Required?Data Type
NoBoolean

limit

D code

extract col: MyCol on: 'z' limit: 3

Output: Extracts each instance of the letter z in the MyCol column into a separate column, generating up to 3 new columns.

Include Page
limit Parameter
limit Parameter

D s
snippetusage

Required?Data Type
NoInteger (positive)
  • Defines the maximum number of columns that can be created by the extract transform. 
  • If not specified, exactly one column is created. 

 

D s
snippetExamples

Example - Extract First Name

Source:

Name
Mr. Mike Smith
Dr Jane Jones
Miss Meg Moon

Transformation:

The following transformation extracts the second word in the above dataset. Content is extracted after the first space and before the next space.

Tip

Tip: If you want to break out salutation, first name, and last name at the same time, you should use the Split Column transformation instead.

D trans
RawWrangletrue
p03Value' '
Typestep
WrangleTextextract col:Name after: ' ' before: ' '
p01NameColumn to extract from
p01ValueName
p02NameOption
p02ValueCustom text or pattern
p03NameStart extracting after
p04Value' '
p04NameEnd extracting before
SearchTermExtract text or pattern

Results:

NameName2
Mr. Mike SmithMike
Dr Jane JonesJane
Miss Meg MoonMeg

Example - Extract Log Levels

Source:

The following represents raw log messages extracted from an application. You want to extract the error level for each message: INFOWARNING, or ERROR.

app_log
20115-10-30T15:43:37:874Z INFO Client env:started
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB
20115-10-30T15:43:38:515Z INFO Client env:launched application

Transformation:

The text of interest appears after the timestamp and before the message.

  • In the after clause, a pattern is required. In this case, the selection rule identifies the last segment of the timestamp, with the three pound signs (#) identifying three digits of unknown value. The "Z " value gives the selection rule an extra bit of specificity. Note the backticks to denote the selection rule. 
  • In the before clause, you can use a simple space character string, since it is consistent across all of the data.

D trans
RawWrangletrue
p03Value`###Z `
Typestep
WrangleTextextract col:app_log after: `###Z ` before: ' '
p01NameColumn to extract from
p01Valueapp_log
p02NameOption
p02ValueCustom text or pattern
p03NameStart extracting after
p04Value' '
p04NameEnd extracting before
SearchTermExtract text or pattern

Results:

app_logapp_log_2
20115-10-30T15:43:37:874Z INFO Client env:startedINFO
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium componentINFO
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DBERROR
20115-10-30T15:43:38:515Z INFO Client env:launched applicationINFO

Example - Clean up marketing contact data with replace, set, and extract

Include Page
EXAMPLE - Replacement Transforms
EXAMPLE - Replacement Transforms

D s also
labelwrangle_transform_extract