Excerpt |
---|
Extracts a subset of data from one column and inserts it into a new column, based on a specified string or pattern. The source column in unmodified. |
Tip |
---|
Tip: Use the extract transform if you need to retain the source column. Otherwise, you might be able to use the split transform. See Split Transform. |
D code |
---|
extract col: text on: 'honda' limit: 10 |
Output: Extracts the value honda
from the source column text
up to 10 times and insert into a new column. The source column text
is unmodified.
D code |
---|
extract col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:max_count] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)] |
Info |
---|
NOTE: At least one of the following parameters must be included to specify the pattern to extract: after , at , before , from , on , to . |
Token | Required? | Data Type | Description |
---|
extract | Y | transform | Name of the transform |
col | Y | string | Source column name |
quote | N | string | Specifies a quoted object that is omitted from pattern matching |
ignoreCase | N | boolean | If true , matching is case-insensitive. |
limit | N | integer (positive) | Identifies the number of extractions that can be performed from a single value. Default is 1. |
Matching parameters:
Parameter | Required? | Data Type | Description |
---|
after | N | string | String literal or pattern that precedes the pattern to match |
at | N | Array | Two-integer array identifying the character indexes of start and end characters to match |
before | N | string | String literal or pattern that appears after the pattern to match |
from | N | string | String literal or pattern that identifies the start of the pattern to match |
on | N | string | String literal or pattern that identifies the pattern to match. |
to | N | string | String literal or pattern that identifies the end of the pattern to match |
Identifies the column to which to apply the transform. You can specify only one column.
D code |
---|
extract col: MyCol on: 'MyString' |
Output: Extracts value My String
in new column if it is present in MyCol
. Otherwise, new column value is blank.
Required? | Data Type |
---|
Yes | String (column name) |
D code |
---|
extract col: MyCol after: 'Important:' |
Output: Extracts value in MyCol
that appears after the string Important:
. If the after
value does not appear in the column, the output value is blank.
Include Page |
---|
| after Parameter |
---|
| after Parameter |
---|
|
Required? | Data Type |
---|
No | String or pattern |
- The
after
and from
parameters are very similar. from
includes the matching value as part of the extracted string. after
can be used with either to
, on
, or before
. See Pattern Clause Position Matching.
D code |
---|
extract col: MyCol at: 2,6 |
Output: Extracts contents of MyCol
that starts at the second character in the column and extends to the sixth character of the column.
Required? | Data Type |
---|
No | Array of two Integers ( X,Y ) |
The at
parameter cannot be combined with any of the following: on
, after
, before
, from
, to
, and quote
. See Pattern Clause Position Matching.
D code |
---|
extract col: MyCol before: '|' |
Output: Extracts contents of MyCol
that occur before the pipe character (|
). If the before
value does not appear in the column, the output value is blank.
Include Page |
---|
| before Parameter |
---|
| before Parameter |
---|
|
Required? | Data Type |
---|
No | String or pattern |
- The
before
and to
parameters are very similar. to
includes the matching value as part of the extracted string. before
can be used with either from
, on
, or after
. See Pattern Clause Position Matching .
D code |
---|
extract col: MyCol from: 'go:' |
Output: Extracts contents from MyCol
that occur after go:
, including go:
. If the from
value does not appear in the column, the output value is blank.
Include Page |
---|
| from Parameter |
---|
| from Parameter |
---|
|
Required? | Data Type |
---|
No | String or pattern |
- The
after
and from
parameters are very similar. from
includes the matching value as part of the extracted string. from
can be used with either to
or before
. See Pattern Clause Position Matching .
D code |
---|
extract col: MyCol on: `###ERROR` |
Tip |
---|
Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/ . For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters. |
Required? | Data Type |
---|
No | String (literal, regular expression, or ) |
D code |
---|
extract col:MyCol from:'note:' to: `{end}` |
Output: Extracts from MyCol
column all values that begin with note:
up to the end of the value.
Required? | Data Type |
---|
No | String or pattern |
- The
before
and to
parameters are very similar. to
includes the matching value as part of the extracted string. to
can be used with either from
or after
. See Pattern Clause Position Matching.
D code |
---|
extract col: MyCol quote: 'First' after: `{start}%?` |
Output: Extracts each cell value from the MyCol
column, starting at the second character in the cell, as long as the string First
does not appear in the cell.
Include Page |
---|
| quote Parameter |
---|
| quote Parameter |
---|
|
Required? | Data Type |
---|
No | String |
- Parameter value is the quoted object.
- The
quote
value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.
D code |
---|
extract col: MyCol on: 'My String' ignoreCase: true |
Output: Extracts the following values if they appear in the MyCol
column: My String
, my string
, My string
, etc.
Include Page |
---|
| ignoreCase Parameter |
---|
| ignoreCase Parameter |
---|
|
Required? | Data Type |
---|
No | Boolean |
D code |
---|
extract col: MyCol on: 'z' limit: 3 |
Output: Extracts each instance of the letter z
in the MyCol
column into a separate column, generating up to 3 new columns.
Include Page |
---|
| limit Parameter |
---|
| limit Parameter |
---|
|
Required? | Data Type |
---|
No | Integer (positive) |
- Defines the maximum number of columns that can be created by the
extract
transform. - If not specified, exactly one column is created.
Source:
Name |
---|
Mr. Mike Smith |
Dr Jane Jones |
Miss Meg Moon |
Transformation:
The following transformation extracts the second word in the above dataset. Content is extracted after the first space and before the next space.
Tip |
---|
Tip: If you want to break out salutation, first name, and last name at the same time, you should use the Split Column transformation instead. |
D trans |
---|
RawWrangle | true |
---|
p03Value | ' ' |
---|
Type | step |
---|
WrangleText | extract col:Name after: ' ' before: ' ' |
---|
p01Name | Column to extract from |
---|
p01Value | Name |
---|
p02Name | Option |
---|
p02Value | Custom text or pattern |
---|
p03Name | Start extracting after |
---|
p04Value | ' ' |
---|
p04Name | End extracting before |
---|
SearchTerm | Extract text or pattern |
---|
|
Results:
Name | Name2 |
---|
Mr. Mike Smith | Mike |
Dr Jane Jones | Jane |
Miss Meg Moon | Meg |
Source:
The following represents raw log messages extracted from an application. You want to extract the error level for each message: INFO
, WARNING
, or ERROR
.
app_log |
---|
20115-10-30T15:43:37:874Z INFO Client env:started |
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component |
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB |
20115-10-30T15:43:38:515Z INFO Client env:launched application |
Transformation:
The text of interest appears after the timestamp and before the message.
- In the
after
clause, a pattern is required. In this case, the selection rule identifies the last segment of the timestamp, with the three pound signs (#
) identifying three digits of unknown value. The "Z " value gives the selection rule an extra bit of specificity. Note the backticks to denote the selection rule. - In the
before
clause, you can use a simple space character string, since it is consistent across all of the data.
D trans |
---|
RawWrangle | true |
---|
p03Value | `###Z ` |
---|
Type | step |
---|
WrangleText | extract col:app_log after: `###Z ` before: ' ' |
---|
p01Name | Column to extract from |
---|
p01Value | app_log |
---|
p02Name | Option |
---|
p02Value | Custom text or pattern |
---|
p03Name | Start extracting after |
---|
p04Value | ' ' |
---|
p04Name | End extracting before |
---|
SearchTerm | Extract text or pattern |
---|
|
Results:
app_log | app_log_2 |
---|
20115-10-30T15:43:37:874Z INFO Client env:started | INFO |
20115-10-30T15:43:38:009Z INFO Client env:launched Chromium component | INFO |
20115-10-30T15:43:38:512Z ERROR Client env:failed to connect to local DB | ERROR |
20115-10-30T15:43:38:515Z INFO Client env:launched application | INFO |
Include Page |
---|
| EXAMPLE - Replacement Transforms |
---|
| EXAMPLE - Replacement Transforms |
---|
|
D s also |
---|
label | wrangle_transform_extract |
---|
|