Splits the specified column into separate columns of data based on the delimiters in the transform. Delimiters can be specified in a number of literal or pattern-based methods. Whitespace delimiters are supported. |
This transform might be automatically applied as one of the first steps of your recipe. See Initial Parsing Steps.
When the split
transform is applied, the source column is dropped.
derive
transform. See Derive Transform. extract
transform and pattern-based matching. See Extract Transform.split col: MyValues on: ',' limit: 3 |
Output: Splits the source MyValues
column into four separate columns. Values in the columns are determined based on the comma (,
) delimiter. If a row only has two commas in it, then the final generated column is null.
split col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:int_num] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)] [delimiters:'string1','string2', 'string3'] [positions: int1,int2,int3] [every:int_num] |
The split
transform supports the following general methods for specifying the delimiters by which to split the column. Depending on your use of the transform, different sets of parameters apply.
Delimiter Methods | Description | Operative Parameters |
---|---|---|
single-pattern delimiters | Column is split based on one of the following: 1) patterns used to describe the beginning and ending of the field delimiter(s), 2) single delimiter, which may be repeated, 3) index values of the start and end points of the delimiter | At least one of the following parameters must be specified:
|
multi-pattern delimiters | Column is split based one of the following literal methods: 1) explicit sequence of delimiters, 2) explicit list of character index positions 3) every N characters Prior to Release 3.2, this method was managed using | At least one of the following parameters must be specified:
|
Shared parameters:
The following parameters are shared between the operating modes:
Token | Required? | Data Type | Description |
---|---|---|---|
split | Y | transform | Name of the transform |
col | Y | string | Source column name |
ignoreCase | N | boolean | If true , matching is case-insensitive. |
limit | N | integer (positive) | Specifies the maximum of columns to split from the source column |
Single-pattern delimiter parameters:
Tip: For this method of matching, at least one of the following parameters must be used: |
Token | Required? | Data Type | Description |
---|---|---|---|
after | N | string | String literal or pattern that precedes the pattern to match |
at | N | Array | Two-integer array identifying the character indexes of start and end characters to match |
before | N | string | String literal or pattern that appears after the pattern to match |
from | N | string | String literal or pattern that identifies the start of the pattern to match |
on | N | string | String literal or pattern that identifies the pattern to match. |
to | N | string | String literal or pattern that identifies the end of the pattern to match |
quote | N | string | Specifies a quoted object that is omitted from pattern matching |
Multi-pattern delimiter parameters:
Tip: Use one of the following parameters for this method of matching. Do not use combinations of them. |
Token | Required? | Data Type | Description |
---|---|---|---|
delimiters | N | array | Array of strings that list the explicit field delimiters in the order to apply them to the column. |
positions | N | array | Array of integers that identify the zero-based character index values where to split the column. |
every | N | integer | String literal or pattern that appears after the pattern to match |
Identifies the column to which to apply the transform. You can specify only one column.
split col: MyCol on: 'MyString' |
Output: Splits the MyCol
column into two separate columns whose values are to the left and right of the MyString
value in each cell.
limit
parameter is not specified, the default value of 1
is applied.Required? | Data Type |
---|---|
Yes | String (column name) |
split col: MyCol on: 'My String' ignoreCase: true |
Output: Splits the MyCol
column on case-insensitive versions of the on
parameter value, if they appear in cell values: My String
, my string
, My string
, etc.
Required? | Data Type |
---|---|
No | Boolean |
split col: MyCol on: 'z' limit: 3 |
Output: Splits the MyCol
column on each instance of the letter z
, generating 4 new columns. If there are fewer than 3 instances of z
in a cell, the corresponding columns after the split are blank.
NOTE: Avoid creating datasets that are wider than 1000 columns. Performance can degrade significantly on even a much more narrow dataset. You should limit yourself to under 500 columns in your dataset. |
Required? | Data Type |
---|---|
No | Integer (positive) |
split
transform. after
matching value. The second column is blank.If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.
NOTE: For |
after
value does not appear in the column, the original column value is written to the first split column.split col: MyCol after: '\' before:'|' |
Output: Splits values in MyCol
based on value between the two characters. The first column contains the part of the MyCol
that appears before the backslash (\
), and the second column contains the part of MyCol
that appears after the pipe character (|
). The content between the delimiting characters is dropped.
Required? | Data Type |
---|---|
No | String or pattern |
after
and from
parameters are very similar. from
includes the matching value as part of the delimiter string.after
can be used with either to
or before
. See Pattern Clause Position Matching.split col: MyCol at: 2,6 |
Output: Splits the MyCol
column on the value that begins at the second character in the column and extends to the sixth character of the column. Contents before the value are in the first column, and contents after the value are in the second column.
Required? | Data Type |
---|---|
No | Array of two Integers ( X,Y ) |
The at
parameter cannot be combined with any of the following: on
, after
, before
, from
, to
, and quote
. See Pattern Clause Position Matching.
before
matching value. If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.
NOTE: For |
before
value does not appear in the column, the original column value is written to the first split column.split col: MyCol before: '/' from:'Go:' |
Output: Splits contents of MyCol
into two columns. The first column contains the values that appear before the Go:
string, and the second column contains the values after the backslash.
Required? | Data Type |
---|---|
No | String or pattern |
before
and to
parameters are very similar. to
includes the matching value as part of the delimiter string.before
can be used with either from
or after
. See Pattern Clause Position Matching .from
matching value. The second column is blank.If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.
NOTE: For |
from
value does not appear in the column, the output value is original column value.split col: MyCol from: 'go:' to:'stop:' |
Output: Splits contents of MyCol
from go:
, including go:
to stop:
, including stop:
. Contents before the string appear in the first column, contents after the string appear in the second one.
Required? | Data Type |
---|---|
No | String or pattern |
after
and from
parameters are very similar. from
includes the matching value as part of the delimiter string.from
can be used with either to
or before
. See Pattern Clause Position Matching .If the value does not appear in the source column, the original value is written to the first column of the split columns.
split col: MyCol on: `###ERROR` |
Output: Column into two columns. The first column contains values in the column appearing before ###ERROR
, and the second column contains the values appearing after this string.
Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form |
Required? | Data Type |
---|---|
No | String or pattern |
to
matching value. If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.
NOTE: For |
to
value does not appear in the column, the original column value is written to the first split column.split col:MyCol from:'note:' to: ` ` |
Output: Splits MyCol
column all contents that appear before note:
in the first column and all contents that appear after the first space after note:
in the second column.
Required? | Data Type |
---|---|
No | String or pattern |
before
and to
parameters are very similar. to
includes the matching value as part of the delimiter string.to
can be used with either from
or after
. See Pattern Clause Position Matching.split col: MyLog on: `|` limit:10 quote: '"' |
Output: Splits the MyLog
column, on the pipe character (|
), while ignoring any pipe characters that are found between double-quote characters in the column. Based on the value in the limit
parameter, the transform is limited to creating a maximum of 10 splits.
Required? | Data Type |
---|---|
No | String |
quote
value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.
The delimiters
parameter specifies a comma-separated list of string literals or patterns to identify the delimiters to use to split the data. Values can be string literals, regular expressions, or .
split col:myCol delimiters:'|',' ','|' |
Output: Splits the myCol
column into four separate columns, as indicated by the sequence of delimiters.
NOTE: Do not use the |
Required? | Data Type |
---|---|
No | Array of Strings (literal, regular expression, |
The positions
parameter specifies a comma-separated list of integers that identify zero-based character index values at which to split the column. Values must be Integers.
split col:myCol positions:20,55,80 |
Output: Splits the myCol
column into four separate columns, where:
column1
= characters 0-20 from the source column, column2
= characters 21-55column3
= characters 56-80column4
= characters 80 to the end of the cell valueNOTE: Do not use the |
Required? | Data Type |
---|---|
No | Array of Integers (literal, regular expression, |
The every
parameter can be used to specify fixed-width splitting of the source column. This Integer value defines the number of characters in each column of the split output.
If needed, you can use the every
parameter with the limit
parameter to define the maximum number of output columns:
split col:myCol every:20 limit:5 |
Output: Splits the myCol
column every 20 characters, with a limit of five splits. The sixth column contains all characters after the 100th character in the cell value.
Required? | Data Type |
---|---|
No | Integer |
When you build or edit a split
transform step in the Transform Builder, you can select one of the following pattern groups to apply to your transform. A pattern group is a set of related patterns that define a method of matching in a cell's data. Some pattern groups apply to multiple transforms, and some apply to the split
transform only. For more information, see Transform Builder.
Source:
ColA | ColB | ColC |
---|---|---|
This my String That | abXcdXefXgh | 01AA001 |
my string This That | ijXklXmnXop | 02BB002 |
This That My String | qrXstXuvXwy | 03CC003 |
Transform:
ColA: You can use the following transform to split on the variations of My String
: In this case, the ignoreCase
parameter ensures that all variations on capitalization are matched:
split col:ColA on:'My String' ignoreCase:true |
ColB: For this column, the letter x
is the split marker, and the data is consistently formatted with three instances per row:
split col:ColB on:'X' limit:3 |
ColC: In this column, the double-letter marker varies between the rows. However, it is consistently in the same location in each row:
split col:ColC at:2,4 |
Results:
When the above transforms are added, the source columns are dropped, leaving the following columns:
ColA1 | ColA2 | ColB1 | ColB2 | ColB3 | ColB4 | ColC1 | ColC2 |
---|---|---|---|---|---|---|---|
This | That | ab | cd | ef | gh | 01 | 001 |
This That | ij | kl | mn | op | 02 | 002 | |
This That | qr | st | uv | wy | 03 | 003 |