Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin

D toc

Excerpt

Splits the specified column into separate columns of data based on the delimiters in the transform. Delimiters can be specified in a number of literal or pattern-based methods. Whitespace delimiters are supported. 

This transform might be automatically applied as one of the first steps of your recipe. See Initial Parsing Steps.

When the split transform is applied, the source column is dropped. 

  • Before applying this transform, you can create a copy of the source column using the derive transform. See Derive Transform
  • To retain the source column, you can use the extract transform and pattern-based matching. See Extract Transform.

D s
snippetBasic

D code
split col: MyValues on: ',' limit: 3

Output: Splits the source MyValues column into four separate columns. Values in the columns are determined based on the comma (,) delimiter. If a row only has two commas in it, then the final generated column is null.

D s
snippetParameters
D code

split col:column_ref [quote:'quoted_string'] [ignoreCase:true|false] [limit:int_num] [after:start_point | from: start_point] [before:end_point | to:end_point] [on:'exact_match'] [at:(start_index,end_index)] [delimiters:'string1','string2', 'string3'] [positions: int1,int2,int3] [every:int_num]

D s lang notes

The split transform supports the following general methods for specifying the delimiters by which to split the column. Depending on your use of the transform, different sets of parameters apply.

Delimiter MethodsDescriptionOperative Parameters
single-pattern delimiters

Column is split based on one of the following:

1) patterns used to describe the beginning and ending of the field delimiter(s),

2) single delimiter, which may be repeated,

3) index values of the start and end points of the delimiter

quote

At least one of the following parameters must be specified:

after , at , from , before , on , to

multi-pattern delimiters

Column is split based one of the following literal methods:

1) explicit sequence of delimiters,

2) explicit list of character index positions

3) every N characters

Prior to Release 3.2, this method was managed using multisplit , which was a separate transform.

At least one of the following parameters must be specified:

delimiters, positions , every

Shared parameters:

The following parameters are shared between the operating modes:

TokenRequired?Data TypeDescription
splitYtransformName of the transform
colYstringSource column name
ignoreCaseNbooleanIf true, matching is case-insensitive.
limitNinteger (positive)Specifies the maximum of columns to split from the source column

Single-pattern delimiter parameters:

Tip

Tip: For this method of matching, at least one of the following parameters must be used: at, before, from, on, or to.

TokenRequired?Data TypeDescription
afterNstringString literal or pattern that precedes the pattern to match
atNArrayTwo-integer array identifying the character indexes of start and end characters to match
beforeNstringString literal or pattern that appears after the pattern to match
fromNstringString literal or pattern that identifies the start of the pattern to match
onNstringString literal or pattern that identifies the pattern to match.
toNstringString literal or pattern that identifies the end of the pattern to match
quoteNstringSpecifies a quoted object that is omitted from pattern matching

Multi-pattern delimiter parameters:

Tip

Tip: Use one of the following parameters for this method of matching. Do not use combinations of them.

TokenRequired?Data TypeDescription
delimitersNarrayArray of strings that list the explicit field delimiters in the order to apply them to the column.
positionsNarrayArray of integers that identify the zero-based character index values where to split the column.
everyNintegerString literal or pattern that appears after the pattern to match

col

Identifies the column to which to apply the transform. You can specify only one column.

D code

split col: MyCol on: 'MyString'

Output: Splits the MyCol column into two separate columns whose values are to the left and right of the MyString value in each cell.

  • If a delimiter value is not detected, the cell value appears in the first of the new columns.
  • When the limit parameter is not specified, the default value of 1 is applied.

D s
snippetusage

Required?Data Type
YesString (column name)

ignoreCase

Include Page
ignoreCase Parameter
ignoreCase Parameter

D code

split col: MyCol on: 'My String' ignoreCase: true

Output: Splits the MyCol column on case-insensitive versions of the on parameter value, if they appear in cell values: My Stringmy stringMy string, etc.

D s
snippetusage

Required?Data Type
NoBoolean

limit

Include Page
limit Parameter
limit Parameter

D code

split col: MyCol on: 'z' limit: 3

Output: Splits the MyCol column on each instance of the letter z, generating 4 new columns. If there are fewer than 3 instances of z in a cell, the corresponding columns after the split are blank.

D s
snippetusage

Required?Data Type
NoInteger (positive)
  • Defines the number of columns that can be created by the split transform. 
  • If not specified, exactly one column is created. See Pattern Clause Position Matching.

after

Include Page
after Parameter
after Parameter

  • If this parameter is the only pattern describer: 
    • The column is split into two. The first column contains the part of the source column before the after matching value. The second column is blank.
    • If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.

      Info

      NOTE: For after, before, from, and to, matching occurs only one time at most. Additional instances of the parameter's value in the cell do not cause another column split. For more predictable results, you should specify another pattern parameter.

  • If the after value does not appear in the column, the original column value is written to the first split column.
  • This parameter is typically used with another to describe a field delimiting pattern. See below.
D code

split col: MyCol after: '\' before:'|'

Output: Splits values in MyCol based on value between the two characters. The first column contains the part of the MyCol that appears before the backslash (\), and the second column contains the part of MyCol that appears after the pipe character (|). The content between the delimiting characters is dropped.

D s
snippetusage

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar. from includes the matching value as part of the delimiter string.
  • after can be used with either to or before. See  Pattern Clause Position Matching.

at

Include Page
at Parameter
at Parameter

D code

split col: MyCol at: 2,6

Output: Splits the MyCol column on the value that begins at the second character in the column and extends to the sixth character of the column. Contents before the value are in the first column, and contents after the value are in the second column.

D s
snippetusage

Required?Data Type
NoArray of two Integers ( X,Y )

The at parameter cannot be combined with any of the following:  onafter, before, from, to, and quote. See  Pattern Clause Position Matching.

before

Include Page
before Parameter
before Parameter

  • If this parameter is the only pattern describer: 
    • The column is split into two. The first column is blank. The second column contains the part of the source column after the before matching value. 
    • If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.

      Info

      NOTE: For after, before, from, and to, matching occurs only one time at most. Additional instances of the parameter's value in the cell do not cause another column split. For more predictable results, you should specify another pattern parameter.

  • If the before value does not appear in the column, the original column value is written to the first split column.
  • This parameter is typically used with another to describe a field delimiting pattern. See below.
D code

split col: MyCol before: '/' from:'Go:'

Output: Splits contents of MyCol into two columns. The first column contains the values that appear before the Go: string, and the second column contains the values after the backslash. 

D s
snippetusage

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the delimiter string.
  • before can be used with either from or afterSee  Pattern Clause Position Matching .

from

Include Page
from Parameter
from Parameter

  • If this parameter is the only pattern describer: 
    • The column is split into two. The first column contains the part of the source column before the from matching value. The second column is blank.
    • If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.

      Info

      NOTE: For after, before, from, and to, matching occurs only one time at most. Additional instances of the parameter's value in the cell do not cause another column split. For more predictable results, you should specify another pattern parameter.

  • If the from value does not appear in the column, the output value is original column value.
  • This parameter is typically used with another to describe a field delimiting pattern. See below.
D code

split col: MyCol from: 'go:' to:'stop:'

Output: Splits contents of MyCol from go:, including go: to stop:, including stop:. Contents before the string appear in the first column, contents after the string appear in the second one. 

D s
snippetusage

Required?Data Type
NoString or pattern
  • The after and from parameters are very similar.  from includes the matching value as part of the delimiter string.
  • from can be used with either to or beforeSee  Pattern Clause Position Matching .

on

Include Page
on Parameter
on Parameter

If the value does not appear in the source column, the original value is written to the first column of the split columns.

D code

split col: MyCol on: `###ERROR`

Output: Column into two columns. The first column contains values in the column appearing before ###ERROR, and the second column contains the values appearing after this string.

Tip

Tip: You can insert the Unicode equivalent character for this parameter value using a regular expression of the form /\uHHHH/. For example, /\u0013/ represents Unicode character 0013 (carriage return). For more information, see Supported Special Regular Expression Characters.

 

D s
snippetusage

Required?Data Type
NoString or pattern

to

Include Page
to Parameter
to Parameter

  • If this parameter is the only pattern describer: 
    • The column is split into two. The first column is blank. The second column contains the part of the source column after the to matching value. 
    • If the value appears more than once, no additional splitting is made, since there is no other pattern parameter.

      Info

      NOTE: For after, before, from, and to, matching occurs only one time at most. Additional instances of the parameter's value in the cell do not cause another column split. For more predictable results, you should specify another pattern parameter.

  • If the to value does not appear in the column, the original column value is written to the first split column.
  • This parameter is typically used with another to describe a field delimiting pattern. See below.
D code

split col:MyCol from:'note:' to: ` `

Output: Splits MyCol column all contents that appear before note: in the first column and all contents that appear after the first space after note: in the second column. 

D s
snippetusage

Required?Data Type
NoString or pattern
  • The before and to parameters are very similar.  to includes the matching value as part of the delimiter string.
  • to can be used with either from or after. See Pattern Clause Position Matching.

quote

Include Page
quote Parameter
quote Parameter

D code

split col: MyLog on: `|` limit:10 quote: '"'

Output: Splits the MyLog column, on the pipe character (|), while ignoring any pipe characters that are found between double-quote characters in the column. Based on the value in the limit parameter, the transform is limited to creating a maximum of 10 splits. 

D s
snippetusage

Required?Data Type
NoString

 

  • Parameter value is the quoted object.
  • The quote value can appear anywhere in the column value. It is not limited by the constraints of any other parameters.

 

delimiters

The delimiters parameter specifies a comma-separated list of string literals or patterns to identify the delimiters to use to split the data. Values can be string literals, regular expressions, or 

D s item
itempatterns
.

  • The sequence of values defines the order in which the delimiters are applied.
  • Values do not need to be the same.

D code

split col:myCol delimiters:'|',' ','|'

Output: Splits the myCol column into four separate columns, as indicated by the sequence of delimiters.

D s
snippetusage

Info

NOTE: Do not use the limit or quote parameters with the delimiters parameter.

Required?Data Type
No

Array of Strings (literal, regular expression,

D s item
pattern
pattern
)

positions

The positions parameter specifies a comma-separated list of integers that identify zero-based character index values at which to split the column. Values must be Integers.

D code

split col:myCol positions:20,55,80

Output: Splits the myCol column into four separate columns, where:

  • column1 = characters 0-20 from the source column, 
  • column2 = characters 21-55
  • column3 = characters 56-80
  • column4 = characters 80 to the end of the cell value

D s
snippetusage

Info

NOTE: Do not use the limit or quote parameters with the positions parameter.

Required?Data Type
No

Array of Integers (literal, regular expression,

D s item
pattern
pattern
)

every

The every parameter can be used to specify fixed-width splitting of the source column. This Integer value defines the number of characters in each column of the split output.

If needed, you can use the every parameter with the limit parameter to define the maximum number of output columns:

D code

split col:myCol every:20 limit:5

Output: Splits the myCol column every 20 characters, with a limit of five splits. The sixth column contains all characters after the 100th character in the cell value.

D s
snippetusage

Required?Data Type
NoInteger

Pattern Groups

When you build or edit a split transform step in the Transform Builder, you can select one of the following pattern groups to apply to your transform. A pattern group is a set of related patterns that define a method of matching in a cell's data. Some pattern groups apply to multiple transforms, and some apply to the split transform only. For more information, see Transform Builder.

D s
snippetExamples

Example - Split with single pattern delimiters

Source:

ColAColBColC
This my String ThatabXcdXefXgh01AA001
my string This ThatijXklXmnXop02BB002
This That My StringqrXstXuvXwy03CC003

Transform:

ColA: You can use the following transform to split on the variations of My String: In this case, the ignoreCase parameter ensures that all variations on capitalization are matched:

D code

split col:ColA on:'My String' ignoreCase:true

ColB: For this column, the letter x is the split marker, and the data is consistently formatted with three instances per row:

D code

split col:ColB on:'X' limit:3

ColC: In this column, the double-letter marker varies between the rows. However, it is consistently in the same location in each row:

D code

split col:ColC at:2,4

Results:

When the above transforms are added, the source columns are dropped, leaving the following columns:

ColA1ColA2ColB1ColB2ColB3ColB4ColC1ColC2
ThisThatabcdefgh01001
 This Thatijklmnop02002
This That qrstuvwy03003

Example - Split with quoted values

Include Page
EXAMPLE - Quote Parameter
EXAMPLE - Quote Parameter

Example - Splitting with different delimiter types

Include Page
EXAMPLE - Splitting with Different Delimiter Types
EXAMPLE - Splitting with Different Delimiter Types

D s also
labelwrangle_transform_split