Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r094

D toc

Excerpt

For many recipes, the first step is to split data from a single column into multiple columns. This section describes the various methods that can be used for splitting a single column into one or more columns, based on character- or pattern-matching or position within the column's values.

Tip

Tip: When this transform appears in a suggestion card, the maximum number of suggested columns to split is 250, which may prevent the browser from crashing. If your dataset requires additional column splits, you can edit the transformation and increase the maximum number of splits. Avoid creating datasets that are wider than 1000 columns.

...

  • When the data is first imported, all of it is contained in a single column named column1. The application automatically splits the columns on the tab character for you and removes the original column1.

    Tip

    Tip: This auto-split does not appear in your recipe by default. For most formats, a set of initial steps is automatically applied to the dataset. Optionally, you can review and modify these steps, but you must deselect Detect Structure during the import. See Initial Parsing Steps.

  • Because the application was unable to determine clear headers for each column's data, generic ones are used. So, before you apply a header to your data, you must split out the data within each column.
  • The delimiters within each column vary.  
    • column2 uses the caret, while column3 uses the forward slash.
    • column4 and column5 use multiple delimiters. 
  • There is sparseness in the data. Note that in column5, the second row contains the value 11 at the end, while the other two data rows do not have this value.

...

D trans
p03Value'/'
Typestep
p01NameColumn
p01Valuecolumn2
p02NameOption
p02ValueBy delimiter
p03NameDelimiter
p04Value`/(-{digit}|{digit})`
p04NameStart to split after
SearchTermSplit by delimiter

  • The above uses 
    d-s-itemlang
    itempatterns
    , which are simplified versions of regular expressions for matching patterns.
    • In this case, the expression is the following:

      Code Block
      `/(-{digit}|{digit})`
    • For the Start to split after value, the above indicates that the application should start to look for matches on the delimiter (forward slash) only after the above pattern has been detected in the column values.
    • In this case, the pattern describes values that appear after a forward slash and could be a negative digit or a positive digit, which matches the pattern for the TIMEZONE OFFSET values in the column.
    • For more information on how to use 
      d-s-itemlang
      itempatterns
      , see Text Matching.
  • Since you are splitting the column into two columns, you do not need to specify the number of new columns to create. The default is 1.

...

You can also perform column splits based on numerical positions in column values. These splitting options are useful for highly regular data that is of consistent length.

Tip

Tip: When specifying numeric positions, you do not have to list the positions in numeric order. You can now do faster iteration since you can add new positions as needed when previewing the transformation.

Suppose you have the following coordination information in three dimensions (x, y, and z). Note that the data is very regular, with leading zeroes for values that are less than 1000.

...

The steps used to detect structure are listed as the first steps of your recipe, which allows you to modify them as needed. For more information, see Initial Parsing Steps.See Import Data Page. 

D s also
inCQLtrue
label((label = "structuring_tasks") OR (label = "structure") OR (label = "import"))