

For many recipes, the first step is to split data from a single column into multiple columns. This section describes methods for splitting a single column into one or more columns based on character or pattern matching, or on position within the column's values.

Split by Delimiter

When data is initially imported into the application, data in each row may be split on a single delimiter. In the following example, the tab character is a single, clear delimiter:

...

  • When the data is first imported, all of it is contained in a single column named column1. The application automatically splits this column on the tab character and removes the original column1.

    Tip: This auto-split does not appear in your recipe by default. For most formats, a set of initial parsing steps is applied to the dataset automatically. To review and modify these steps, you must deselect Detect Structure during import. See Initial Parsing Steps.

  • Because the application could not determine clear headers for each column's data, it uses generic column names. Before you apply a header to your data, you must split out the data within each column.
  • The delimiters differ from column to column:
    • column2 uses the caret, while column3 uses the forward slash.
    • column4 and column5 use multiple delimiters.
  • The data is sparse. In column5, the second row contains the value 11 at the end, while the other two data rows do not have this value.
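The auto-split and the per-column splits described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the application's implementation: the helper names (`auto_split`, `split_on`, `split_on_any`) and the sample row are hypothetical, and the sketch assumes the new columns are numbered from column2 because the original column1 is removed.

```python
import re


def auto_split(line: str) -> dict[str, str]:
    """Split a raw line on the tab character and name the pieces.

    Assumption (hypothetical): numbering starts at column2 because the
    original column1 is removed after the split.
    """
    fields = line.split("\t")
    return {f"column{i}": field for i, field in enumerate(fields, start=2)}


def split_on(value: str, delimiter: str) -> list[str]:
    """Split one column's value on a single literal delimiter (caret, slash, ...)."""
    return value.split(delimiter)


def split_on_any(value: str, delimiters: str) -> list[str]:
    """Split one column's value on any of several single-character delimiters."""
    if not delimiters:
        return [value]
    # Build a regex character class such as [\-\|] from the delimiter characters.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, value)


# Hypothetical sample row: not the data from the example above.
row = auto_split("a^b^c\t1/2/3\tx-y|z")
print(split_on(row["column2"], "^"))       # ['a', 'b', 'c']
print(split_on(row["column3"], "/"))       # ['1', '2', '3']
print(split_on_any(row["column4"], "-|"))  # ['x', 'y', 'z']
```

Splitting on a regular-expression character class is one way to handle columns such as column4 and column5 that mix several delimiters in the same value.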

...

NOTE: The value of Number of columns to create is the total number of new columns to generate.
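To make the "total number of new columns" idea concrete, here is a small Python sketch. It assumes, hypothetically, that creating n columns means splitting at most n - 1 times, leaving any remaining text in the last column, and padding rows that have fewer delimiters with empty values. This mirrors the sparse column5 case but is not a description of the application's exact behavior; `split_to_n_columns` is a hypothetical name.

```python
def split_to_n_columns(value: str, delimiter: str, n: int) -> list[str]:
    """Split value into exactly n columns (hypothetical semantics).

    - At most n - 1 splits are made, so any extra delimiters stay in the
      last column.
    - Rows with fewer delimiters are padded with empty strings, which is
      one way to represent sparse data.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = value.split(delimiter, n - 1)  # at most n parts
    parts += [""] * (n - len(parts))       # pad short (sparse) rows
    return parts


print(split_to_n_columns("10/20/30/11", "/", 4))  # ['10', '20', '30', '11']
print(split_to_n_columns("10/20/30", "/", 4))     # ['10', '20', '30', '']
print(split_to_n_columns("a/b/c", "/", 2))        # ['a', 'b/c']
```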


Results:

Below is how the data in column2 is transformed:

...

You can also split columns based on numeric positions within column values. These options are useful for highly regular data of consistent length.

Tip: When specifying numeric positions, you do not have to list them in numeric order. This allows faster iteration, because you can add new positions as needed while previewing the transformation.
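The following Python sketch shows why the order of positions does not matter: sorting the positions before cutting yields the same result regardless of the order in which they were entered. It assumes, hypothetically, that a position p means "cut after the first p characters" and that out-of-range positions are ignored; `split_at_positions` is a hypothetical helper, not an application API.

```python
def split_at_positions(value: str, positions: list[int]) -> list[str]:
    """Split value at character offsets given in any order.

    Assumption: position p cuts after the first p characters; positions
    at or beyond the ends of the string are ignored, and duplicates are
    collapsed.
    """
    cuts = sorted({p for p in positions if 0 < p < len(value)})
    bounds = [0] + cuts + [len(value)]
    # Pair each boundary with the next one to produce the slices.
    return [value[start:end] for start, end in zip(bounds, bounds[1:])]


print(split_at_positions("abcdefgh", [6, 2]))  # ['ab', 'cdef', 'gh']
print(split_at_positions("abcdefgh", [2, 6]))  # ['ab', 'cdef', 'gh']
```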

Suppose you have the following coordinate information in three dimensions (x, y, and z). Note that the data is very regular, with leading zeroes for values that are less than 1000.

...
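As an illustration of a positional split on fixed-width data like this, the Python sketch below assumes each coordinate occupies exactly four digits with no separator between x, y, and z. That layout and the sample value are hypothetical, because the original example data is not reproduced here. Converting each slice with `int()` drops the leading zeroes.

```python
def parse_coordinates(value: str, width: int = 4) -> tuple[int, int, int]:
    """Split a fixed-width x/y/z string at positions width and 2 * width.

    Hypothetical layout: three zero-padded fields of `width` digits each,
    concatenated with no delimiter.
    """
    if len(value) != 3 * width or not value.isdigit():
        raise ValueError(f"expected {3 * width} digits, got {value!r}")
    # Slice at offsets 0, width, 2 * width; int() removes leading zeroes.
    x, y, z = (int(value[i:i + width]) for i in range(0, 3 * width, width))
    return x, y, z


print(parse_coordinates("012304561000"))  # (123, 456, 1000)
```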

The steps used to detect structure are listed as the first steps of your recipe, so you can modify them as needed. For more information, see Initial Parsing Steps. See also Import Data Page.
