Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r094
Excerpt

This example shows how you can split data from a single column into multiple columns using delimiters.

  • single-pattern delimiter: One pattern is applied one or more times to the source column to define the delimiters for the output columns
  • multi-pattern delimiter: Multiple patterns, in the form of explicit strings, character index positions, or fixed-width fields, are used to split the column.

Source:

In this example, your CSV dataset contains status messages from a set of servers. In this case, the data about the server and the timestamp is contained in a single value within the CSV.

Code Block
Server|Date Time,Status
admin.examplecom|2016-03-05 07:04:00,down
webapp.examplecom|2016-03-05 07:04:00,ok
admin.examplecom|2016-03-05 07:04:30,rebooting
webapp.examplecom|2016-03-05 07:04:00,ok
admin.examplecom|2016-03-05 07:05:00,ok
webapp.examplecom|2016-03-05 07:05:00,ok

Transformation:

When the data is first loaded into the Transformer page, the CSV data is split using the following two transformations:

D trans
RawWrangletrue
Typestep
WrangleTextsplitrows col: column1 on: '\r'
p01NameColumn
p01Valuecolumn1
p02NameSplit on
p02Value\n
SearchTermSplit into rows

D trans
RawWrangletrue
p03Value','
Typestep
WrangleTextsplit type: on col: column1 on: ',' quote: '\"'
p01NameColumn
p01Valuecolumn1
p02NameOption
p02ValueOn pattern
p03NameMatch pattern
p04Value\"
p04NameIgnore matches between
SearchTermSplit column

You might need to add a header as the first step:

D trans
RawWrangletrue
p03Value1
Typestep
WrangleTextrename type: header method: index sourcerownumber: 1
p01NameOption
p01ValueUse row(s) as column names
p02NameType
p02ValueUse a single row to name columns
p03NameRow number
SearchTermRename column with row(s)

At this point, your data should look like the following:

Server_Date_TimeStatus
admin.example.com|2016-03-05 07:04:00down
webapp.example.com|2016-03-05 07:04:00ok
admin.example.com|2016-03-05 07:04:30rebooting
webapp.example.com|2016-03-05 07:04:30ok
admin.example.com|2016-03-05 07:05:00ok
webapp.example.com|2016-03-05 07:05:00ok

The first column contains three distinct sets of data: the server name, the date, and the time. Note that the delimiters between these fields are different, so you should use a multi-pattern delimiter to break them apart:

D trans
RawWrangletrue
p03Value','
Typestep
WrangleTextsplit type: delimiters col: {Server|Date Time} delimiters: '|',' '
p01NameColumn
p01ValueServer|Date Time
p02NameOption
p02ValueSequence of patterns
p03NamePattern1
p04Value' '
p04NamePattern2
SearchTermSplit column

When the above is added, you should see three separate columns with the individual fields of information. Note that the source column has been automatically dropped.

Now, you decide that it would be useful to break apart the date information column into separate columns for year, month, and day. Since the column delimiter of this field is consistently a dash (-), you can use a single-pattern delimiter with the following transformation:

D trans
RawWrangletrue
p03Value'-'
Typestep
WrangleTextsplitpatterns col: {Server|Date Time2} type: on on: '-' limit: 2
p01NameColumn
p01ValueServer|Date Time2
p02NameOption
p02ValueBy delimiter
p03NameDelimiter
p04Value2
p04NameNumber of columns to create
SearchTermSplit by delimiter

Results:

After you rename the generated columns, your dataset should look like the following. Note that the source timestamp column has been automatically dropped.

 

serveryearmonthdaytimeStatus
admin.example.com2016030507:04:00down
webapp.example.com2016030507:04:00ok
admin.example.com2016030507:04:30rebooting
webapp.example.com2016030507:04:30ok
admin.example.com2016030507:05:00ok
webapp.example.com2016030507:05:00ok


D s also
labelexample_splitting_with_different_delimiter_types