Excerpt |
---|
This example shows how you can split data from a single column into multiple columns using delimiters. |
- single-pattern delimiter: One pattern is applied one or more times to the source column to define the delimiters for the output columns
- multi-pattern delimiter: Multiple patterns, in the form of explicit strings, character index positions, or fixed-width fields, are used to split the column.
Source:
In this example, your CSV dataset contains status messages from a set of servers. In this case, the data about the server and the timestamp is contained in a single value within the CSV.
Code Block |
---|
Server|Date Time,Status
admin.examplecom|2016-03-05 07:04:00,down
webapp.examplecom|2016-03-05 07:04:00,ok
admin.examplecom|2016-03-05 07:04:30,rebooting
webapp.examplecom|2016-03-05 07:04:00,ok
admin.examplecom|2016-03-05 07:05:00,ok
webapp.examplecom|2016-03-05 07:05:00,ok |
Transformation:
When the data is first loaded into the Transformer page, the CSV data is split using the following two transformations:
D trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | splitrows col: column1 on: '\r' |
---|
p01Name | Column |
---|
p01Value | column1 |
---|
p02Name | Split on |
---|
p02Value | \n |
---|
SearchTerm | Split into rows |
---|
|
D trans |
---|
RawWrangle | true |
---|
p03Value | ',' |
---|
Type | step |
---|
WrangleText | split type: on col: column1 on: ',' quote: '\"' |
---|
p01Name | Column |
---|
p01Value | column1 |
---|
p02Name | Option |
---|
p02Value | On pattern |
---|
p03Name | Match pattern |
---|
p04Value | \" |
---|
p04Name | Ignore matches between |
---|
SearchTerm | Split column |
---|
|
You might need to add a header
as the first step:
D trans |
---|
RawWrangle | true |
---|
p03Value | 1 |
---|
Type | step |
---|
WrangleText | rename type: header method: index sourcerownumber: 1 |
---|
p01Name | Option |
---|
p01Value | Use row(s) as column names |
---|
p02Name | Type |
---|
p02Value | Use a single row to name columns |
---|
p03Name | Row number |
---|
SearchTerm | Rename column with row(s) |
---|
|
At this point, your data should look like the following:
Server_Date_Time | Status |
---|
admin.example.com|2016-03-05 07:04:00 | down |
webapp.example.com|2016-03-05 07:04:00 | ok |
admin.example.com|2016-03-05 07:04:30 | rebooting |
webapp.example.com|2016-03-05 07:04:30 | ok |
admin.example.com|2016-03-05 07:05:00 | ok |
webapp.example.com|2016-03-05 07:05:00 | ok |
The first column contains three distinct sets of data: the server name, the date, and the time. Note that the delimiters between these fields are different, so you should use a multi-pattern delimiter to break them apart:
D trans |
---|
RawWrangle | true |
---|
p03Value | ',' |
---|
Type | step |
---|
WrangleText | split type: delimiters col: {Server|Date Time} delimiters: '|',' ' |
---|
p01Name | Column |
---|
p01Value | Server|Date Time |
---|
p02Name | Option |
---|
p02Value | Sequence of patterns |
---|
p03Name | Pattern1 |
---|
p04Value | ' ' |
---|
p04Name | Pattern2 |
---|
SearchTerm | Split column |
---|
|
When the above is added, you should see three separate columns with the individual fields of information. Note that the source column has been automatically dropped.
Now, you decide that it would be useful to break apart the date information column into separate columns for year, month, and day. Since the column delimiter of this field is consistently a dash (-
), you can use a single-pattern delimiter with the following transformation:
D trans |
---|
RawWrangle | true |
---|
p03Value | '-' |
---|
Type | step |
---|
WrangleText | splitpatterns col: {Server|Date Time2} type: on on: '-' limit: 2 |
---|
p01Name | Column |
---|
p01Value | Server|Date Time2 |
---|
p02Name | Option |
---|
p02Value | By delimiter |
---|
p03Name | Delimiter |
---|
p04Value | 2 |
---|
p04Name | Number of columns to create |
---|
SearchTerm | Split by delimiter |
---|
|
Results:
After you rename the generated columns, your dataset should look like the following. Note that the source timestamp column has been automatically dropped.
server | year | month | day | time | Status |
---|
admin.example.com | 2016 | 03 | 05 | 07:04:00 | down |
webapp.example.com | 2016 | 03 | 05 | 07:04:00 | ok |
admin.example.com | 2016 | 03 | 05 | 07:04:30 | rebooting |
webapp.example.com | 2016 | 03 | 05 | 07:04:30 | ok |
admin.example.com | 2016 | 03 | 05 | 07:05:00 | ok |
webapp.example.com | 2016 | 03 | 05 | 07:05:00 | ok |
D s also |
---|
label | example_splitting_with_different_delimiter_types |
---|
|