Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next
Excerpt

This example shows how to use parsing functions for evaluating input values against the function-specific data type.

Functions:

D generate list excerpts
pagesPARSEBOOL Function,PARSEDATE Function,PARSEFLOAT Function,PARSEINT Function

Source:

The following table contains data on a series of races. 

raceIddisqualifieddateracerIdtime_sc
1FALSE2/1/20124.22
2f2/8/20125
3no2/8/20124.11
4n1-Feb-20226.1
5TRUE8-Feb-202.2-25.22
6t2/8/2020  10:16:00 AM225.44
7yes2/1/20324
8y2/8/203329.22
902/8/20324.78
1011-Feb-20426.2.1
11FALSE8-Feb-20
28.22 sec
12FALSE2/8/2020  10:16:00 AM427.11

As you can see, this dataset has variation in values (FALSE, f, no, n) and problems with the data.

Transformation:

When the data is first imported, it may be properly typed for each column. To use the parsing functions, these columns should be converted to String data type:

D trans
RawWrangletrue
Typestep
WrangleTextsettype col: disqualified,date,racerId,time_sc type: String
p01NameColumns
p01Valuedisqualified,date,racerId,time_sc
p02NameNew type
p02ValueString
SearchTermChange column data type

Now, you can parse individual columns. 

disqualified column:

D trans
RawWrangletrue
Typestep
WrangleTextset col: disqualified value: PARSEBOOL($col)
p01NameColumns
p01Valuedisqualified
p02NameFormula
p02ValuePARSEBOOL($col)
SearchTermEdit column with formula

racerId column:

D trans
RawWrangletrue
Typestep
WrangleTextset col: racerId value: PARSEINT($col)
p01NameColumns
p01ValueracerId
p02NameFormula
p02ValuePARSEINT($col)
SearchTermEdit column with formula

time_sc column:

D trans
RawWrangletrue
Typestep
WrangleTextset col: time_sc value: PARSEFLOAT($col)
p01NameColumns
p01Valuetime_sc
p02NameFormula
p02ValuePARSEFLOAT($col)
SearchTermEdit column with formula

date column:

For the date column, the PARSEDATE function supports a default set of Datetime formats. Since some of the listed formats are different from these defaults, you must specify all of the formats. These formats are specified as an array of string values as the second argument of the function:

Tip

Tip: For the PARSEDATE function, it's useful to use the Preview to verify that all of the dates in the column are represented in the array of output formats. You can see the available output formats through the data type menu at the top of a column in the Transformer Page.

D trans
RawWrangletrue
Typestep
WrangleTextset col: date value: PARSEDATE($col, ['yyyy-MM-dd','yyyy\/MM\/dd','M\/d\/yyy hh:mm','MMMM d, yyyy','MMM d, yyyy'])
p01NameColumns
p01Valuedate
p02NameFormula
p02ValuePARSEDATE($col, ['yyyy-MM-dd','yyyy\/MM\/dd','M\/d\/yyy hh:mm','MMMM d, yyyy','MMM d, yyyy'])
SearchTermEdit column with formula

After all of the date values have been standardized to the output format of the PARSEDATE function, you may choose to remove the time element of the values:

D trans
RawWrangletrue
p03Value''
Typestep
WrangleTextreplacepatterns col: date with: '' on: ` {digit}{2}:{digit}{2}:{digit}{2}{end}`
p01NameColumn
p01Valuedate
p02NameFind
p02Value` {digit}{2}:{digit}{2}:{digit}{2}{end}`
p03NameReplace with
SearchTermReplace text or pattern

Results:

After executing the above steps, the data appears as follows. Notes on each column's output are below the table.

raceIddisqualifieddateracerIdtime_sc
1false2020-02-01124.22
2false2020-02-08125
3false2020-02-08124.11
4false2020-02-01226.1
5true2020-02-08null-25.22
6true2020-02-08225.44
7true2020-02-01324
8true2020-02-083329.22
9false2020-02-08324.78
10true2020-02-014null
11false2020-02-08nullnull
12false2020-02-08427.11

disqualified column:

  • The PARSEBOOL function normalizes all valid Boolean values to either false or true.

racerId column:

  • The PARSEINT function writes invalid values as null values.
  • The function writes empty values as null values.
  • The value 33 remains, since it is a valid Integer. This value should be fixed manually.

time_sc:

  • The PARSEFLOAT function writes the source value 25.00 as 25 in output.
  • The source value -25.22 remains. However, since this is time-based data, it needs to be fixed.
  • Invalid values are written as nulls.

date column:

  • All values are written in the standardized format: yyyy-MM-dd HH:mm:ss. Time data has been stripped.

D s also
labelexample_type_parsing_functions