Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0682

D toc

D s transforms

Excerpt
Sets the data type of the specified column or columns. The column data is validated against the new data type, which can change the results of column profiling.

Type is specified as a string literal or comma-separated set of literals. For more information on valid string literals, see Valid Data Type Strings .

Tips:

Tip

Tip: You can use the settype transform to override the data type inferred for a column. However, if a new transformation step is added, the column data type is re-inferred, which may override your specific typing. You should consider applying setttype transforms as late as possible in your recipes.

  • When a column is set to a data type, all values in the column are validated against the new type, which might change the number of mismatched values. Some cleanup might be required. Some operations might cause the data type to be re-validated automatically.
  • It might be easier to set type using the column's drop-down. Selections of data type from the column drop-down are turned into recipe steps using the settype transform.
  • If you encounter a significant number of mismatches after you change the data type, you might find it helpful to change or revert the type to String. All data can be interpreted as a String or a list of string values. The transforms and functions for manipulating String data might be easier to use to clean up mismatched data before changing the data type to the preferred one.
  • Row values that do not match the new data type might be turned to null values during job execution.

D s
snippetBasic

Single-column example:

D code
settype col: Score type: 'Integer'

Output: Changes the data type for the Score column to Integer.

Multi-column example:

D code
settype col: Score,studentId type: 'Integer'

Output: Changes the data type for the Score and studentId columns to Integer.

D s
snippetParameters

D code

settype col:col1,col2 type:'string_literal'

TokenRequired?Data TypeDescription
settypeYtransformName of the transform
colYstringComma-separated list of columns to which to apply the specified type.
typeYstringString literal identifying the data type to apply to the column(s). See Valid Data Type Strings.

D s lang notes

col

Identifies the column(s) to which to apply the transform. You can specify one or more columns.

D s
snippetusage

Required?Data Type
YesComma-separated strings (column name or names)

type

Include Page
type Parameter
type Parameter

D code

settype col: zips type:'Zipcode'

Output: Changes the data type of the zips column to Zip Code data type. All values are validated as U.S. Zip code.

D s
snippetusage

Required?Data Type
YesString value

D s
snippetExamples

Example - Simple settype with date values

Source:

Here is a list of activities listed by date. Note the variation in date values, including what is clearly an invalid date. Here is the source data:

Code Block
myDate, myAction
4/4/2016,Woke up at 6:30
4-4-2016,Got ready
9-9-9999,Drove kids to school
4-4-2016, Commuted to work


Transformation:

When this data is imported into the Transformer page, there are couple of immediate issues: no column headings and blank rows at the bottom. These two transformations fix that:

D trans
RawWrangletrue
p03Value1
Typestep
WrangleTextrename type: header method: index sourcerownumber: 1
p01NameOption
p01ValueUse row(s) as column names
p02NameType
p02ValueUse a single row to name columns
p03NameRow number
SearchTermRename column with row(s)

D trans
RawWrangletrue
p03Valueismissing([myDate])
Typestep
WrangleTextdelete row: ismissing([myDate])
p01NameCondition
p01ValueCustom formula
p02NameType of formula
p02ValueCustom single
p03NameCondition
p04ValueDelete matching rows
p04NameAction
SearchTermFilter rows

For the invalid date, you can infer from the rows around it that it should be from the same date. You can make the following change to fix it:

D trans
RawWrangletrue
p03Value'4-4-2016'
Typestep
WrangleTextreplace col: myDate on: `9-9-9999` with: '4-4-2016' global: true
p01NameColumn
p01ValuemyDate
p02NameFind
p02Value`9-9-9999`
p03NameReplace with
p04Valuetrue
p04NameMatch all occurrences
SearchTermReplace text or pattern

Now that the dates look fairly consistent, you can set the data type of the column to a matching Datetime format:

D trans
RawWrangletrue
p03Value'mm-dd-yy','mm*dd*yyyy'
Typestep
WrangleTextsettype col: myDate type: 'Datetime','mm-dd-yy','mm*dd*yyyy'
p01NameColumns
p01ValuemyDate
p02NameNew type
p02ValueCustom or Date/Time
p03NameSpecify type
SearchTermChange column data type

Note the syntax above for specifying Datetime types. In addition to the Datetime keyword, you must specify the format type, followed by the variation of that format.

Tip

Tip: A set of supported formats and variations for Datetime are available through the column data type selector. When you select your desired Datetime format, the setttype transform is added to your recipe.

Results:

myDatemyAction
4/4/2016Woke up at 6:30
4-4-2016Got ready
4-4-2016Drove kids to school
4-4-2016Commuted to work

Example - Use merge and settype to clean up numeric data that should be treated as other data types

Include Page
EXAMPLE - Settype Transform
EXAMPLE - Settype Transform

D s also
labelwrangle_transform_settype