Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r092

D toc

D s lang
 is the domain-specific language used to build transformation recipes in
D s product
rtrue
.

A 

D s lang
 recipe is a sequence of transformation steps applied to your dataset in order to produce your results.

Transform and transformation:

  • transform is a single action applied to a dataset. A transform is part of the underlying 
    D s lang
    . Transforms are not directly accessible to users. 
  • transformation is a user-facing action that you can apply to your dataset through the Transformer page. A transformation is typically a use-specific or more sophisticated manifestation of a transform. 

    D s transforms

    Tip

    Tip: Except for the reference documentation for individual transforms, the language documentation references transformations that you can apply through the Transformer page.

For most of these actions, you can pass one or more parameters to define the context (columns, rows, or conditions).

Function:

Some parameters accept one or more functions. A function is a computational action performed on one or more columns of data in your dataset.

When you select suggestions in the Transformer Page, your selection is converted into a transformation that you can add to your recipe.

Tip

Tip: Where possible, you should make selections in the data grid to build transformation steps. These selections prompt a series of cards to be displayed. You can select different cards to specify a basic transformation for your selected data, choose a variant of that transformation, and then modify the underlying

D s lang
recipe as necessary.

D s lang
 vs. SQL

Info

NOTE:

D s lang
is not SQL. It is a proprietary language of data transformation, purpose-built for
D s product
.

While there are some overlaps between 

D s lang
 and SQL, here are the key distinctions:

  • D s lang
     is a proprietary language designed for data transformation. Every supported transformation is designed to make changes to a dataset. It cannot be used to read from or write to a datastore. 
    • Users interact with 
      D s lang
       exclusively through the
      D s webapp
      . There is no direct interaction with the language.
  • SQL (Structured Query Language) is designed for querying, transforming, and writing for relational datasources. It cannot be applied to file-based datasets.
    • SQL cannot be used to transform data in 
      D s product
      .

D s lang
 Syntax

D s lang
 transforms follow this general syntax:

D code

(transform) param1:(expression) param2:(expression)

Transform ElementDescription
transform

In

D s lang
, a transform (or verb) is a single keyword that identifies the type of change you are applying to your dataset.

A transform is always the first keyword in a recipe step. Details are below.

D s transforms

The other elements in each step are contextual parameters for the transform. Some transforms do not require parameters.

parameter1:, parameter2:

Additional parameters may be optional or required for any transform. 

Info

NOTE: A parameter is always followed by a colon. A parameter may appear only one time in a transform step.

Common Parameters

Depending on the transform, one or more of valuecol, and row parameters may be used. For example, the set transform can use all three or just value and col.

Transform ElementDescription
value:

When present, the value parameter defines the expression that creates the output value or values stored when the transform is executed.

An expression can contain combinations of the following:

  • Functions apply computations or evaluations of source data, the outputs of which can become inputs to the column. Sources may be constants or column references. A function reference is always followed by brackets (), even if it takes no parameters. See below.
  • Operators are single-character representations of numeric functions, comparisons, or logical operators. For example, the plus sign (+) is the operator for the add function. See below.
  • Constants can be quoted string literals ('mystring'), Integer values (1001), Decimal values (1001.01), Boolean values (true or false) or patterns.

For more information on

D s lang
itempatterns
, see Text Matching.

col:

When present, the col parameter identifies the name of the column or columns to which the transform is applied.

Some transforms may support multiple columns as a list, as a range of columns (e.g., column1~column5), or all columns in the dataset (using wildcard indicator, col: *).

row:When present, the row parameter defines the expression to evaluate to determine the rows on which to perform the transform. If the row expression evaluates to true for a row, the transform is performed on the row.
group:

For aggregating transforms, such as window, pivot, and derive, the group parameter enables you to calculate aggregation functions within a group value. For example, you can sum sales for each rep by applying group:repName to your transformation.

D s ordering

order:For aggregating transforms, such as window , pivot , and derive , the order parameter can be used to specify the column by which the transform results are sorted. In the previous example, you might choose to sort your sum of sales calculation by state: order:State.


Flow parameters

At the flow level, you can define parameters that can be referenced in your recipe steps. 

In the following transformation, the flow parameter currentDiscount is invoked in the step to yield the discounted cost. 

D trans
RawWrangletrue
p03ValuediscountedCost
Typestep
WrangleTextderive type:single value: TotalCost * (1/${currentDiscount}) as:'discountedCost'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueTotalCost * (1/${currentDiscount})
p03NameNew column name
SearchTermNew formula

Flow parameters are referenced in your steps in the following format:

Code Block
${MyRecipeParameter}

For more information, see Create Flow Parameter.

Parameter Inputs

The following types of parameter inputs may be referenced in a transform's parameters.

Other 

D s item
data types
data types
 can be referenced as column references. For literal values of these data types, you can insert them into your expressions as strings. Transforms cause the resulting values to be re-inferred for their data type.

Column reference

A reference to the values stored in a column in your dataset. Columns can be referenced by the plain-text value for the column name.

Example:  value parameter references the myCol column.

D trans
RawWrangletrue
p03Value'myNewCol'
Typestep
WrangleTextderive type:single value: myCol as:'myNewCol'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValuemyCol
p03NameNew column name
SearchTermNew formula

Column names with spaces or special characters in a transformation must be wrapped by curly braces. 

Example: Below, srcColumn is renamed to src Column, which requires no braces because the new name is captured as a string literal:

D trans
RawWrangletrue
p03Valuesrc Column
Typestep
WrangleTextrename type: manual mapping: [srcColumn, 'src Column']
p01NameOption
p01ValueManual rename
p02NameColumn
p02ValuesrcColumn
p03NameNew column name
SearchTermRename columns

Info

NOTE: Current column names that have a space in them must be bracketed in curly braces. The above column name reference is the following: {src Column}.


Functions

Some parameters accept functions as inputs. Where values or formulas are calculated, you can reference one of the dozens of functions available in

D s lang
.

Example:


D trans
RawWrangletrue
p03Value'six'
Typestep
WrangleTextderive type:single value:MULTIPLY(3,2) as:'six'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueMULTIPLY(3,2)
p03NameNew column name
SearchTermNew formula

Metadata variables

D s lang
 supports the use of variable references to aspects of the source data or dataset. In the following example, the ABS function is applied to each column in a set of them using the $col reference. 

D trans
RawWrangletrue
Typestep
WrangleTextset col: val1,val2 value: ABS($col)
p01NameColumns
p01Valueval1,val2
p02NameFormula
p02ValueABS($col)
SearchTermEdit column with formula

$col returns the value from the current column for the row under evaluation.

For more information on these variables, see Source Metadata References.

Nested expressions

Individual parameters within a function can be computed expressions themselves. These nested expressions can be calculated using constants, other functions, and column references.

Example: Computes a column whose only value is ten divided by three, rounded to the nearest integer (3):

D trans
RawWrangletrue
p03Value'three'
Typestep
WrangleTextderive type:single value:ROUND(DIVIDE(10,3),0) as:'three'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueROUND(DIVIDE(10,3),0)
p03NameNew column name
SearchTermNew formula

Integer

A valid integer value within the accepted range of values for the Integer datatype. 

Example: Generates a column called, my13 which is the sum of the Integer values 5 and 8:

D trans
RawWrangletrue
p03Value'my13'
Typestep
WrangleTextderive type:single value: (5 + 8) as:'my13'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Value(5 + 8)
p03NameNew column name
SearchTermNew formula

Decimal

A valid floating point value within the accepted range of values for the Decimal datatype. 

Example: Generates a column of values that computes the approximate circumference of the values in the diameter column:

D trans
RawWrangletrue
p03Value'circumference'
Typestep
WrangleTextderive type:single value: (3.14159 * diameter) as: 'circumference'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Value(3.14159 * diameter)
p03NameNew column name
SearchTermNew formula

Boolean

A true or false value.

Example: If the value in the order column is more than 1,000,000, then the value in the bigOrder column is true.

D trans
RawWrangletrue
p03Value'bigOrder'
Typestep
WrangleTextderive type:single value:IF(order > 1000000, true, false) as:'bigOrder'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIF(order > 1000000, true, false)
p03NameNew column name
SearchTermNew formula

String

A string literal value is the baseline datatype. String literals must be enclosed in single quotes.

Example: Creates a column called, StringCol containing the value myString.

D trans
RawWrangletrue
p03Value'StringCol'
Typestep
WrangleTextderive type:single value:'myString' as:'StringCol'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Value'myString'
p03NameNew column name
SearchTermNew formula

D s item
pattern
pattern

D s product
supports a special syntax, which simplifies the generation of matching patterns for string values.

Patterns must be enclosed in accent marks ( `MyPattern`). For more information on

D s lang
itempatterns
, see Text Matching.Example: Extracts up to 10 values from the MyData column that match the basic pattern for social security numbers (XXX-XX-XXXX):

D trans
RawWrangletrue
p03Value`%{3}-%{2}-%{4}`
Typestep
WrangleTextextract col: MyData on:`%{3}-%{2}-%{4}` limit:10
p01NameColumn to extract from
p01ValueMyData
p02NameOption
p02ValueCustom text or pattern
p03NameText to extract
p04Value10
p04NameNumber of matches to extract
SearchTermExtract text or pattern

Regular expression

Regular expressions are a common standard for defining matching patterns. Regex is a very powerful tool but can be easily misconfigured.

Regular expressions must be enclosed in slashes ( /MyPattern/ ).

Example: Deletes all two-digit numbers from the qty column:

D trans
RawWrangletrue
p03Value''
Typestep
WrangleTextreplace col: qty on: /^\d$|^\d\d$/ with: '' global: true
p01NameColumn
p01Valueqty
p02NameFind
p02Value/^\d$|^\d\d$/
p03NameReplace with
p04Valuetrue
p04NameMatch all occurrences
SearchTermReplace text or pattern

Datetime

A valid date or time value that matches the requirements of the Datetime datatype.

Datetime values can be formatted with specific formatting strings. See DATEFORMAT Function.

Example: Generates a new column containing the values from the myDate column reformatted in yyyymmdd format:

D trans
RawWrangletrue
Typestep
WrangleTextderive type:single value:DATEFORMAT(myDate, 'yyyymmdd')
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueDATEFORMAT(myDate, 'yyyymmdd')
SearchTermNew formula

Array

A valid array of values matching the Array data type.

Example:

Code Block
[0,1,2,3,4,5,6,7,8]

Example: Generates a column with the number of elements in the listed array (7):

D trans
RawWrangletrue
Typestep
WrangleTextderive type:single value: ARRAYLEN('["red", "orange", "yellow", "green", "blue", "indigo", "violet"]')
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueARRAYLEN('["red", "orange", "yellow", "green", "blue", "indigo", "violet"]')
SearchTermNew formula

Object

A valid set of values matching the Object data type.

Example:

Code Block
{"brand":"Subaru","model":"Impreza","color","green"}

Example: Generates separate columns for each of the specified keys in the object ( brand, model, color), containing the corresponding value for each row:

D trans
RawWrangletrue
Typestep
WrangleTextunnest col:myCol keys:'brand','model','color'
p01NameColumn
p01ValuemyCol
p02NamePaths to elements
p02Value'brand','model','color'
SearchTermUnnest Objects into columns

For more information on these data types, see Supported Data Types.

Interactions between
D s lang
 and the Application

  1. As you build 
    D s lang
     steps in the Transform Builder, your syntax is validated for you. You cannot add steps containing invalid syntax. 
    1. Error messages are reported back to the application, so you can make immediate modifications to correct the issue.
    2. Type-ahead support can provide guidance to the supported transforms, functions, and column references.
  2. When you have entered a valid transform step, the results are previewed for you in the data grid.
    1. This preview is generated by applying the transformation to the sample in the data grid. 

      Info

      NOTE: The generated output applies only to the values displayed in the data grid. The function is applied across the entire dataset only during job execution.

    2. If the previewed transformation is invalid, the data grid is grayed out.
  3. When you add the transformation to your recipe: 
    1. It is applied to the sample in the application, and the data grid is updated to the current state.
    2. Column histograms are updated with new values and counts.
    3. Column data types may be re-inferred for affected columns.
  4. Making changes:
    1. You can edit any transformation step in your recipe whenever needed.
      1. When you edit a transformation step in your recipe, the context of the data grid is changed to display the state of your data up to the point of previewing the step you're editing. 
      2. All subsequent steps are still part of the recipe, but they are not applied to the sample yet.
      3. You can insert recipe steps between existing steps.
    2. When you delete a recipe step, the state remains at the point where the step was removed.
      1. You can insert a new step if needed.
      2. When you complete your edit, select the final step of the recipe, which displays the results of all of your transformation steps in the data grid. Your changes may cause some recipe steps to become invalid.

For more information on these areas of the

D s webapp
, see Transformer Page.

Transformation

A transformation is an action for which you can browse or search through the Transform Builder in the Transformer page. When specified and added to the recipe, these sometimes complex actions are rendered in the recipe as steps using the underlying transforms of the language. 

Tip

Tip: Through transformations,

D s product
guides you through creation of more sophisticated steps that would be difficult to create in raw
D s lang
.  

For more information on the list of available transformations, see Transformation Reference.

For more information on creating transformation steps in the Transformer page, see Transform Builder.

Functions

A function is an action that is applied to a set of values as part of a transform step. Functions can apply to the values in a transform for specific data types, such as strings, or to types of transforms, such as aggregate and window function categories. A function cannot be applied to data without a transform.

Function input parameters

Below, function inputs are listed in increasing order of generality.

Info

NOTE: A function cannot take a higher-order parameter input type without taking the lower parameter input types. For example, a function cannot take a nested function as an input if it does not accept a literal value, too.


OrderParameter input typeExample
1literal
Code Block
FUNCTION('my input')
2column
Code Block
FUNCTION(myColumnOfValues)
3function
Code Block
FUNCTION(SUM(MyCol))

Function categories

Function CategoryDescription

Aggregate Functions

These functions are used to perform aggregation calculations on your data, such as sum, mean, and standard deviation.
Comparison FunctionsComparison functions enable evaluation between two data elements, which are typically nested (Object or Array) elements.
Math FunctionsPerform computations on your data using a variety of math functions and numeric operators.
Trigonometry FunctionsCalculate standard trigonometry functions as well as arc versions of them.
Date FunctionsUse these functions to extract data from or perform operations on objects of Datetime data type.
String FunctionsManipulate strings, including finding sub-strings within a string.
Nested FunctionsThese functions are designed specifically to assist in wrangling nested data, such as Objects, Arrays, or JSON elements.
Type FunctionsUse the Type functions to identify valid, missing, mismatched, and null values.
Window FunctionsThe Window functions enable you to perform calculations on relative windows of data within your dataset.
Other FunctionsMiscellaneous functions that do not fit into the other categories 

Operator Categories

An operator is a single character that represents an arithmetic function. For example, the Plus sign (+) represents the add function. 

Operator CategoryDescription
Logical Operatorsand, or, and not operators
Numeric OperatorsAdd, subtract, multiply, and divide
Comparison OperatorsCompare two values with greater than, equals, not equals, and less than operators
Ternary OperatorsUse ternary operators to create if/then/else logic in your transforms.

Transforms

 transform, or verb, is an action applied to rows or columns of your data. Transforms are the essential set of changes that you can apply to your dataset.

D s transforms

Transforms are described in the Language Appendices. For more information, see Transforms.

Documentation

Documentation for 

D s lang
 is also available through 
D s product
. Select Help menu > Documentation.

Tip

Tip: When searching for examples of functions, try using the following form for your search terms within the Product Documentation site:

  • Functions: wrangle_function_NameOfFunction

All Topics

D children
alltrue