is the domain-specific language used to build transformation recipes in .
A recipe is a sequence of transformation steps applied to your dataset in order to produce your results.
Transform and transformation:
A transformation is a user-facing action that you can apply to your dataset through the Transformer page. A transformation is typically a use-specific or more sophisticated manifestation of a transform.
Tip: Except for the reference documentation for individual transforms, the language documentation references transformations that you can apply through the Transformer page.
For most of these actions, you can pass one or more parameters to define the context (columns, rows, or conditions).
Some parameters accept one or more functions. A function is a computational action performed on one or more columns of data in your dataset.
When you select suggestions in the Transformer Page, your selection is converted into a transformation that you can add to your recipe.
Tip: Where possible, you should make selections in the data grid to build transformation steps. These selections prompt a series of cards to be displayed. You can select different cards to specify a basic transformation for your selected data, choose a variant of that transformation, and then modify the underlying recipe as necessary.
NOTE: is not SQL. It is a proprietary language of data transformation, purpose-built for .
While there are some overlaps between and SQL, here are the key distinctions:
transforms follow this general syntax:
(transform) param1:(expression) param2:(expression)
In , a transform (or verb) is a single keyword that identifies the type of change you are applying to your dataset.
A transform is always the first keyword in a recipe step. Details are below.
The other elements in each step are contextual parameters for the transform. Some transforms do not require parameters.
Additional parameters may be optional or required for any transform.
Depending on the transform, one or more of
row parameters may be used. For example, the
set transform can use all three or just
When present, the
An expression can contain combinations of the following:
For more information on , see Text Matching.
When present, the
Some transforms may support multiple columns as a list, as a range of columns (e.g.,
|When present, the |
For aggregating transforms, such as
|For aggregating transforms, such as |
At the flow level, you can define parameters that can be referenced in your recipe steps.
In the following transformation, the flow parameter
currentDiscount is invoked in the step to yield the discounted cost.
Flow parameters are referenced in your steps in the following format:
For more information, see Create Flow Parameter.
The following types of parameter inputs may be referenced in a transform's parameters.
Other can be referenced as column references. For literal values of these data types, you can insert them into your expressions as strings. Transforms cause the resulting values to be re-inferred for their data type.
A reference to the values stored in a column in your dataset. Columns can be referenced by the plain-text value for the column name.
value parameter references the
Column names with spaces or special characters in a transformation must be wrapped by curly braces.
srcColumn is renamed to
src Column, which requires no braces because the new name is captured as a string literal:
NOTE: Current column names that have a space in them must be bracketed in curly braces. The above column name reference is the following:
Some parameters accept functions as inputs. Where values or formulas are calculated, you can reference one of the dozens of functions available in .
supports the use of variable references to aspects of the source data or dataset. In the following example, the ABS function is applied to each column in a set of them using the
$col returns the value from the current column for the row under evaluation.
For more information on these variables, see Source Metadata References.
Individual parameters within a function can be computed expressions themselves. These nested expressions can be calculated using constants, other functions, and column references.
Example: Computes a column whose only value is ten divided by three, rounded to the nearest integer (3):
A valid integer value within the accepted range of values for the Integer datatype.
Example: Generates a column called,
my13 which is the sum of the Integer values
A valid floating point value within the accepted range of values for the Decimal datatype.
Example: Generates a column of values that computes the approximate circumference of the values in the
Example: If the value in the
order column is more than 1,000,000, then the value in the
bigOrder column is
A string literal value is the baseline datatype. String literals must be enclosed in single quotes.
Example: Creates a column called,
StringCol containing the value
supports a special syntax, which simplifies the generation of matching patterns for string values.
Patterns must be enclosed in accent marks (
`MyPattern`). For more information on , see Text Matching.Example: Extracts up to 10 values from the
MyData column that match the basic pattern for social security numbers (
Regular expressions are a common standard for defining matching patterns. Regex is a very powerful tool but can be easily misconfigured.
Regular expressions must be enclosed in slashes (
Example: Deletes all two-digit numbers from the
A valid date or time value that matches the requirements of the Datetime datatype.
Datetime values can be formatted with specific formatting strings. See DATEFORMAT Function.
Example: Generates a new column containing the values from the
myDate column reformatted in
A valid array of values matching the Array data type.
Example: Generates a column with the number of elements in the listed array (
A valid set of values matching the Object data type.
Example: Generates separate columns for each of the specified keys in the object (
color), containing the corresponding value for each row:
For more information on these data types, see Supported Data Types.
This preview is generated by applying the transformation to the sample in the data grid.
NOTE: The generated output applies only to the values displayed in the data grid. The function is applied across the entire dataset only during job execution.
For more information on these areas of the , see Transformer Page.
A transformation is an action for which you can browse or search through the Transform Builder in the Transformer page. When specified and added to the recipe, these sometimes complex actions are rendered in the recipe as steps using the underlying transforms of the language.
Tip: Through transformations, guides you through creation of more sophisticated steps that would be difficult to create in raw .
For more information on the list of available transformations, see Transformation Reference.
For more information on creating transformation steps in the Transformer page, see Transform Builder.
A function is an action that is applied to a set of values as part of a transform step. Functions can apply to the values in a transform for specific data types, such as strings, or to types of transforms, such as aggregate and window function categories. A function cannot be applied to data without a transform.
Below, function inputs are listed in increasing order of generality.
NOTE: A function cannot take a higher-order parameter input type without taking the lower parameter input types. For example, a function cannot take a nested function as an input if it does not accept a literal value, too.
|Order||Parameter input type||Example|
|These functions are used to perform aggregation calculations on your data, such as sum, mean, and standard deviation.|
|Comparison Functions||Comparison functions enable evaluation between two data elements, which are typically nested (Object or Array) elements.|
|Math Functions||Perform computations on your data using a variety of math functions and numeric operators.|
|Trigonometry Functions||Calculate standard trigonometry functions as well as arc versions of them.|
|Date Functions||Use these functions to extract data from or perform operations on objects of Datetime data type.|
|String Functions||Manipulate strings, including finding sub-strings within a string.|
|Nested Functions||These functions are designed specifically to assist in wrangling nested data, such as Objects, Arrays, or JSON elements.|
|Type Functions||Use the Type functions to identify valid, missing, mismatched, and null values.|
|Window Functions||The Window functions enable you to perform calculations on relative windows of data within your dataset.|
|Other Functions||Miscellaneous functions that do not fit into the other categories|
An operator is a single character that represents an arithmetic function. For example, the Plus sign (
+) represents the add function.
|Logical Operators||and, or, and not operators|
|Numeric Operators||Add, subtract, multiply, and divide|
|Comparison Operators||Compare two values with greater than, equals, not equals, and less than operators|
|Ternary Operators||Use ternary operators to create if/then/else logic in your transforms.|
A transform, or verb, is an action applied to rows or columns of your data. Transforms are the essential set of changes that you can apply to your dataset.
Transforms are described in the Language Appendices. For more information, see Transforms.
Documentation for is also available through . Select Help menu > Documentation.
Tip: When searching for examples of functions, try using the following form for your search terms within the Product Documentation site: