This documentation applies to Trifacta Wrangler.
Registered users of this product or Trifacta Wrangler Enterprise should log in to Product Docs through the application.

 



Wrangle is the domain-specific language used to build transformation recipes in Trifacta Wrangler.

A Wrangle recipe is a sequence of transforms that are applied to your dataset, one after another, to produce your results.

  • A transform is a single action applied to your dataset. For most transforms, you can pass one or more parameters to define the context (columns, rows, or conditions) where the transform is applied to your dataset.
  • Within some parameters of a transform, you can specify one or more functions. A function is a computational action performed on one or more columns of data in your dataset.
  • These terms are described below. See Wrangle Syntax below.
  • Recipes are built in the Transformer Page. See Transformer Page.
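The "applied in order" rule can be modeled in a few lines of Python. This is a minimal sketch, not Trifacta's implementation. It assumes a dataset is a list of dicts (one per row), and it uses hypothetical Python functions (`derive_total`, `keep_big`) to stand in for Wrangle steps. Each step receives the output of the step before it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]

def run_recipe(rows: List[Row], recipe: List[Transform]) -> List[Row]:
    """Apply each transform in order; each step receives the previous step's output."""
    for step in recipe:
        rows = step(rows)
    return rows

# Hypothetical transforms standing in for Wrangle steps.
def derive_total(rows: List[Row]) -> List[Row]:
    # Roughly analogous to: derive value: (price * qty) as:'total'
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]

def keep_big(rows: List[Row]) -> List[Row]:
    # Roughly analogous to a step that keeps only rows where total > 100
    return [r for r in rows if r["total"] > 100]

data = [{"price": 20, "qty": 3}, {"price": 50, "qty": 4}]
result = run_recipe(data, [derive_total, keep_big])
print(result)  # [{'price': 50, 'qty': 4, 'total': 200}]
```

Order matters. If you swap the two steps, the sketch raises a KeyError, because `keep_big` reads a column that `derive_total` has not created yet.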

When you select suggestions in the Transformer Page, your selection is converted into a Wrangle command and added to your recipe. 

Tip: Where possible, you should make selections in the data grid to build transform steps. These selections prompt a series of cards to be displayed at the bottom of the screen. You can select different cards to specify a basic transform for your selected data, choose a variant of that transform, and then modify the underlying Wrangle recipe as necessary. For more information, see Overview of Predictive Transformation.

For more information on the transform cards, see Transform Cards Panel.

Some complex transforms, such as joins and unions, must be created through dedicated screens. See Transformer Page.

Wrangle Syntax 

Wrangle transform steps follow this general syntax:

(transform) param1:(expression) param2:(expression)

Transform Element | Description
transform

A transform (or verb) is a single keyword that identifies the type of change you are applying to your dataset.

  • A transform is always the first keyword in a recipe step.
  • See Transforms below.

The other elements in each step are contextual parameters for the transform. Some transforms do not require parameters.

parameter1:, parameter2:

Each transform defines which additional parameters are required and which are optional.

NOTE: A parameter is always followed by a colon. A parameter may appear only one time in a transform step.
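The Python sketch below makes the general syntax concrete. It splits a step into its transform keyword and a dictionary of parameters, and it rejects a step that repeats a parameter. It is a simplified, hypothetical parser, not the real Wrangle parser. It assumes every parameter name is a single word followed by a colon, so it will misread expressions that contain `word:` sequences of their own, such as time formats or some patterns.

```python
import re
from typing import Dict, Tuple

# A parameter name is a word immediately followed by a colon, e.g. "value:" or "as:".
PARAM = re.compile(r"(\w+):")

def parse_step(step: str) -> Tuple[str, Dict[str, str]]:
    """Split 'transform p1:expr p2:expr' into ('transform', {'p1': 'expr', ...})."""
    transform, _, rest = step.strip().partition(" ")
    params: Dict[str, str] = {}
    matches = list(PARAM.finditer(rest))
    for i, m in enumerate(matches):
        name = m.group(1)
        if name in params:
            # A parameter may appear only one time in a transform step.
            raise ValueError(f"duplicate parameter: {name}")
        # The expression runs until the next parameter name (or the end of the step).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        params[name] = rest[m.end():end].strip()
    return transform, params

print(parse_step("derive value: (5 + 8) as:'my13'"))
# ('derive', {'value': '(5 + 8)', 'as': "'my13'"})
```

A transform that takes no parameters parses to its keyword and an empty dictionary.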

Common Parameters

Depending on the transform, one or more of the value, col, and row parameters may be used. For example, the set transform can use all three, or just value and col.

Transform Element | Description
value:

When present, the value parameter defines the expression that creates the output value or values stored when the transform is executed.

An expression can contain combinations of the following:

  • Functions perform computations or evaluations on source data. The inputs to a function may be constants or column references. A function reference is always followed by parentheses (), even if it takes no parameters. See Function Categories below.
  • Operators are symbols that represent numeric functions, comparisons, or logical operations. For example, the plus sign (+) is the operator for the add function. See Operator Categories below.
  • Constants can be quoted string literals ('mystring'), Integer values (1001), Decimal values (1001.01), Boolean values (true or false), or patterns. For more information on Trifacta patterns, see Text Matching.
col:

When present, the col parameter identifies the name of the column or columns to which the transform is applied.

Some transforms may support multiple columns as a list, as a range of columns (e.g., column1~column5), or all columns in the dataset (using the wildcard indicator, col: *).

row:

When present, the row parameter defines the expression that is evaluated to determine the rows on which to perform the transform. If the row expression evaluates to true for a row, the transform is performed on that row.
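You can model how col, value, and row work together in Python. This is an illustration under stated assumptions, not Trifacta's engine. Rows are dicts, and the value and row expressions are passed as Python callables. `expand_range` is a hypothetical helper that resolves a range such as column1~column5 by position in the dataset's column order. It also accepts * for all columns.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

def expand_range(columns: List[str], spec: str) -> List[str]:
    """Resolve a col spec: '*' (all columns), 'a~b' (a through b in dataset order), or one name."""
    if spec == "*":
        return list(columns)
    if "~" in spec:
        start, end = spec.split("~")
        i, j = columns.index(start), columns.index(end)
        return columns[i:j + 1]
    return [spec]

def set_values(rows: List[Row], columns: List[str], col: str,
               value: Callable[[Row], object],
               row: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Write value(r) into each target column, but only for rows where row(r) is true."""
    targets = expand_range(columns, col)
    out = []
    for original in rows:
        updated = dict(original)  # copy so the input rows are not mutated
        if row is None or row(original):
            new_value = value(original)
            for c in targets:
                updated[c] = new_value
        out.append(updated)
    return out

columns = ["column1", "column2", "column3", "flag"]
data = [
    {"column1": 1, "column2": 2, "column3": 3, "flag": True},
    {"column1": 4, "column2": 5, "column3": 6, "flag": False},
]
# Roughly: a set step with col: column1~column2, value: 0, and a row expression testing flag.
result = set_values(data, columns, "column1~column2",
                    value=lambda r: 0, row=lambda r: r["flag"])
print(result[0])  # {'column1': 0, 'column2': 0, 'column3': 3, 'flag': True}
print(result[1])  # {'column1': 4, 'column2': 5, 'column3': 6, 'flag': False}
```

When row is omitted, the sketch treats every row as matching, which mirrors a step that supplies only value and col.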

Parameter Inputs

The following types of parameter inputs may be referenced in a transform's parameters. 

Other Trifacta data types can be referenced as column references. To use a literal value of one of these data types, insert it into your expression as a string. Transforms cause the resulting values to be re-inferred for their data type.

Input | Description | Example
column reference

A reference to the values stored in a column in your dataset.

Columns can be referenced by the plain-text value for the column name.

The value parameter references the myCol column:

derive value: myCol as:'myNewCol'

Integer

A valid integer value within the accepted range of values for the Integer datatype. For more information, see Supported Data Types.

Generates a column called my13, which is the sum of the Integer values 5 and 8:

 

derive value: (5 + 8) as:'my13'

Decimal

A valid floating-point value within the accepted range of values for the Decimal datatype. For more information, see Supported Data Types.

Generates a column containing the approximate circumference for each value in the diameter column:

derive value: (3.14159 * diameter) as: 'circumference'

Boolean

A true or false value.

If the value in the order column is more than 1,000,000, then the value in the bigOrder column is true.

derive value:IF(order > 1000000, true, false) as:'bigOrder'

String

String is the baseline datatype.

String literals must be enclosed in single quotes.

Creates a column called StringCol containing the value myString.

derive value:'myString' as:'StringCol'

Trifacta pattern

Trifacta Wrangler supports a special syntax, which simplifies the generation of matching patterns for string values.

Patterns must be enclosed in backticks (`MyPattern`).

For more information, see Text Matching.

Extracts up to 10 values from the MyData column that match the basic pattern for social security numbers (XXX-XX-XXXX):

extract col: MyData on:`%{3}-%{2}-%{4}` limit:10

Regular expression

Regular expressions are a common standard for defining matching patterns. They are very powerful but easy to misconfigure.

Regular expressions must be enclosed in slashes ( /MyPattern/ ).


Replaces every value in the qty column that consists of only one or two digits with an empty string:

replace col: qty on: /^\d$|^\d\d$/ with: '' global: true
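The pattern is anchored with ^ and $, so it matches only values made up entirely of one or two digits. Longer numbers and mixed values are left alone. The Python sketch below runs the same pattern through the standard re module on a hypothetical list of qty values to show which ones are emptied. It assumes that replace evaluates each cell value as a whole string, which is an assumption about the transform, not something this page states.

```python
import re

# Same pattern as the Wrangle step: a whole value of one digit, or of two digits.
PATTERN = re.compile(r"^\d$|^\d\d$")

def replace_short_numbers(values):
    """Replace every match with '' (re.sub replaces all matches, like global: true)."""
    return [PATTERN.sub("", v) for v in values]

qty = ["7", "42", "105", "3a", ""]
print(replace_short_numbers(qty))  # ['', '', '105', '3a', '']
```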

Datetime

A valid date or time value that matches the requirements of the Datetime datatype. See Supported Data Types.

Datetime values can be formatted with specific formatting strings. See DATEFORMAT Function.

Generates a new column containing the values from the myDate column reformatted in yyyyMMdd format:

derive value:DATEFORMAT(myDate, 'yyyyMMdd')
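The intended output is a four-digit year, a two-digit month, and a two-digit day with no separators. The Python sketch below produces the same layout with strftime, using hypothetical sample dates. Python spells the directives %Y, %m, and %d, so the sketch shows the expected output, not Trifacta's format-string syntax. Python's %m (month) and %M (minute) are different directives. Check the DATEFORMAT Function page for the exact case-sensitive tokens Wrangle accepts.

```python
from datetime import datetime

def to_yyyymmdd(values):
    """Reformat datetime values as year, month, day with no separators."""
    return [v.strftime("%Y%m%d") for v in values]

my_date = [datetime(2017, 3, 9), datetime(2016, 12, 31, 14, 45)]
print(to_yyyymmdd(my_date))  # ['20170309', '20161231']
```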

Array

A valid array of values matching the Array data type. Example:

[0,1,2,3,4,5,6,7,8]

See Supported Data Types.

Generates a column with the number of elements in the listed array (7):

derive value: ARRAYLEN('["red", "orange", "yellow", "green", "blue", "indigo", "violet"]')


Map

A valid map of values matching the Map data type. Example:

{"brand":"Subaru","model":"Impreza","color":"green"}

See Supported Data Types.

Generates separate columns for each of the specified keys in the map (brand, model, color), containing the corresponding value for each row:

unnest col:myCol keys:'brand','model','color'
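The Python sketch below shows the effect of unnesting a map column. It is a simplified model under stated assumptions, not the unnest implementation. Each myCol value is assumed to be JSON text for a map, and each requested key becomes a new column. The sketch keeps the source column. When a map lacks a key, the sketch stores None. That choice is made for illustration only, and Trifacta's actual handling of missing keys may differ.

```python
import json
from typing import Dict, List

def unnest(rows: List[Dict[str, object]], col: str, keys: List[str]) -> List[Dict[str, object]]:
    """Add one column per key, filled with that key's value from the map stored in `col`."""
    out = []
    for r in rows:
        m = json.loads(r[col])  # parse the map text for this row
        new = dict(r)           # keep the original columns, including `col`
        for k in keys:
            new[k] = m.get(k)   # None when the map lacks this key (sketch choice)
        out.append(new)
    return out

rows = [{"myCol": '{"brand":"Subaru","model":"Impreza","color":"green"}'}]
result = unnest(rows, "myCol", ["brand", "model", "color"])
print(result[0]["brand"], result[0]["model"], result[0]["color"])  # Subaru Impreza green
```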


Interactions between Wrangle and the Application

  1. As you enter Wrangle steps into the Transform Editor, your syntax is validated for you. You cannot add steps containing invalid syntax. 
    1. Error messages are displayed in the application, so you can correct the issue immediately.
    2. Type-ahead support provides guidance on the supported transforms, functions, and column references.
    3. For more information, see Transform Editor Panel.
  2. When you have entered a valid transform step in the Transform Editor, the results are previewed for you in the data grid.
    1. This preview is generated by applying the transform to the sample currently displayed in the data grid. 

      NOTE: The generated output applies only to the values displayed in the data grid. The transform is applied across the entire dataset only during job execution.

      NOTE: The transform is applied using the currently selected running environment. There may be slight differences in results if you execute the full job on a different running environment.

    2. If the previewed transform is invalid, the data grid is grayed out.
    3. For more information, see Transform Preview.
  3. When you add the transform to your recipe: 
    1. It is applied to the sample in the application, and the data grid is updated to the current state.
    2. Column histograms are updated with new values and counts.
    3. Column data types may be re-inferred for affected columns.
  4. Making changes:
    1. You can edit any transform step in your recipe whenever needed.
      1. When you edit a transform step in your recipe, the data grid changes to display the state of your data up to the step you are editing.
      2. All subsequent steps are still part of the recipe, but they are not applied to the sample yet.
      3. You can insert recipe steps between existing steps.
    2. When you delete a recipe step, the state remains at the point where the step was removed.
      1. You can insert a new step if needed.
      2. When you complete your edit, select the final step of the recipe, which displays the results of all of your transform steps in the data grid. Your changes may cause some recipe steps to become invalid.
    3. See Recipe Panel.
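The preview and editing behavior described above can be modeled as replaying part of the recipe. This Python sketch is a hypothetical illustration, not the application's code. `preview` applies only the steps before the one being edited, and only to the sample. `run_job` applies every step to the entire dataset, which is what happens during job execution. The steps `double` and `add_one` are made-up stand-ins for recipe steps.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]

def preview(sample: List[Row], recipe: List[Step], k: int) -> List[Row]:
    """Grid state while editing step k (0-based): steps before k, applied to the sample only."""
    rows = sample
    for step in recipe[:k]:
        rows = step(rows)
    return rows

def run_job(dataset: List[Row], recipe: List[Step]) -> List[Row]:
    """Job execution: every step, applied to the entire dataset."""
    return preview(dataset, recipe, len(recipe))

def double(rows):
    return [{**r, "x": r["x"] * 2} for r in rows]

def add_one(rows):
    return [{**r, "x": r["x"] + 1} for r in rows]

dataset = [{"x": i} for i in range(1, 6)]
sample = dataset[:2]          # the grid shows only a sample
recipe = [double, add_one]

print(preview(sample, recipe, 1))  # [{'x': 2}, {'x': 4}]  (editing the second step)
print(run_job(dataset, recipe))    # [{'x': 3}, {'x': 5}, {'x': 7}, {'x': 9}, {'x': 11}]
```

The steps after k stay in `recipe` the whole time. They are simply not replayed until you select the final step again.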

Transforms

A transform, or verb, is an action applied to rows or columns of your data. Transforms are the essential set of changes that you can apply to your dataset. For more information, see Transforms.

Function Categories

A function is an action that is applied to a set of values as part of a transform step. Some functions apply to values of specific data types, such as strings. Others apply only within specific types of transforms, as with the Aggregate and Window function categories. A function cannot be applied to data without a transform.

Function Category | Description
Aggregate Functions | These functions are used to perform aggregation calculations on your data, such as sum, mean, and standard deviation.
Comparison Functions | Comparison functions enable evaluation between two data elements, which are typically nested (map or array) elements.
Math Functions | Perform computations on your data using a variety of math functions and numeric operators.
Date Functions | Use these functions to extract data from or perform operations on objects of Datetime data type.
String Functions | Manipulate strings, including finding substrings within a string.
Nested Functions | These functions are designed specifically to assist in wrangling nested data, such as maps, arrays, or JSON elements.
Type Functions | Use the Type functions to identify valid, missing, mismatched, and null values.
Window Functions | The Window functions enable you to perform calculations on relative windows of data within your dataset.
Other Functions | Miscellaneous functions that do not fit into the other categories.

Operator Categories

An operator is a symbol that represents a function. For example, the plus sign (+) represents the add function.

Operator Category | Description
Logical Operators | and, or, and not operators
Numeric Operators | Add, subtract, multiply, and divide
Comparison Operators | Compare two values with greater than, equals, not equals, and less than operators
Ternary Operators | Use ternary operators to create if/then/else logic in your transforms.
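Python's operator module offers a useful analogy for how operators relate to functions, because it exposes each arithmetic operator as a named function. The analogy is limited. The dictionary below maps only the four numeric operators listed in the table to Python functions. The conditional expression in the hypothetical `if_then_else` helper plays the role of if/then/else logic, similar to the IF(order > 1000000, true, false) expression used earlier. Wrangle's own symbols for logical and comparison operators are not modeled here.

```python
import operator

# Each numeric operator symbol is shorthand for a named function.
NUMERIC_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def apply_operator(symbol, left, right):
    """Evaluate 'left <symbol> right' by calling the named function behind the symbol."""
    return NUMERIC_OPERATORS[symbol](left, right)

def if_then_else(condition, then_value, else_value):
    """If/then/else logic: choose one of two values based on a condition."""
    return then_value if condition else else_value

print(apply_operator("+", 5, 8))                     # 13
print(if_then_else(1500000 > 1000000, True, False))  # True
```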

Documentation

Documentation for Wrangle is also available through Trifacta Wrangler. Select Help menu > Product Docs.

Tip: When searching for examples of transforms and functions, try using the following forms for your search terms within the Product Docs site:

  • Transforms: wrangle_transform_NameOfTransform
  • Functions: wrangle_function_NameOfFunction
