Page tree

 

Contents:


The following changes have been applied to Wrangle in this release.

Release 5.1

Wrangle now supports nested expressions

Beginning in Release 5.1, all Wrangle functions now supported nested expressions, which can be arithmetic calculations, column references, or other function calls.

NOTE: This feature is enabled by default, as this change does not break any steps created in previous versions of the product. It can be disabled if needed. See Miscellaneous Configuration.

NOTE: This capability represents a powerful enhancement to the language, as you can now use dynamic inputs for all functions.

The following expression is a valid transform in Wrangle. It locates the substring in myString that begins with the @ sign until the end of the string, inclusive:

derive value: substring(myString, find(myString, '@', true, 0), length(myString)

Nested arithmetic expressions:

Suppose you wanted just the value after the @ sign until the end of the string. Prior to Release 5.1, the following generated a validation error:

derive value: substring(myString, find(myString, '@', true, 0) + 1, length(myString)

In the above, the addition of +1 to the second parameter is a nested expression and was not supported. Instead, you had to use multiple steps to generate the string value. 

Beginning in Release 5.1, the above single-step transform is supported.

Nested column references:

In addition to arithmetic expressions, you can nested column references. In the following example, the previous step has been modified to replace the static +1 with a reference to a column containing the appropriate value (at_sign_offset) : 

derive value: substring(myString, find(myString, '@', true, 0) + at_sign_offset, length(myString)

Nested function references:

Now, you can combine multiple function references into a single computation. The following computes the total volume of a cube of length side and then multiplies that volume by the number of cubes (cube_count) to compute the total cube_volume

derive type: single value: MULTIPLY(POW(cube_side,3),cube_count) as: 'cube_volume'

For more information, see Wrangle Language.

SOURCEROWNUMBER function generates null values consistently

The SOURCEROWNUMBER function returns the row number of the row as it appears in the original dataset. After some operations, such as unions, joins, and aggregations, this row information is no longer available.

In Release 5.0.1 and earlier, the results were confusing. When source row information was not available, the function was simply not available for use.

In Release 5.1 and later, the behavior of the SOURCEROWNUMBER function is more consistent:

  • If the source row information is available, it is returned.
  • If it is not available:
    • The function can still be used.
    • The function returns null values in all cases.

For more information, see SOURCEROWNUMBER Function.

New Functions

Function NameDescription
ARRAYELEMENTAT FunctionReturns element value of input array for the provided index value.
DOUBLEMETAPHONE FunctionReturns primary and secondary phonetic spellings of an input string using the Double Metaphone algorithm.
DOUBLEMETAPHONEEQUALS FunctionReturns true if two strings match phonetic spellings using Double Metaphone algorithm. Tolerance threshold can be adjusted.
LISTUNIQUE FunctionGenerates a new column containing an array of the unique values from a source column.

Release 5.0.1

RAND function generates true random numbers

In Release 5.0 and earlier, the RAND function produced the same set of random numbers within the browser, after browser refresh, and over subsequent runs of a job. 

  • During job execution, a default seed value was inserted as the basis for the function during the execution of the job. 
  • In some cases, this behavior is desired. 

In Release 5.0.1 and later, the RAND function accepts an optional integer as a parameter. When this new seed value is inserted, the function generates deterministic, pseudo-random values.

  • This version matches the behavior of the old function.

    NOTE: On all upgraded instances of the platform, references to the RAND function have been converted to use a default seed value, so that previous behavior is maintained in the upgraded version.

  • If no seed value is inserted as a parameter, the RAND function generates true random values within the browser, after browser refresh, and over subsequent job runs.

    NOTE: Be aware that modifying your dataset based on the generated values of RAND() may have unpredictable effects later in your recipe and downstream of it.

For more information, see RAND Function.

Release 5.0

Required type parameter

Prior to Release 5.0, the following was a valid Wrangle step:

derive value:colA + colB as:'colC'

Beginning in Release 5.0, the type parameter is required. This parameter defines whether the transform is a single or multi-row formula. In the Transform Builder, this value must be specified.

The following is valid in Release 5.0:

derive type:single value:colA + colB as:'colC'

See Derive Transform.

See Transform Builder.

Deprecated aggregate transform

In Release 4.2.1 and earlier, the aggregate transform could be used to aggregate your datasets using aggregation functions and groupings. 

In Release 5.0 and later, this transform has been merged into the pivot transform. The aggregate transform has been deprecated and is no longer available. 

NOTE: During upgrade to Release 5.0 and later, recipes that had previously used the aggregate transform are automatically migrated to use the pivot equivalent.

Example 1

Release 4.2.1 and earlier Aggregate:

aggregate value:AVERAGE(Scores)

Release 5.0 and later Pivot:

pivot value: AVERAGE(Score) limit: 1

The limit parameter defines the maximum number of columns that can be generated by the pivot.

Example 2

Aggregate:

aggregate value:AVERAGE(Scores) group:studentId

Pivot:

pivot group: StudentId value: AVERAGE(Score) limit: 1

For more information, see Pivot Transform.

New search terms

In the new Search panel, you can search for terms that can be used to select transformations for quick population of parameters. In the following table, you can see Wrangle how terminology has changed in Release 5.0 for some common transforms from earlier release. 

Tip: You can paste the Release 5.0 terms in the Search panel to locate the same transformations used in earlier releases.

Release 4.2.1 and earlier transformsRelease 5.0 and later search terms

aggregate

pivot
keepfilter
deletefilter
extract on:extractpatterns
extract at:extractpositions
extract before:extractbetweendelimiters
extract after:extractbetweendelimiters
replace on:replacepatterns
replace at:replacepositions
replace before:replacebetweenpatterns
replace after:replacebetweenpatterns
replace from:replacebetweenpatterns
replace to:replacebetweenpatterns
split on:splitpatterns
split delimiters:splitpositions
split every:splitpositions
split positions:splitpositions
split after:splitpatterns
split before:splitpatterns
split from:splitpatterns
split to:splitpatterns

Support for <> operator

Prior to Release 5.0, the following operator was used to test "not equal" comparisons:

!=

Beginning in Release 5.0, the following operators is also supported:

<>

Example:

derive value:IF ((col1 <> col2), 'different','equal') as:'testNotEqual'

Tip: Both of the above operators are supported, although the <> operator is preferred.

For more information, see Comparison Operators.

ROUND function takes optional number of digits

The ROUND function now supports rounding to a specified number of digits. By default, values are rounded to the nearest integer, as before. See ROUND Function.

New Functions

Function NameDescription
DEGREES FunctionGenerates the value in degrees for an input radians value.
EXACT FunctionCompares two strings to see if they are exact matches.
FILTEROBJECT FunctionFilters the keys and values from an Object based on specified keys.
HOST FunctionReturns the host value from a URL.
ISEVEN FunctionReturns true if an Integer, function returning an Integer, or a column contains an even value.
ISODD FunctionReturns true if an Integer, function returning an Integer, or a column contains an odd value.
KTHLARGESTUNIQUE FunctionComputes the kth-ranked unique value in a set of values.
LCM FunctionReturns the least common multiple between two input values.
MODE FunctionComputes the mode (most common) value for a set of values.
MODEIF FunctionComputes the mode based on a conditional test.
PAD FunctionPads the left or right side of a value with a specified character string.
PI FunctionGenerates the value for pi to 15 decimal places.
RADIANS FunctionGenerates the value in radians for an input degrees value.
RANDBETWEEN FunctionGenerates a random Integer in a range between two specified values.
RIGHTFIND Function Locates a substring by searching from the right side of an input value.
ROLLINGCOUNTA FunctionComputes count of non-null values across a rolling window within a column.
ROLLINGKTHLARGEST FunctionComputes the kth largest value across a rolling window within a column.
ROLLINGKTHLARGESTUNIQUE FunctionComputes the kth largest unique value across a rolling window within a column.
ROLLINGLIST FunctionComputes list of all values across a rolling window within a column.
ROLLINGMAX FunctionComputes maximum value across a rolling window within a column.
ROLLINGMIN FunctionComputes minimum value across a rolling window within a column.
ROLLINGMODE FunctionComputes mode (most common) value across a rolling window within a column.
ROLLINGSTDEV FunctionComputes standard deviation across a rolling window within a column.
ROLLINGVAR FunctionComputes variance across a rolling window within a column.
SIGN FunctionComputes the positive or negative sign of an input value.
TRUNC FunctionTruncates a value to the nearest integer or a specified number of digits.
URLPARAMS FunctionExtracts any query parameters from a URL into an Object.
WEEKNUM FunctionCalculates the week that the date appears during the year (1-52).

Release 4.2.1

None.

Release 4.2

New Filter transform

Perform a variety of predefined row filtrations using the new filter transform, or apply your own custom formula to keep or delete rows from your dataset. 

New Case transform

Beginning in Release 4.2, you can use the Transform Builder to simplify the construction of CASE statements. For each case, specify the conditional and resulting expression in separate textboxes.

Rename transform now supports multi-column rename

Use the rename transform to rename multiple columns in a single transform. 

Drop specified columns or drop the others

The drop transform now supports the option of dropping all columns except the ones specified in the transform. See Drop Transform.

New string comparison functions

Compare two strings using Latin collation settings. See below.

NOW function returns 24-hour time values

In Release 4.1.1 and earlier, the NOW function returned time values for the specified time zone in 12-hour time, which was confusing.

In Release 4.2 and later, this function returns values in 24-hour time. 

New Transforms

Transform NameDocumentation
caseCase Transform
filterFilter Transform

New Functions

Function NameDocumentation
STRINGGREATERTHAN

STRINGGREATERTHAN Function

STRINGGREATERTHANEQUAL

STRINGGREATERTHANEQUAL Function

STRINGLESSTHAN

STRINGLESSTHAN Function

STRINGLESSTHANEQUAL

STRINGLESSTHANEQUAL Function

SUBSTITUTESUBSTITUTE Function

Release 4.1

Standardization page and transform have been removed

For a number of releases, the Standardization page and its related transform have been available via feature flag, due to a number of issues.

NOTE: This feature was not typically enabled in Trifacta Wrangler Enterprise deployments.

This feature has been removed from the product altogether and will be replaced in the future by a more robust standardization and normalization capability. 

Syntax Changes

New Transforms

Transform NameDocumentationNotes
commentComment Transform

In previous releases, you could insert comments of the following format:

// This is a comment.

Beginning in Release 4.1, comments are supported by formal transform.

New Functions

Function NameDocumentation
CASECASE Function

Release 4.0.1

Map data type is now Object data type

The Map data type has been renamed to the Object data type. There are no changes to the behavior.

NOTE: This change is not reflected in the Release 4.0.1 PDF documentation.

 

Splitrows transform now permits specifying of quote escaping character

For text-based formats that format fields between quotes, you can specify the character that is used to signify the escaping of the quote character in the data. The quoteEscapeChar parameter identifies the character in the data the precedes quotes that are supposed to be part of the data, instead of the marker for a field. See Splitrows Transform.

Release 4.0

Script steps displayed in natural language

In Release 3.2.1 and earlier, the steps of your recipe were displayed in raw form of Wrangle, as in the following example:

split col: column1 on: ',' limit: 5 quote: '\"'

Beginning in Release 4.0, by default, recipe steps are displayed in a more natural form of language, so that you can read the intention of the step without having to understand the details of the underlying language syntax. In natural language format, the above step is rendered as the following:

Split column1 on ',' 5 times

Notes:

  • If you edit a natural language version of your step, you perform your edits in Wrangle
  • If preferred, you can switch back to displaying in source Wrangle. In the data grid, click the Data Grid options button. Select Show Wrangle Script. See Data Grid Panel.
  • Recipe steps are listed in the product documentation in source Wrangle, so that you can copy and paste them into the Transform Builder as needed.

setderive, and window transforms can now perform any type of computation

To support the above capabilities, the following changes appear in the language:

Multi-column input support

  • set and settype transforms now support multiple input columns.
  • When working with multiple columns, set transform now accepts a placeholder variable in the formula. 
  • See Set Transform.
  • See Settype Transform.

Syntax Changes

Terminology Changes

rollingaverage function accepts two windowing parameters.

In Release 3.2.1 and earlier, you could specify your window for computing the rolling average using a single parameter.

window value: ROLLINGAVG(POS_Sales, 3) order: Whse_Nbr

This single parameter determined the row offset after the current row. The above transform captures a window of values from the current row forward two rows for the rolling average value. There was no way to capture a window that included values that were both before and after the current row.

Beginning in Release 4.0, the function accepts an additional parameter, which enables computation across a before/after window. The following example computes the rolling average from two rows before and two rows after the current row:

window value: ROLLINGAVERAGE(POS_Sales, 3, 2) order: Whse_Nbr

Notes:

  • The function name has changed to ROLLINGAVERAGE.
  • The behavior of the first parameter has changed. It captures rows before the current one, instead of rows after the current one.
    • The default values are -1 and 0, which capture all values from the current row back to the first row of the dataset.
  • During the upgrade process, transform steps using this function are automatically migrated to the new method of specification. 
  • For more information, see ROLLINGAVERAGE Function.
Function Changes

The following name changes have been applied to existing functions to use more familiar names.

Old Function NameNew Function NameNotes
ROLLINGAVGROLLINGAVERAGE 

New Datetime functions

These functions generate date and timestamps at execution time:

Function NameDescription
NOW See NOW Function.
TODAYSee TODAY Function.
DATETIMESee DATETIME Function.

New conditional functions

These functions are conditionals based on data validation against a column's data type:

Function NameDescription

IFMISSING

See IFMISSING Function.

IFNULL

See IFNULL Function.

IFVALID

See IFVALID Function.

IFMISMATCHED

See IFMISMATCHED Function.

These functions compute specific values based on conditionals:

Function NameDescription
ANYIFSee ANYIF Function.
AVERAGEIF
See AVERAGEIF Function .
COUNTAIF
See COUNTAIF Function .
COUNTDISTINCTIFSee COUNTDISTINCTIF Function.
COUNTIF
See COUNTIF Function.
KTHLARGESTIFSee KTHLARGESTIF Function.
LISTIF
See .LISTIF Function.
MAXIF
See MAXIF Function.
MINIF
See MINIF Function.
STDEVIFSee STDEVIF Function.
SUMIF
See SUMIF Function.
VARIF
See VARIF Function.

Other new functions

Function NameDescription
COUNTASee COUNTA Function.
ROLLINGSUMSee ROLLINGSUM Function.
ROWNUMBERSee ROWNUMBER Function.
SUFFIXSee SUFFIX Function.
STARTSWITHSee STARTSWITH Function.
ENDSWITHSee ENDSWITH Function.

set
 transform no longer accepts  row  parameter

In Release 3.2.1 and earlier, the row parameter could be used to filter the rows in a dataset to which the set transform value is applied, as in the following example:

set col: results value: 'Outstanding!' row: (score == 100)

Beginning in Release 4.0, the row parameter has been removed. Instead, you can specify conditionals in the value parameter. During upgrade, the above transform step is converted to the following:

set col: results value: IF(score == 100, 'Outstanding!', '')

In addition to the standard IF function, you can apply any of the new conditional functions listed below.

For more information, see Set Transform

Format string with # before 0 is no longer supported in the NUMFORMAT function

In Release 3.2.1, the NUMFORMAT function supported a format string of ##.#0 in the Javascript running environment. This string was not supported in the Photon running environment. 

For Release 4.0 and later, this format string is no longer supported and must be changed.

NOTE: After you have upgraded to Release 4.0 or later, you must change references format strings with a # before 0 for the NUMFORMAT function to use a supported formatting string. See NUMFORMAT Function.

ARRAYUNIQUE function can now take a single column as input

In Release 3.2.1 and earlier, the ARRAYUNIQUE function required at least two functions to generate an output. 

Beginning in Release 4.0, this function can accept a single array or column as input, generating an output array containing only the unique values in the source. See ARRAYUNIQUE Function.

Execution Changes

Ternary predicates evaluating to null return false expressions

Suppose you have the following function expression:

IF(NULL(),1,2)

In Release 3.2.1 and earlier, predicates that returned null values returned null for the entire expression. In this case, the expression returned a null value. 

In Release 4.0 and later, this expression returns 2.

See IF Function.

Null values no longer automatically filtered in limiting transforms

In Release 3.2.1 and earlier, when filtering the set of rows using a recipe step, such as a keep or delete transform, any null values in the evaluated in the condition would result in the filtering being applied. Example:

delete row:invAge >=90

If invAge contained a null value for a row, the row was deleted.

NOTE: In Release 4.0 and later, null values used as inputs to filtering transforms do not result in the row being filtered. This is a change in behavior for null values.

For each of the transforms below, you can review how to retain the Release 3.2.1 and earlier behavior in Release 4.0 and later.

Delete transform:

Release 3.2.1 example:

delete row:invAge >=90

Release 4.0 example:

delete row:(invAge >=90 && invAge == null())

Keep transform:

Release 3.2.1 example:

keep row:POS_Sales < 100

Release 4.0 example:

keep row:(POS_Sales < 100 && POS_Sales != null())

IF transform:

Release 3.2.1 example:

derive value:IF(rating > 9.0, 'ok','retry') as:'status'

Release 4.0 example:

derive value:(IF(rating > 9.0, 'ok','retry') && rating != null()) as:'status'

Tip: Release 4.0 introduces a series of conditional functions that can streamline computation and action. These functions test conditionals based on type (e.g. IFNULL) or based on computation of an aggregate function (e.g. SUMIF). See New conditional functions above.

 For more information:

Release 3.2.1

Syntax Changes

Terminology Changes

NOTE: Beginning in Release 3.2.1, values that are considered empty are now referred to as missing.


Function Changes

The following name changes have been applied to existing functions to use more familiar names.

Old Function NameNew Function NameNotes
MEANAVERAGE 
CEILCEILING 
DATEDIFFDATEDIF 
DAYOFWEEKWEEKDAY 
UPPERCASEUPPER 
LOWERCASELOWER 
PROPERCASEPROPER 
LENGTHLEN 
ARRAYLENGTHARRAYLEN 
CONCATARRAYCONCAT 
CROSSARRAYCROSS 
INTERSECTIONARRAYINTERSECT 
KEYSETKEYS 
UNIQUEARRAYUNIQUE 
ZIPARRAYZIP 
EMPTYISMISSING 
ISEMPTYISMISSINGRemoval of duplicate function name.
MISMATCHEDISMISMATCHED 
VALIDISVALID 
WINDOWFILLFILL 
MODULOMOD 

Execution Changes

Transforms that nest null values in arrays now write null literals on Photon

In Release 3.2 and earlier, when a transform step was nested a null value within an array, an empty string value was written in the Photon running environment.

In Release 3.2.1 and later, the value written for a nested null value in the array is the literal: null.

Suppose your data looks like the following:

text_colempty_str_colnull_col
myText  

where:

  • empty_str_col contains an empty string value.
  • null_col contains a null value

If you add the following recipe step:

nest col: text_col, empty_str_col, null_col into: array as: 'result'

In Release 3.1, the result was the following:

text_colempty_str_colnull_colresult
myText  ["myText", "", ""]

In Release 3.2.1, the result is the following:

text_colempty_str_colnull_colresult
myText  ["myText", "", null]

This change was made to align the behaviors of the Photon running environment with the JavaScript running environment.

Release 3.2

Syntax Changes

Transform Changes

  • multisplit transform has been deprecated. All multisplit capabilities are now supported by the split transform. See Split Transform.
  • pivot transform now supports multiple columns. See Pivot Transform.
  • unnest transform now requires the keys parameter, which was optional in previous releases. See Unnest Transform.
    • To unnest arrays without specifying keys, use the flatten transform. See Flatten Transform.
  • arraylength and arraystomap now accept functions that return arrays as inputs to the function.
  • domain and subdomain functions have been updated to reflect standard interpretations of domain and sub-domain values for URLs:

    ReleaseExample URLDomain FunctionSubdomain Function
    Release 3.1.2 and earlier
    www.exampl.e.org
    exampl.e
    www
    Release 3.2 and later
    www.exampl.e.org
    e
    www.exampl
  • The following parameter values are no longer supported as special capture groups in the with parameter for the replace transform. These references in the with parameter do not work in the Photon and Hadoop running environments and are unlikely to work at scale in any other running environment:

    $&
    $`
    $'

Parameter Changes

Applicable Transform(s)Old TermNew TermNotes
Extract Transformurlparam 

Removed from use in the extract transform. Use of this parameter in that transform prevented the use of other extract parameters. See Extract Transform.

Countpattern Transformquote Removed from use in the countpattern transform. Parameter was not being respected. See Countpattern Transform.
Extractkv Transformquote Remove from use in the extractkv transform. Parameter was not being used. See Extractkv Transform.
Extractlist Transformquote Parameter is now used exclusively for matching against delimiters. Parameter does not match against patterned values. See Extractlist Transform.
Split Transform and Extract Transformlimit 

For these transforms, the limit parameter can no longer be used in conjunction with the following parameters: positions, delimiters, at, and urlparam.

In previous releases, these combinations did not actually work, even though the transform step was consumed. Now, it generates an error.

Function Changes

More consistent results for DATEDIFF functions:

Prior to Release 3.2, the DATEDIFF function generated inconsistent results between the Pig and Javascript running environments for DayOfYear calculations.

Beginning in Release 3.2, the DATEDIFF function has been updated to generate more consistent results. See DATEDIF Function.  

Trifacta Pattern Changes

Changes to alpha-numeric pattern:

The alpha-numeric Trifacta pattern now applies to a single character and does not match on underscores (_). Previously, it was applied to one or more alpha-numeric characters, as well as underscores.

NOTE: Beginning in Release 3.2, the alpha-numeric Trifacta pattern applies to a single character. If you used it in your recipes prior to Release 3.2, these references have been converted to regular expressions to support matching with multiple characters.

New alphanum-underscore pattern:

To support previous functionality, you can use the new alphanum-underscore pattern, which matches on a single alpha-numeric character or underscore. 

For more information, see Text Matching.

RANGE function

The RANGE function now accepts negative start and stop values.

Deprecated Items

Aggregate Tool

In prior releases, you could build aggregation steps using a separate tool, which was available through the Transform Editor.

In Release 3.2, this tool has been replaced by building aggregate transforms in the Transform Builder. See Transform Builder.

Language Cheat Sheet

You can still access the Language Cheat Sheet by adding /docs to the base URL. For example:

http://www.example.com:3005/docs

NOTE: The above option is likely to be deprecated in a future release.

This page has no comments.