The following changes have been applied to in this release of .
Broader support for metadata references: For Excel files,
$filepath references now return the location of the source Excel file. Sheet names are appended to the end of the reference. See Source Metadata References.
The DateFormat task now supports a new parameter: Input Formats. This parameter specifies the date format to use when attempting to parse the input column.
(default) if the parameter is not specified, then the following common formats are used for parsing the input:
'M/d/yy', 'MM/dd/yy', 'MM-dd-yy', 'M-d-yy', 'MMM d, yyyy', 'MMMM d, yyyy', 'EEEE, MMMM d, yyyy', 'MMM d yyyy', 'MMMM d yyyy', 'MM-dd-yyyy', 'M-d-yyyy', 'yyyy-MM-ddXXX', 'dd/MM/yyyy', 'd/M/yyyy', 'MM/dd/yyyy', 'M/d/yyyy', 'yyyy/M/d', 'M/d/yy h:mm a', 'MM/dd/yy h:mm a', 'MM-dd-yy h:mm a', 'MMM dd yyyy HH.MM.SS xxx', 'M-d-yy h:mm a', 'MMM d, yyyy h:mm:ss a', 'EEEE, MMMM d, yyyy h:mm:ss a X', 'EEE MMM dd HH:mm:ss X yyyy', 'EEE, d MMM yyyy HH:mm:ss X', 'd MMM yyyy HH:mm:ss X', 'MM-dd-yyyy h:mm:ss a', 'M-d-yyyy h:mm:ss a', 'yyyy-MM-dd h:mm:ss a', 'yyyy-M-d h:mm:ss a', 'yyyy-MM-dd HH:mm:ss.S', 'dd/MM/yyyy h:mm:ss a', 'd/M/yyyy h:mm:ss a', 'MM/dd/yyyy h:mm:ss a', 'M/d/yyyy h:mm:ss a', 'MM/dd/yy h:mm:ss a', 'MM/dd/yy H:mm:ss', 'M/d/yy H:mm:ss', 'dd/MM/yyyy h:mm a', 'd/M/yyyy h:mm a', 'MM/dd/yyyy h:mm a', 'M/d/yyyy h:mm a', 'MM-dd-yy h:mm:ss a', 'M-d-yy h:mm:ss a', 'MM-dd-yyyy h:mm a', 'M-d-yyyy h:mm a', 'yyyy-MM-dd h:mm a', 'yyyy-M-d h:mm a', 'MMM.dd.yyyy', 'd/MMM/yyyy H:mm:ss X', 'dd/MMM/yy h:mm a',
These formats are a subset of the date formatting strings supported by the product. For more information, see Datetime Data Type.
In previous releases, the ARRAYELEMENTAT function accepted a second input parameter to specify the index value of the element to retrieve. This "at" parameter had to be an Integer literal.
Beginning in this release, the function also accepts for this second "at" parameter:
For more information, see ARRAYELEMENTAT Function.
The following LIST-based functions have been changed to narrow the accepted input data types. In previous releases, any data type was accepted for input, which was not valid for most data types.
In Release 6.0 and later, these functions accept only Array inputs. Inputs can be Array literals, a column of Arrays, or a function returning Arrays.
NOTE: You should references to these functions in your recipes.
The following functions have been renamed in Release 6.0.
|Release 5.9 and earlier||Release 6.0 and later|
|LISTUNIQUE Function||UNIQUE Function|
Prior to Release 6.0, the FILL function replaced empty cells with the most recent non-empty value.
In Release 6.0,
after function parameters have been added. These parameters define the window of rows before and after the row being tested to search for non-empty values. Within this window, the most recent non-empty value is used.
The default values for these parameters are
0 respectively, which performs a search of an unlimited number of preceding rows for a non-empty value.
NOTE: Upon upgrade, the FILL function retains its preceding behavior, as the default values for the new parameters perform the same unlimited row search for non-empty values.
For more information, see FILL Function.
The following functions can now be applied directly to arrays to derive meaningful statistics about them.
Beginning in Release 5.8, you can insert the following references into the formulas of your transformations. These source metadata references enable you to continue to track file lineage information from within your datasets as part of your wrangling project.
NOTE: These references apply only to file-based sources. Some additional limitations may apply.
|Returns the full path and filename of the source of the dataset.|
Returns the row number for the current row from the original source of the dataset.
For more information, see Source Metadata References.
The following functions can now be applied directly to arrays to derive meaningful statistics about them.
In Release 5.6 and earlier, the WEEKNUM function treated the first week of the year differently between the and Spark running environments:
This issue was caused by Spark following an ISO-8601 standard and relying on the joda datetimeformatter.
Beginning in Release 5.7, the WEEKNUM function behaves consistently for both and Spark:
For more information, see WEEKNUM Function.
In Release 5.1 and earlier, the URLPARAMS function returned empty Objects when no answer was computed for the function.
In Release 5.6 and later, this function returns null values in the above case.
See URLPARAMS Function.
Beginning in Release 5.1, all functions now supported nested expressions, which can be arithmetic calculations, column references, or other function calls.
NOTE: This feature is enabled by default, as this change does not break any steps created in previous versions of the product. It can be disabled if needed. See Miscellaneous Configuration.
NOTE: This capability represents a powerful enhancement to the language, as you can now use dynamic inputs for all functions.
The following expression is a valid transform in . It locates the substring in
myString that begins with the
@ sign until the end of the string, inclusive:
derive value: substring(myString, find(myString, '@', true, 0), length(myString)
Nested arithmetic expressions:
Suppose you wanted just the value after the
@ sign until the end of the string. Prior to Release 5.1, the following generated a validation error:
derive value: substring(myString, find(myString, '@', true, 0) + 1, length(myString)
In the above, the addition of
+1 to the second parameter is a nested expression and was not supported. Instead, you had to use multiple steps to generate the string value.
Beginning in Release 5.1, the above single-step transform is supported.
Nested column references:
In addition to arithmetic expressions, you can nested column references. In the following example, the previous step has been modified to replace the static
+1 with a reference to a column containing the appropriate value (
derive value: substring(myString, find(myString, '@', true, 0) + at_sign_offset, length(myString)
Nested function references:
Now, you can combine multiple function references into a single computation. The following computes the total volume of a cube of length
side and then multiplies that volume by the number of cubes (
cube_count) to compute the total
derive type: single value: MULTIPLY(POW(cube_side,3),cube_count) as: 'cube_volume'
For more information, see Wrangle Language.
The SOURCEROWNUMBER function returns the row number of the row as it appears in the original dataset. After some operations, such as unions, joins, and aggregations, this row information is no longer available.
In Release 5.0.1 and earlier, the results were confusing. When source row information was not available, the function was simply not available for use.
In Release 5.1 and later, the behavior of the SOURCEROWNUMBER function is more consistent:
For more information, see SOURCEROWNUMBER Function.
|ARRAYELEMENTAT Function||Returns element value of input array for the provided index value.|
|DOUBLEMETAPHONE Function||Returns primary and secondary phonetic spellings of an input string using the Double Metaphone algorithm.|
|DOUBLEMETAPHONEEQUALS Function||Returns |
|UNIQUE Function||Generates a new column containing an array of the unique values from a source column.|
In Release 5.0 and earlier, the RAND function produced the same set of random numbers within the browser, after browser refresh, and over subsequent runs of a job.
In Release 5.0.1 and later, the RAND function accepts an optional integer as a parameter. When this new seed value is inserted, the function generates deterministic, pseudo-random values.
This version matches the behavior of the old function.
NOTE: On all upgraded instances of the platform, references to the RAND function have been converted to use a default seed value, so that previous behavior is maintained in the upgraded version.
If no seed value is inserted as a parameter, the RAND function generates true random values within the browser, after browser refresh, and over subsequent job runs.
NOTE: Be aware that modifying your dataset based on the generated values of
For more information, see RAND Function.
Prior to Release 5.0, the following was a valid step:
derive value:colA + colB as:'colC'
Beginning in Release 5.0, the
type parameter is required. This parameter defines whether the transform is a single or multi-row formula. In the Transform Builder, this value must be specified.
The following is valid in Release 5.0:
derive type:single value:colA + colB as:'colC'
See Derive Transform.
See Transform Builder.
In Release 4.2.1 and earlier, the aggregate transform could be used to aggregate your datasets using aggregation functions and groupings.
In Release 5.0 and later, this transform has been merged into the pivot transform. The aggregate transform has been deprecated and is no longer available.
NOTE: During upgrade to Release 5.0 and later, recipes that had previously used the aggregate transform are automatically migrated to use the pivot equivalent.
Release 4.2.1 and earlier Aggregate:
Release 5.0 and later Pivot:
pivot value: AVERAGE(Score) limit: 1
limit parameter defines the maximum number of columns that can be generated by the pivot.
aggregate value:AVERAGE(Scores) group:studentId
pivot group: StudentId value: AVERAGE(Score) limit: 1
For more information, see Pivot Transform.
In the new Search panel, you can search for terms that can be used to select transformations for quick population of parameters. In the following table, you can see how terminology has changed in Release 5.0 for some common transforms from earlier release.
Tip: You can paste the Release 5.0 terms in the Search panel to locate the same transformations used in earlier releases.
|Release 4.2.1 and earlier transforms||Release 5.0 and later search terms|
Prior to Release 5.0, the following operator was used to test "not equal" comparisons:
Beginning in Release 5.0, the following operators is also supported:
derive value:IF ((col1 <> col2), 'different','equal') as:'testNotEqual'
Tip: Both of the above operators are supported, although the <> operator is preferred.
For more information, see Comparison Operators.
The ROUND function now supports rounding to a specified number of digits. By default, values are rounded to the nearest integer, as before. See ROUND Function.
|DEGREES Function||Generates the value in degrees for an input radians value.|
|EXACT Function||Compares two strings to see if they are exact matches.|
|FILTEROBJECT Function||Filters the keys and values from an Object based on specified keys.|
|HOST Function||Returns the host value from a URL.|
|ISEVEN Function||Returns |
|ISODD Function||Returns |
|KTHLARGESTUNIQUE Function||Computes the kth-ranked unique value in a set of values.|
|LCM Function||Returns the least common multiple between two input values.|
|MODE Function||Computes the mode (most common) value for a set of values.|
|MODEIF Function||Computes the mode based on a conditional test.|
|PAD Function||Pads the left or right side of a value with a specified character string.|
|PI Function||Generates the value for pi to 15 decimal places.|
|RADIANS Function||Generates the value in radians for an input degrees value.|
|RANDBETWEEN Function||Generates a random Integer in a range between two specified values.|
|RIGHTFIND Function||Locates a substring by searching from the right side of an input value.|
|ROLLINGCOUNTA Function||Computes count of non-null values across a rolling window within a column.|
|ROLLINGKTHLARGEST Function||Computes the kth largest value across a rolling window within a column.|
|ROLLINGKTHLARGESTUNIQUE Function||Computes the kth largest unique value across a rolling window within a column.|
|ROLLINGLIST Function||Computes list of all values across a rolling window within a column.|
|ROLLINGMAX Function||Computes maximum value across a rolling window within a column.|
|ROLLINGMIN Function||Computes minimum value across a rolling window within a column.|
|ROLLINGMODE Function||Computes mode (most common) value across a rolling window within a column.|
|ROLLINGSTDEV Function||Computes standard deviation across a rolling window within a column.|
|ROLLINGVAR Function||Computes variance across a rolling window within a column.|
|SIGN Function||Computes the positive or negative sign of an input value.|
|TRUNC Function||Truncates a value to the nearest integer or a specified number of digits.|
|URLPARAMS Function||Extracts any query parameters from a URL into an Object.|
|WEEKNUM Function||Calculates the week that the date appears during the year (1-52).|
Perform a variety of predefined row filtrations using the new
filter transform, or apply your own custom formula to keep or delete rows from your dataset.
Beginning in Release 4.2, you can use the Transform Builder to simplify the construction of
CASE statements. For each case, specify the conditional and resulting expression in separate textboxes.
rename transform to rename multiple columns in a single transform.
drop transform now supports the option of dropping all columns except the ones specified in the transform. See Drop Transform.
Compare two strings using Latin collation settings. See below.
In Release 4.1.1 and earlier, the NOW function returned time values for the specified time zone in 12-hour time, which was confusing.
In Release 4.2 and later, this function returns values in 24-hour time.