Contents:
The cloud-based version of Trifacta Wrangler is now available! Read all about it, and register for your free account.
Contents:
NOTE: Transforms are a part of the underlying language that is not directly accessible to users. This content is maintained for reference purposes only.
value
expression. Expression can be calculated based on values specified in the group
parameter. Output column can be named as needed.
Basic Usage
String literal example:
derive type: single value: 'passed' as:'status'
Output: Generates a new column called status
, each row of which contains passed
for its value.
Column reference example:
derive type: single value:productName as:'orig_productName'
Output: Generates a new column called orig_productName
, which contains all of the values in productName
, effectively serving as a backup of the source column.
Function reference example:
derive type: single value:SQRT(POW(a,2) + POW(b,2)) as:'c'
Output: Generates a new column called c
, which is the calculation of the Pythagorean theorem for the values stored in a
and b
. For more information on this example, see POW Function.
Window function example:
You can use window functions in your derive
transforms:
derive type: multiple col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate
Output: Calculate the value in the column of avgRolling
to be the rolling average of the POS_Sales
values for the preceding seven days, grouped and ordered by the saleDate
column. For more information, see Window Functions.
Parameters
derive type: single|multiple value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']
Token | Required? | Data Type | Description |
---|---|---|---|
derive | Y | transform | Name of the transform |
type | Y | string | Type of formula: single (single row) or multiple (multi-row) |
value | Y | string | Expression that generates the value to store in the new column |
order | N | string | Column or column names by which to sort the dataset before the value expression is applied |
group | N | string | If you are using aggregate or window functions, you can specify a group expression to identify the subset of records to apply the value expression. |
as | N | string | Name of the newly generated column |
For more information on syntax standards, see Language Documentation Syntax Notes.
type
Type of formula in the transform:
Value | Description |
---|---|
single | Formula calculations are contained within a single row of values. |
multiple | Formula calculations involve multiple rows of inputs. |
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (single or multiple ) |
value
Identifies the expression that is applied by the transform. The value
parameter can be one of the following types:
- test predicates that evaluate to Boolean values (
value: myAge == '30'
yields atrue
orfalse
value), or - computational expressions (
value: abs(pow(myCol,3))
).
The expected type of value
expression is determined by the transform type. Each type of expression can contain combinations of the following:
- literal values:
value: 'Hello, world'
- column references:
value: amountOwed * 10
functions:
value: left(myString, 4)
combinations:
value: abs(pow(myCol,3))
The types of any generated values are re-inferred by the platform.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (literal, column reference, function call, or combination) |
order
This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.
NOTE: The order parameter must unambiguously specify an ordering for the data, or the generated results may vary between job executions.
NOTE: If it is present, the dataset is first grouped by the group
value before it is ordered by the values in the order
column.
NOTE: The order
column does not need to be sorted before the transform is executed on it.
Tip: To sort in reverse order, prepend the column name with a dash (-MyDate
).
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
group
Identifies the column by which the dataset is grouped for purposes of applying the transform.
The ProdId
column contains three values: P001
, P002
, and P003
, and you add the following transform:
derive type: single value:SUM(Sales) group:ProdId as:'SalesByProd'
The above transform generates the SalesByProd
column, which contains the sum of the Sales
values, as grouped according to the three product identifiers.
If the value parameter contains aggregate or window functions, you can apply the group
parameter to specify subsets of records across which the value computation is applied.
group
parameter can result in non-deterministic re-ordering in the data grid.
However, if you're running your job on the Spark running environment, you should apply the group
parameter, or your job may run out of memory and fail. To avoid this issue and to enforce row ordering, use the sort
transform. For more information, see Sort Transform.
You can specify one or more columns by which to group using comma-separated column references.
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
as
Name of the new column that is being generated. If the as
parameter is not specified, a default name is used.
derive type: single value:(colX * colY) as:'areaXY'
Output: Generates a new column containing the product of the values in columns colX
and colY
. New column is explicitly named, areaXY
.
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
Examples
Example - Basic Derive Examples
The following dataset is used for performing some simple statistical analysis using the derive
transform.
Source:
StudentId | TestNumber | TestScore |
---|---|---|
S001 | 1 | 78 |
S001 | 2 | 85 |
S001 | 3 | 81 |
S002 | 1 | 84 |
S002 | 2 | 92 |
S002 | 3 | 77 |
S003 | 1 | 83 |
S003 | 2 | 88 |
S003 | 3 | 85 |
Transform:
First, you can calculate the total average score across all tests:
derive type: single value:AVERAGE(Score) as:'avgScore'
In their unformatted form, the output values are lengthy. You can edit the above transform to nest the value
statement with proper formatting using the NUMFORMAT
function:
derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') as:'avgScore'
You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:
derive type: single value: NUMFORMAT(AVERAGE(Score),'##.00') group: StudentId as: 'avgScorebyStudentId'
derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') group:TestNumber as:'avgScoreByTest'
To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.
derive type: single value:SUM(Score) group:StudentId as:'totalScoreByStudentId'
Results:
StudentId | TestNumber | TestScore | avgScore | avgScorebyStudentId | ScoreByTest | totalScoreByStudentId |
---|---|---|---|---|---|---|
S001 | 1 | 78 | 83.67 | 81.33 | 81.67 | 244 |
S001 | 2 | 85 | 83.67 | 81.33 | 88.33 | 244 |
S001 | 3 | 81 | 83.67 | 81.33 | 81.00 | 244 |
S002 | 1 | 84 | 83.67 | 84.33 | 81.67 | 253 |
S002 | 2 | 92 | 83.67 | 84.33 | 88.33 | 253 |
S002 | 3 | 77 | 83.67 | 84.33 | 81.00 | 253 |
S003 | 1 | 83 | 83.67 | 85.33 | 81.67 | 256 |
S003 | 2 | 88 | 83.67 | 85.33 | 88.33 | 256 |
S003 | 3 | 85 | 83.67 | 85.33 | 81.00 | 256 |
Example - Rounding Functions
The following example demonstrates how the rounding functions work together. These functions include the following:
FLOOR
- largest integer that is not greater than the input value. See FLOOR Function.CEILING
- smallest integer that is not less than the input value. See CEILING Function.ROUND
- nearest integer to the input value. See ROUND Function.MOD
- remainder integer when input1 is divided by input2. See Numeric Operators.
Source:
rowNum | X |
---|---|
1 | -2.5 |
2 | -1.2 |
3 | 0 |
4 | 1 |
5 | 1.5 |
6 | 2.5 |
7 | 3.9 |
8 | 4 |
9 | 4.1 |
10 | 11 |
Transform:
derive type: single value: FLOOR (X) as: 'floorX'
derive type: single value: CEILING (X) as: 'ceilingX'
derive type: single value: ROUND (X) as: 'roundX'
derive type: single value: (X % 2) as: 'modX'
Results:
rowNum | X | modX | roundX | ceilingX | floorX |
---|---|---|---|---|---|
1 | -2.5 | -2 | -2 | -3 | |
2 | -1.2 | -1 | -1 | -2 | |
3 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 1 | 1 | 1 | 1 |
5 | 1.5 | 2 | 2 | 1 | |
6 | 2.5 | 3 | 3 | 2 | |
7 | 3.9 | 4 | 4 | 3 | |
8 | 4 | 0 | 4 | 4 | 4 |
9 | 4.1 | 4 | 5 | 4 | |
10 | 11 | 1 | 11 | 11 | 11 |
Example - Other Examples
In the language documentation, there are many other examples where the derive
transform is used.
This page has no comments.