# Derive Transform

**Nota**

Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.

Generate a new column where the values are the output of the `value`

expression. Expression can be calculated based on values specified in the `group`

parameter. Output column can be named as needed.

## Basic Usage

**String literal example:**

derive type: single value: 'passed' as:'status'

**Output: **Generates a new column called `status`

, each row of which contains `passed`

for its value.

**Column reference example:**

derive type: single value:productName as:'orig_productName'

**Output: **Generates a new column called `orig_productName`

, which contains all of the values in `productName`

, effectively serving as a backup of the source column.

**Function reference example:**

derive type: single value:SQRT(POW(a,2) + POW(b,2)) as:'c'

**Output:** Generates a new column called `c`

, which is the calculation of the Pythagorean theorem for the values stored in `a`

and `b`

. For more information on this example, see POW Function.

**Window function example:**

You can use window functions in your `derive`

transforms:

derive type: multiple col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

**Output: **Calculate the value in the column of `avgRolling`

to be the rolling average of the `POS_Sales`

values for the preceding seven days, grouped and ordered by the `saleDate`

column. For more information, see Window Functions.

## Syntax and Parameters

derive type: single|multiple value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']

Token | Required? | Data Type | Description |
---|---|---|---|

derive | Y | transform | Name of the transform |

type | Y | string | Type of formula: |

value | Y | string | Expression that generates the value to store in the new column |

order | N | string | Column or column names by which to sort the dataset before the |

group | N | string | If you are using aggregate or window functions, you can specify a |

as | N | string | Name of the newly generated column |

For more information on syntax standards, see Language Documentation Syntax Notes.

### type

Type of formula in the transformation:

Value | Description |
---|---|

single | Formula calculations are contained within a single row of values. |

multiple | Formula calculations involve multiple rows of inputs. |

** Usage Notes:**

Required? | Data Type |
---|---|

Yes | String ( |

### value

Identifies the expression that is applied by the transform. The `value`

parameter can be one of the following types:

test predicates that evaluate to Boolean values (

`value: myAge == '30'`

yields a`true`

or`false`

value), orcomputational expressions (

`value: abs(pow(myCol,3))`

).

The expected type of `value`

expression is determined by the transform type. Each type of expression can contain combinations of the following:

**literal values:**`value: 'Hello, world'`

**column references:**`value: amountOwed * 10`

**functions:**`value: left(myString, 4)`

**combinations:**`value: abs(pow(myCol,3))`

The types of any generated values are re-inferred by the platform.

** Usage Notes:**

Required? | Data Type |
---|---|

Yes | String (literal, column reference, function call, or combination) |

### order

This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.

**Nota**

The order parameter must unambiguously specify an ordering for the data, or the generated results may vary between job executions.

**Nota**

If it is present, the dataset is first grouped by the `group`

value before it is ordered by the values in the `order`

column.

**Nota**

The `order`

column does not need to be sorted before the transform is executed on it.

**Suggerimento**

To sort in reverse order, prepend the column name with a dash (`-MyDate`

).

** Usage Notes:**

Required? | Data Type |
---|---|

No | String (column name) |

### group

Identifies the column by which the dataset is grouped for purposes of applying the transform.

**Nota**

Transforms that use the `group`

parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the `group`

parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the `sort`

transform. For more information, see Sort Transform.

The `ProdId`

column contains three values: `P001`

, `P002`

, and `P003`

, and you add the following transformation:

derive type: single value:SUM(Sales) group:ProdId as:'SalesByProd'

The above transform generates the `SalesByProd`

column, which contains the sum of the `Sales`

values, as grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can apply the `group`

parameter to specify subsets of records across which the value computation is applied.

**Nota**

Transforms that use the `group`

parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the `group`

parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the `sort`

transform. For more information, see Sort Transform.

You can specify one or more columns by which to group using comma-separated column references.

** Usage Notes:**

Required? | Data Type |
---|---|

No | String (column name) |

### as

Name of the new column that is being generated. If the `as`

parameter is not specified, a default name is used.

derive type: single value:(colX * colY) as:'areaXY'

**Output:** Generates a new column containing the product of the values in columns `colX`

and `colY`

. New column is explicitly named, `areaXY`

.

** Usage Notes:**

Required? | Data Type |
---|---|

No | String (column name) |

## Examples

**Suggerimento**

For additional examples, see Common Tasks.

### Example - Basic Derive Examples

The following dataset is used for performing some simple statistical analysis using the `derive`

transform.

**Source:**

StudentId | TestNumber | TestScore |
---|---|---|

S001 | 1 | 78 |

S001 | 2 | 85 |

S001 | 3 | 81 |

S002 | 1 | 84 |

S002 | 2 | 92 |

S002 | 3 | 77 |

S003 | 1 | 83 |

S003 | 2 | 88 |

S003 | 3 | 85 |

**Transformation:**

First, you can calculate the total average score across all tests:

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | average(Score) |

Parameter: New column name | 'avgScore' |

In their unformatted form, the output values are lengthy. You can edit the above transform to nest the `value`

statement with proper formatting using the `numformat`

function:

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | numformat(average(Score),'##.00') |

Parameter: New column name | 'avgScore' |

You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | numformat(average(Score),'##.00') |

Parameter: Group rows by | StudentId |

Parameter: New column name | 'avgScorebyStudentId' |

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | numformat(average(Score),'##.00') |

Parameter: Group rows by | TestNumber |

Parameter: New column name | 'avgScoreByTest' |

To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | sum(Score) |

Parameter: Group rows by | StudentId |

Parameter: New column name | 'totalScoreByStudentId' |

**Results:**

StudentId | TestNumber | TestScore | avgScore | avgScorebyStudentId | ScoreByTest | totalScoreByStudentId |
---|---|---|---|---|---|---|

S001 | 1 | 78 | 83.67 | 81.33 | 81.67 | 244 |

S001 | 2 | 85 | 83.67 | 81.33 | 88.33 | 244 |

S001 | 3 | 81 | 83.67 | 81.33 | 81.00 | 244 |

S002 | 1 | 84 | 83.67 | 84.33 | 81.67 | 253 |

S002 | 2 | 92 | 83.67 | 84.33 | 88.33 | 253 |

S002 | 3 | 77 | 83.67 | 84.33 | 81.00 | 253 |

S003 | 1 | 83 | 83.67 | 85.33 | 81.67 | 256 |

S003 | 2 | 88 | 83.67 | 85.33 | 88.33 | 256 |

S003 | 3 | 85 | 83.67 | 85.33 | 81.00 | 256 |

### Example - Rounding Functions

This example demonstrates how the rounding functions work together.

**Functions**:

Item | Description |
---|---|

FLOOR Function | Computes the largest integer that is not more than the input value. Input can be an Integer, a Decimal, a column reference, or an expression. |

CEILING Function | Computes the |

ROUND Function | Rounds input value to the nearest integer. Input can be an Integer, a Decimal, a column reference, or an expression. Optional second argument can be used to specify the number of digits to which to round. |

MOD Function | Returns the modulo value, which is the remainder of dividing the first argument by the second argument. Equivalent to the |

**Source:**

rowNum | X |
---|---|

1 | -2.5 |

2 | -1.2 |

3 | 0 |

4 | 1 |

5 | 1.5 |

6 | 2.5 |

7 | 3.9 |

8 | 4 |

9 | 4.1 |

10 | 11 |

**Transformation:**

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | FLOOR(X) |

Parameter: New column name | 'floorX' |

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | CEILING(X) |

Parameter: New column name | 'ceilingX' |

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | ROUND (X) |

Parameter: New column name | 'roundX' |

Transformation Name | |
---|---|

Parameter: Formula type | Single row formula |

Parameter: Formula | (X % 2) |

Parameter: New column name | 'modX' |

**Results:**

rowNum | X | modX | roundX | ceilingX | floorX |
---|---|---|---|---|---|

1 | -2.5 | -2 | -2 | -3 | |

2 | -1.2 | -1 | -1 | -2 | |

3 | 0 | 0 | 0 | 0 | 0 |

4 | 1 | 1 | 1 | 1 | 1 |

5 | 1.5 | 2 | 2 | 1 | |

6 | 2.5 | 3 | 3 | 2 | |

7 | 3.9 | 4 | 4 | 3 | |

8 | 4 | 0 | 4 | 4 | 4 |

9 | 4.1 | 4 | 5 | 4 | |

10 | 11 | 1 | 11 | 11 | 11 |