**Contents:**

**NOTE:** Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.

`value`

expression. Expression can be calculated based on values specified in the `group`

parameter. Output column can be named as needed.

## Basic Usage

**String literal example:**

derive type: single value: 'passed' as:'status'

**Output: **Generates a new column called `status`

, each row of which contains `passed`

for its value.

**Column reference example:**

derive type: single value:productName as:'orig_productName'

**Output: **Generates a new column called `orig_productName`

, which contains all of the values in `productName`

, effectively serving as a backup of the source column.

**Function reference example:**

derive type: single value:SQRT(POW(a,2) + POW(b,2)) as:'c'

**Output: **Generates a new column called `c`

, which is the calculation of the Pythagorean theorem for the values stored in `a`

and `b`

. For more information on this example, see POW Function.

**Window function example:**

You can use window functions in your `derive`

transforms:

derive type: multiple col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

**Output: **Calculate the value in the column of `avgRolling`

to be the rolling average of the `POS_Sales`

values for the preceding seven days, grouped and ordered by the `saleDate`

column. For more information, see Window Functions.

## Syntax and Parameters

derive type: single|multiple value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']

Token | Required? | Data Type | Description |
---|---|---|---|

derive | Y | transform | Name of the transform |

type | Y | string | Type of formula: `single` (single row) or `multiple` (multi-row) |

value | Y | string | Expression that generates the value to store in the new column |

order | N | string | Column or column names by which to sort the dataset before the `value` expression is applied |

group | N | string | If you are using aggregate or window functions, you can specify a `group` expression to identify the subset of records to apply the `value` expression. |

as | N | string | Name of the newly generated column |

For more information on syntax standards, see Language Documentation Syntax Notes.

### type

Type of formula in the transform:

Value | Description |
---|---|

`single` | Formula calculations are contained within a single row of values. |

`multiple` | Formula calculations involve multiple rows of inputs. |

**
Usage Notes:
**

Required? | Data Type |
---|---|

Yes | String (`single` or `multiple` ) |

### value

`value`

parameter can be one of the following types:- test predicates that evaluate to Boolean values (
`value: myAge == '30'`

yields a`true`

or`false`

value), or - computational expressions (
`value: abs(pow(myCol,3))`

).

The expected type of `value`

expression is determined by the transform type. Each type of expression can contain combinations of the following:

**literal values:**`value: 'Hello, world'`

**column references:**`value: amountOwed * 10`

**functions:**`value: left(myString, 4)`

**combinations:**`value: abs(pow(myCol,3))`

The types of any generated values are re-inferred by the platform.

**
Usage Notes:
**

Required? | Data Type |
---|---|

Yes | String (literal, column reference, function call, or combination) |

### order

This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.

**NOTE: **The order parameter must unambiguously specify an ordering for the data, or the generated results may vary between job executions.

**NOTE: **If it is present, the dataset is first grouped by the `group`

value before it is ordered by the values in the `order`

column.

**NOTE:** The `order`

column does not need to be sorted before the transform is executed on it.

**Tip: **To sort in reverse order, prepend the column name with a dash (`-MyDate`

).

**
Usage Notes:
**

Required? | Data Type |
---|---|

No | String (column name) |

### group

Identifies the column by which the dataset is grouped for purposes of applying the transform.

The `ProdId`

column contains three values: `P001`

, `P002`

, and `P003`

, and you add the following transform:

derive type: single value:SUM(Sales) group:ProdId as:'SalesByProd'

The above transform generates the `SalesByProd`

column, which contains the sum of the `Sales`

values, as grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can apply the `group`

parameter to specify subsets of records across which the value computation is applied.

**NOTE:**Transforms that use the

`group`

parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the `group`

parameter, particularly on larger datasets, or your job may run out of memory and fail.
To enforce row ordering, you can use the `sort`

transform. For more information, see Sort Transform.
You can specify one or more columns by which to group using comma-separated column references.

**
Usage Notes:
**

Required? | Data Type |
---|---|

No | String (column name) |

### as

`as`

parameter is not specified, a default name is used.derive type: single value:(colX * colY) as:'areaXY'

**Output:** Generates a new column containing the product of the values in columns `colX`

and `colY`

. New column is explicitly named, `areaXY`

.

**
Usage Notes:
**

Required? | Data Type |
---|---|

No | String (column name) |

## Examples

**Tip:** For additional examples, see Common Tasks.

### Example - Basic Derive Examples

The following dataset is used for performing some simple statistical analysis using the `derive`

transform.

**Source:**

StudentId | TestNumber | TestScore |
---|---|---|

S001 | 1 | 78 |

S001 | 2 | 85 |

S001 | 3 | 81 |

S002 | 1 | 84 |

S002 | 2 | 92 |

S002 | 3 | 77 |

S003 | 1 | 83 |

S003 | 2 | 88 |

S003 | 3 | 85 |

**Transform:**

First, you can calculate the total average score across all tests:

derive type: single value:AVERAGE(Score) as:'avgScore'

In their unformatted form, the output values are lengthy. You can edit the above transform to nest the `value`

statement with proper formatting using the `NUMFORMAT`

function:

derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') as:'avgScore'

You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:

derive type: single value: NUMFORMAT(AVERAGE(Score),'##.00') group: StudentId as: 'avgScorebyStudentId'

derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') group:TestNumber as:'avgScoreByTest'

To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.

derive type: single value:SUM(Score) group:StudentId as:'totalScoreByStudentId'

**Results:**

StudentId | TestNumber | TestScore | avgScore | avgScorebyStudentId | ScoreByTest | totalScoreByStudentId |
---|---|---|---|---|---|---|

S001 | 1 | 78 | 83.67 | 81.33 | 81.67 | 244 |

S001 | 2 | 85 | 83.67 | 81.33 | 88.33 | 244 |

S001 | 3 | 81 | 83.67 | 81.33 | 81.00 | 244 |

S002 | 1 | 84 | 83.67 | 84.33 | 81.67 | 253 |

S002 | 2 | 92 | 83.67 | 84.33 | 88.33 | 253 |

S002 | 3 | 77 | 83.67 | 84.33 | 81.00 | 253 |

S003 | 1 | 83 | 83.67 | 85.33 | 81.67 | 256 |

S003 | 2 | 88 | 83.67 | 85.33 | 88.33 | 256 |

S003 | 3 | 85 | 83.67 | 85.33 | 81.00 | 256 |

### Example - Rounding Functions

`FLOOR`

- largest integer that is not greater than the input value. See FLOOR Function.`CEILING`

- smallest integer that is not less than the input value. See CEILING Function.`ROUND`

- nearest integer to the input value. See ROUND Function.`MOD`

- remainder integer when input1 is divided by input2. See Numeric Operators.

**Source:**

rowNum | X |
---|---|

1 | -2.5 |

2 | -1.2 |

3 | 0 |

4 | 1 |

5 | 1.5 |

6 | 2.5 |

7 | 3.9 |

8 | 4 |

9 | 4.1 |

10 | 11 |

**Transform:**

derive type: single value: FLOOR (X) as: 'floorX'

derive type: single value: CEILING (X) as: 'ceilingX'

derive type: single value: ROUND (X) as: 'roundX'

derive type: single value: (X % 2) as: 'modX'

**Results:**

rowNum | X | modX | roundX | ceilingX | floorX |
---|---|---|---|---|---|

1 | -2.5 | -2 | -2 | -3 | |

2 | -1.2 | -1 | -1 | -2 | |

3 | 0 | 0 | 0 | 0 | 0 |

4 | 1 | 1 | 1 | 1 | 1 |

5 | 1.5 | 2 | 2 | 1 | |

6 | 2.5 | 3 | 3 | 2 | |

7 | 3.9 | 4 | 4 | 3 | |

8 | 4 | 0 | 4 | 4 | 4 |

9 | 4.1 | 4 | 5 | 4 | |

10 | 11 | 1 | 11 | 11 | 11 |

### Example - Other Examples

In the language documentation, there are many other examples where the `derive`

transform is used.

This page has no comments.