Generate a new column where the values are the output of the value expression. Expression can be calculated based on values specified in the group parameter. Output column can be named as needed.  

String literal example:

derive type: single value: 'passed' as:'status'

Output: Generates a new column called status, each row of which contains passed for its value. 

Column reference example:

derive type: single value:productName as:'orig_productName'

Output: Generates a new column called orig_productName, which contains all of the values in productName, effectively serving as a backup of the source column.

Function reference example:

derive type: single value:SQRT(POW(a,2) + POW(b,2)) as:'c'

Output: Generates a new column called c, which is the calculation of the Pythagorean theorem for the values stored in a and b. For more information on this example, see POW Function.

Window function example:

You can use window functions in your derive transforms:

derive type: multiple col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

Output: Calculate the value in the column of avgRolling to be the rolling average of the POS_Sales values for the preceding seven days, grouped and ordered by the saleDate column. For more information, see Window Functions.

derive type: single|multiple value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']

TokenRequired?Data TypeDescription
deriveYtransformName of the transform
typeYstringType of formula: single (single row) or multiple (multi-row)
valueYstringExpression that generates the value to store in the new column
orderNstringColumn or column names by which to sort the dataset before the value expression is applied
groupNstringIf you are using aggregate or window functions, you can specify a group expression to identify the subset of records to apply the value expression.
asNstringName of the newly generated column


Type of formula in the transform:

singleFormula calculations are contained within a single row of values.
multipleFormula calculations involve multiple rows of inputs.

Required?Data Type
YesString (single or multiple)


Required?Data Type
YesString (literal, column reference, function call, or combination)


This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.

NOTE: The order parameter must unambiguously specify an ordering for the data, or the generated results may vary between job executions.

NOTE: If it is present, the dataset is first grouped by the group value before it is ordered by the values in the order column.

NOTE: The order column does not need to be sorted before the transform is executed on it.

Tip: To sort in reverse order, prepend the column name with a dash (-MyDate).

Required?Data Type
NoString (column name)


The ProdId column contains three values: P001P002, and P003, and you add the following transform:

derive type: single value:SUM(Sales) group:ProdId as:'SalesByProd'

The above transform generates the SalesByProd column, which contains the sum of the Sales values, as grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can apply the group parameter to specify subsets of records across which the value computation is applied. 

You can specify one or more columns by which to group using comma-separated column references.

Required?Data Type
NoString (column name)


derive type: single value:(colX * colY) as:'areaXY'

Output: Generates a new column containing the product of the values in columns colX and colY. New column is explicitly named, areaXY.

Required?Data Type
NoString (column name)

Example - Basic Derive Examples

The following dataset is used for performing some simple statistical analysis using the derive transform.




First, you can calculate the total average score across all tests:

derive type: single value:AVERAGE(Score) as:'avgScore'

In their unformatted form, the output values are lengthy. You can edit the above transform to nest the value statement with proper formatting using the NUMFORMAT function:

derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') as:'avgScore'

You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:

derive type: single value: NUMFORMAT(AVERAGE(Score),'##.00') group: StudentId as: 'avgScorebyStudentId'

derive type: single value:NUMFORMAT(AVERAGE(Score),'##.00') group:TestNumber as:'avgScoreByTest'

To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.

derive type: single value:SUM(Score) group:StudentId as:'totalScoreByStudentId'




Example - Rounding Functions

Example - Other Examples

In the language documentation, there are many other examples where the derive transform is used.