Key

• This line was removed.
• Formatting was changed.

D toc

D s transforms

Excerpt

Generate a new column where the values are the output of the `value` expression. Expression can be calculated based on values specified in the `group` parameter. Output column can be named as needed.

D s
snippet Basic

String literal example:

D code
derive type: single value: 'passed' as:'status'

Output: Generates a new column called `status`, each row of which contains `passed` for its value.

Column reference example:

D code

derive type: single value:productName as:'orig_productName'

Output: Generates a new column called `orig_productName`, which contains all of the values in `productName`, effectively serving as a backup of the source column.

Function reference example:

D code

derive type: single value:SQRT(POW(a,2) + POW(b,2)) as:'c'

Output: Generates a new column called `c`, which is the calculation of the Pythagorean theorem for the values stored in `a` and `b`. For more information on this example, see POW Function.

Window function example:

You can use window functions in your `derive` transforms:

D code

derive type: multiple col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

Output: Calculate the value in the column of `avgRolling` to be the rolling average of the `POS_Sales` values for the preceding seven days, grouped and ordered by the `saleDate` column. For more information, see Window Functions.

D s
snippet Parameters

D code

derive type: single|multiple value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']

TokenRequired?Data TypeDescription
deriveYtransformName of the transform
typeYstringType of formula: `single` (single row) or `multiple` (multi-row)
valueYstringExpression that generates the value to store in the new column
orderNstringColumn or column names by which to sort the dataset before the `value` expression is applied
groupNstringIf you are using aggregate or window functions, you can specify a `group` expression to identify the subset of records to apply the `value` expression.
asNstringName of the newly generated column

D s lang notes

type

Type of formula in the transformation:

ValueDescription
`single`Formula calculations are contained within a single row of values.
`multiple`Formula calculations involve multiple rows of inputs.

D s
snippet usage

Required?Data Type
YesString (`single` or `multiple`)

value

Include Page
 value Parameter value Parameter

D s
snippet usage

Required?Data Type
YesString (literal, column reference, function call, or combination)

order

This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.

Info

NOTE: The order parameter must unambiguously specify an ordering for the data, or the generated results may vary between job executions.

Info

NOTE: If it is present, the dataset is first grouped by the `group` value before it is ordered by the values in the `order` column.

Info

NOTE: The `order` column does not need to be sorted before the transform is executed on it.

Tip

Tip: To sort in reverse order, prepend the column name with a dash (`-MyDate`).

D s
snippet usage

Required?Data Type
NoString (column name)

group

Include Page
 group Parameter group Parameter

The `ProdId` column contains three values: `P001``P002`, and `P003`, and you add the following transformation:

D code

derive type: single value:SUM(Sales) group:ProdId as:'SalesByProd'

The above transform generates the `SalesByProd` column, which contains the sum of the `Sales` values, as grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can apply the `group` parameter to specify subsets of records across which the value computation is applied.

D s ordering

You can specify one or more columns by which to group using comma-separated column references.

D s
snippet usage

Required?Data Type
NoString (column name)

as

Include Page
 as Parameter as Parameter

D code

derive type: single value:(colX * colY) as:'areaXY'

Output: Generates a new column containing the product of the values in columns `colX` and `colY`. New column is explicitly named, `areaXY`.

D s
snippet usage

Required?Data Type
NoString (column name)

D s
snippet Examples

Example - Basic Derive Examples

The following dataset is used for performing some simple statistical analysis using the `derive` transform.

Source:

StudentIdTestNumberTestScore
S001178
S001285
S001381
S002184
S002292
S002377
S003183
S003288
S003385

Transformation:

First, you can calculate the total average score across all tests:

D trans
RawWrangle true 'avgScore' step derive type: single value:average(Score) as:'avgScore' Formula type Single row formula Formula average(Score) New column name New formula

In their unformatted form, the output values are lengthy. You can edit the above transform to nest the `value` statement with proper formatting using the `numformat` function:

D trans
RawWrangle true 'avgScore' step derive type: single value:numformat(average(Score),'##.00') as:'avgScore' Formula type Single row formula Formula numformat(average(Score),'##.00') New column name New formula

You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:

D trans
RawWrangle true StudentId step derive type: single value: numformat(average(Score),'##.00') group: StudentId as: 'avgScorebyStudentId' Formula type Single row formula Formula numformat(average(Score),'##.00') Group rows by 'avgScorebyStudentId' New column name New formula

D trans
RawWrangle true TestNumber step derive type: single value:numformat(average(Score),'##.00') group:TestNumber as:'avgScoreByTest' Formula type Single row formula Formula numformat(average(Score),'##.00') Group rows by 'avgScoreByTest' New column name New formula

To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.

D trans
RawWrangle true StudentId step derive type: single value:sum(Score) group:StudentId as:'totalScoreByStudentId' Formula type Single row formula Formula sum(Score) Group rows by 'totalScoreByStudentId' New column name New formula

Results:

StudentIdTestNumberTestScoreavgScoreavgScorebyStudentIdScoreByTesttotalScoreByStudentId
S00117883.6781.3381.67244
S00128583.6781.3388.33244
S00138183.6781.3381.00244
S00218483.6784.3381.67253
S00229283.6784.3388.33253
S00237783.6784.3381.00253
S00318383.6785.3381.67256
S00328883.6785.3388.33256
S00338583.6785.3381.00256

Example - Rounding Functions

Include Page
 EXAMPLE - Rounding Functions EXAMPLE - Rounding Functions

D s also
label wrangle_transform_derive