Page tree

 

Support | BlogContact Us | 844.332.2821

 

Contents:

This documentation applies to Trifacta Wrangler. Download this free product.
Registered users of this product or Trifacta Wrangler Enterprise should login to Product Docs through the application.

Contents:


Generate a new column where the values are the output of the value expression. Expression can be calculated based on values specified in the group parameter. Output column can be named as needed.  

Basic Usage

String literal example:

derive value: 'passed' as:'status'

Output: Generates a new column called status, each row of which contains passed for its value. 

Column reference example:

derive value:productName as:'orig_productName'

Output: Generates a new column called orig_productName, which contains all of the values in productName, effectively serving as a backup of the source column.

Function reference example:

derive value:SQRT(POW(a,2) + POW(b,2)) as:'c'

Output: Generates a new column called c, which is the calculation of the Pythagorean theorem for the values stored in a and b. For more information on this example, see POW Function.

Window function example:

You can use window functions in your derive transforms:

derive col: avgRolling value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

Output: Calculate the value in the column of avgRolling to be the rolling average of the POS_Sales values for the preceding seven days, grouped and ordered by the saleDate column. For more information, see Window Functions.

Parameters

derive value:(expression) [order: order_col] [group: group_col] [as:'new_column_name']

ParameterRequired?Transform BuilderData TypeDescription
valueYFormulastringExpression that generates the value to store in the new column
orderNOrder bystringColumn or column names by which to sort the dataset before the value expression is applied
groupNGroup bystringIf you are using aggregate or window functions, you can specify a group expression to identify the subset of records to apply the value expression.
asNNew column namestringName of the newly generated column

For more information on syntax standards, see Language Documentation Syntax Notes.

value

Identifies the expression that is applied by the transform. The value parameter can be one of the following types:

  • test predicates that evaluate to Boolean values (value: myAge == '30' yields a true or false value), or 
  • computational expressions ( value: abs(pow(myCol,3)) ).

The expected type of value expression is determined by the transform type. Each type of expression can contain combinations of the following:

  • literal values: value: 'Hello, world'
  • column references: value: amountOwed * 10
  • functions:  value: left(myString, 4)

  • combinations: value: abs(pow(myCol,3))

The types of any generated values are re-inferred by the platform. 

Usage Notes:

Required?Data Type
YesString (literal, column reference, function call, or combination)

order

This parameter specifies the column on which to sort the dataset before applying the specified function. For combination sort keys, you can add multiple comma-separated columns.

NOTE: If it is present, the dataset is first grouped by the group value before it is ordered by the values in the order column.

NOTE: The order column does not need to be sorted before the transform is executed on it.

Tip: To sort in reverse order, prepend the column name with a dash (-MyDate).

Usage Notes:

Required?Data Type
NoString (column name)

group

Identifies the column by which the dataset is grouped for purposes of applying the transform.

The ProdId column contains three values: P001P002, and P003, and you add the following transform:

derive value:SUM(Sales) group:ProdId as:'SalesByProd'

The above transform generates the SalesByProd column, which contains the sum of the Sales values, as grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can apply the group parameter to specify subsets of records across which the value computation is applied. 

You can specify one or more columns by which to group using comma-separated column references.

Usage Notes:

Required?Data Type
NoString (column name)

as

Name of the new column that is being generated. If the as parameter is not specified, a default name is used.

derive value:(colX * colY) as:'areaXY'

Output: Generates a new column containing the product of the values in columns colX and colY. New column is explicitly named, areaXY.

Usage Notes:

Required?Data Type
NoString (column name)

Examples

Example - Basic Derive Examples

The following dataset is used for performing some simple statistical analysis using the derive transform.

Source:

StudentIdTestNumberTestScore
S001178
S001285
S001381
S002184
S002292
S002377
S003183
S003288
S003385

Transform:

First, you can calculate the total average score across all tests:

derive value:AVERAGE(Score) as:'avgScore'

In their unformatted form, the output values are lengthy. You can edit the above transform to nest the value statement with proper formatting using the NUMFORMAT function:

derive value:NUMFORMAT(AVERAGE(Score),'##.00') as:'avgScore'

You might also be interested to know how individual students fared and to identify which tests caused the greatest challenges for the students:

derive value: NUMFORMAT(AVERAGE(Score),'##.00') group: StudentId as: 'avgScorebyStudentId'

derive value:NUMFORMAT(AVERAGE(Score),'##.00') group:TestNumber as:'avgScoreByTest'

To calculate total scores for each student, add the following. Since each individual test score is a whole number, no rounding formatting is required.

derive value:SUM(Score) group:StudentId as:'totalScoreByStudentId'

Results:

StudentIdTestNumberTestScoreavgScoreavgScorebyStudentIdScoreByTesttotalScoreByStudentId
S00117883.6781.3381.67244
S00128583.6781.3388.33244
S00138183.6781.3381.00244
S00218483.6784.3381.67253
S00229283.6784.3388.33253
S00237783.6784.3381.00253
S00318383.6785.3381.67256
S00328883.6785.3388.33256
S00338583.6785.3381.00256

 

Example - Rounding Functions

The following example demonstrates how the rounding functions work together. These functions include the following:

  • FLOOR - largest integer that is not greater than the input value. See FLOOR Function.
  • CEILING - smallest integer that is not less than the input value. See CEILING Function.
  • ROUND - nearest integer to the input value. See ROUND Function.
  • MOD - remainder integer when input1 is divided by input2. See Numeric Operators

Source:

rowNumX
1-2.5
2-1.2
30
41
51.5
62.5
73.9
84
94.1
1011

Transform:

derive value: FLOOR (X) as: 'floorX'

derive value: CEILING (X) as: 'ceilingX'

derive value: ROUND (X) as: 'roundX'

derive value: (X % 2) as: 'modX'


Results:

rowNumXmodXroundXceilingXfloorX
1-2.5 -2-2-3
2-1.2 -1-1-2
300000
411111
51.5 221
62.5 332
73.9 443
840444
94.1 454
10111111111

Example - Other Examples

In the language documentation, there are many other examples where the derive transform is used. 

Your Rating: Results: PatheticBadOKGoodOutstanding! 25 rates

This page has no comments.