**Contents:**

=20

=20
=20
- =20
- Basic Usage =20
- Syntax and Parameters<= /a>=20 =20
- Examples=20 =20

=20

**NOTE:** Transforms are a part of the underlying language,=
which is not directly accessible to users. This content is maintained for =
reference purposes only. For more information on the user-accessible equiva=
lent to transforms, see =
Transformation Reference.

`value`

expression. Expression can be calculated based on v=
alues specified in the
`group`

parameter. Output column can be named as needed.&nb=
sp;

**String literal example:**

derive type: single value: 'passed' as:'status'

**Output: **Generates a new column called ```
s=
tatus
```

, each row of which contains `passed`

for its v=
alue.

**Column reference example:**

derive type: single value:productNam= e as:'orig_productName'

**Output: **Generates a new column called ```
o=
rig_productName
```

, which contains all of the values in ```
prod=
uctName
```

, effectively serving as a backup of the source column.

**Function reference example:**

derive type: single value:SQRT(POW(a= ,2) + POW(b,2)) as:'c'

**Output: **Generates a new column called ```
c, which is the calculation of the Pythagorean theorem for the values stor=
ed in
```

`a`

and `b`

. For more information on =
this example, see POW Function<=
/a>.

```
```**Window function example:**

You can use window functions in your `derive`

trans=
forms:

=20
derive type: multiple col: avgRollin=
g value: ROLLINGAVERAGE(POS_Sales, 7, 0) group: saleDate order: saleDate

**Output: **Calculate the value in the column of =
`avgRolling`

to be the rolling average of the ```
PO=
S_Sales
```

values for the preceding seven days, grouped and ordere=
d by the `saleDate`

column. For more information, see&=
nbsp;Window Functions.

## Syntax and Parameters

=20
derive type: single|multiple value:(=
expression) [order: order_col] [group: group_col] <=
/span>[as:'new_column_name']

=20
=20
=20
=20
=20
Token
Required?
Data Type
Description
derive
Y
transform
Name of the transform
type
Y
string
Type of formula: `single (single row) or ``multiple`

(multi-row)

value
Y
string
Expression that generates the value to store in =
the new column
order
N
string
Column or column names by which to=
sort the dataset before the `value`

expression is app=
lied
group
N
string
If you are using aggregate or wind=
ow functions, you can specify a `group`

expression to identify t=
he subset of records to apply the `value`

expression.
as
N
string
Name of the newly generated column=

For more information on syntax standards, see Language Documentation Syntax Note=
s.

### type

Type of formula in the transform:

Value
Description
`single`

Formula calculations are contained within a sing=
le row of values.
`multiple`

Formula calculations involve multi=
ple rows of inputs.

**Usage Notes:**

Required?
Data Type
Yes
String (`single`

or `multiple)`

### value

Identifies the expression that is applied by the transform. The ```
va=
lue
```

parameter can be one of the following types:

- test predicates that evaluate to Boolean values (
```
value: myAge =3D=
=3D '30'
```

yields a `true`

or `false`

=
value), or
- computational expressions (
`value: abs(pow(myCol,3))`

=
).

The expected type of `value`

expression is determined by the =
transform type. Each type of expression can contain combinations of th=
e following:

**literal values: ** `value: 'Hello, world'`

```
```-
**column references: ** ```
value: amountOwe=
d * 10
```

` `**functions: ** <=
code>value: left(myString, 4)

**combinations: ** ```
=
value: abs(pow(myCol,3))
```

The types of any generated values are re-inferred by the platform.

**Usage Notes:**

=20
=20
=20
Required?
Data Type
Yes
String (literal, column reference, functio=
n call, or combination)

### order

This parameter specifies the column on which to sort the dataset be=
fore applying the specified function. For combination sort keys, you can ad=
d multiple comma-separated columns.

**NOTE: **The order parameter must unambiguously specify an=
ordering for the data, or the generated results may vary between job execu=
tions.

```
```

```
```

column.**NOTE: **If it is present, the dataset is first grouped by=
the `group`

value before it is ordered by the values in the

**NOTE:** The `order`

column does not need to be=
sorted before the transform is executed on it.

**Tip: **To sort in reverse order, prepend the column name =
with a dash (`-MyDate`

).

**Usage Notes:**

Required? | Data Type |
---|---|

No | String (column name) |

Identifies the column by which the dataset is grouped for purposes of ap= plying the transform.

The `ProdId`

column contains three values: `P002`

, and `P003`

, and you=
add the following transform:

derive type: single value:SUM(Sales) group:ProdId= as:'SalesByProd'

The above transform generates the `SalesByProd`

col=
umn, which contains the sum of the `Sales`

values, as =
grouped according to the three product identifiers.

If the value parameter contains aggregate or window functions, you can a=
pply the `group`

parameter to specify subsets of recor=
ds across which the value computation is applied.

`group`

parameter can result in non-deterministic re-ordering in=
the data grid. However, if you're running your job on the Spark runn=
ing environment, you should apply the=20
`group`

parameter, or your job may run out of memory and fail. T=
o avoid this issue and to enforce row ordering, use the=20
`sort`

transform. For more information, see=20
Sort Transform.=20
You can specify one or more columns by which to group using comma-separa= ted column references.

**Usage Notes:**

=20
=20
=20

Required? | Data Type |
---|---|

No | String (column name) |

Name of the new column that is being generated. If the `as parameter is not specified, a default name is used.`

```
=20
```derive type: single value:(colX * colY) as:'areaX=
Y'

**Output:** Generates a new column containing the =
;product of the values in columns `colX`

and ```
col=
Y
```

. New column is explicitly named, `areaXY`

.

**Usage Notes:**

=20
=20
=20
Required?
Data Type
No
String (column name)

## Examples

**Tip:** For additional examples, see Common Tasks.

### Example - Basic Deri=
ve Examples

The following dataset is used for performing some simple statistical ana=
lysis using the `derive`

transform.

**Source:**

=20
=20
=20
=20
StudentId
TestNumber
TestScore
S001
1
78
S001
2
85
S001
3
81
S002
1
84
S002
2
92
S002
3
77
S003
1
83
S003
2
88
S003
3
85

**Transform:**

First, you can calculate the total average score across all tests:

=
=20
derive type: single value:AVERAGE(Score) as:'avgS=
core'

In their unformatted form, the output values are lengthy. You can edit t=
he above transform to nest the `value`

statement with prope=
r formatting using the `NUMFORMAT`

function:

=20
derive type: single value:NUMFORMAT(=
AVERAGE(Score),'##.00') as:'avgScore'

You might also be interested to know how individual students fared and t=
o identify which tests caused the greatest challenges for the students:

=
=20
derive type: single value: NUMFORMAT=
(AVERAGE(Score),'##.00') group: StudentId as: 'avgScorebyStudentId'<=
/p>=20

derive type: single value:NUMFORMAT(=
AVERAGE(Score),'##.00') group:TestNumber as:'avgScoreByTest'

To calculate total scores for each student, add the following. Since eac=
h individual test score is a whole number, no rounding formatting is requir=
ed.

=20
derive type: single value:SUM(Score)=
group:StudentId as:'totalScoreByStudentId'

**Results:**

=20
=20
=20
=20
=20
=20
=20
=20
StudentId
TestNumber
TestScore
avgScore
avgScorebyStudentId
ScoreByTest
totalScoreByStudentId
S001
1
78
83.67
81.33
81.67
244
S001
2
85
83.67
81.33
88.33
244
S001
3
81
83.67
81.33
81.00
244
S002
1
84
83.67
84.33
81.67
253
S002
2
92
83.67
84.33
88.33
253
S002
3
77
83.67
84.33
81.00
253
S003
1
83
83.67
85.33
81.67
256
S003
2
88
83.67
85.33
88.33
256
S003
3
85
83.67
85.33
81.00
256

### Example - Rounding Fun=
ctions

The following example demonstrates how the rounding functions work toget=
her. These functions include the following:

`FLOOR`

- largest integer that is not greater than the input=
value. See FLOOR Function.
`CEILING`

- smallest integer that is not less than the =
input value. See CEILING Fu=
nction.
`ROUND`

- nearest integer to the input value. See =
ROUND Function.
`MOD`

- remainder integer when input1 is divided by inp=
ut2. See Numeric Operators=
.

**Source:**

rowNum
X
1
-2.5
2
-1.2
3
0
4
1
5
1.5
6
2.5
7
3.9
8
4
9
4.1
10
11

**Transform:**

=20
derive type: single value: FLOOR (X) as: 'floorX'=

=20
derive type: single value: CEILING (=
X) as: 'ceilingX'

=20
derive type: single value: ROUND (X)=
as: 'roundX'

=20
derive type: single value: (X % 2) a=
s: 'modX'

**Results:**

rowNum
X
modX
roundX
ceilingX
floorX
1
-2.5
-2
-2
-3
2
-1.2
-1
-1
-2
3
0
0
0
0
0
4
1
1
1
1
1
5
1.5
2
2
1
6
2.5
3
3
2
7
3.9
4
4
3
8
4
0
4
4
4
9
4.1
4
5
4
10
11
1
11
11
11

### Example - Other Examples

In the language documentation, there are many other examples where the&n=
bsp;`derive`

transform is used.

=20

```
------=_Part_97394_517100870.1614819756306--
```