**Contents:**

**The cloud-based version of Trifacta Wrangler is now available! Read all about it, and register for your free account.**

You can use a variety of mathematical and statistical functions to calculate metrics within a column.

- See Aggregate Functions.
- See Math Functions.

To calculate metrics across columns, you can use a generalized version of the following example.

**Source:**

Your dataset tracks swimmer performance across multiple heats in a race, and you would like to calculate best, worst, and average times in seconds across all three heats. Here's the data:

Racer | Heat1 | Heat2 | Heat3 |
---|---|---|---|

Racer X | 37.22 | 38.22 | 37.61 |

Racer Y | 41.33 | DQ | 38.04 |

Racer Z | 39.27 | 39.04 | 38.85 |

** **In the above data, Racer Y was disqualified (DQ) in Heat 2.

**Transform:**

To compute the metrics, you must bundle the data into an array, break out the array into separate rows, and then calculate your metrics by grouping. Here are the steps:

When the data is imported, you may need to create a header for each row:

Transformation Name `Rename columns with a row`

Parameter: Option `Use row as header`

Parameter: Row `1`

- The columns containing heat time data may need to be retyped. From the drop-down next to each column name, select Decimal type.
The

`DQ`

value in the Heat2 column is invalid data for Decimal type. You can use the following transform to turn it into a missing value. For purposes of calculating averages, you may or may not want to turn invalid data into zeroes or blanks. In this case, replacing the data as`0.00`

causes improper calculations for the metrics.

Transformation Name `Replace text or patterns`

Parameter: Column `Heat2`

Parameter: Find `'DQ'`

Parameter: Replace with `''`

Use the following to gather all of the heat data into two columns:

Transformation Name `Unpivot columns`

Parameter: Columns `Heat1,Heat2,Heat3`

Parameter: Group size `1`

You can now rename the two columns. Rename

`key`

to`HeatNum`

and`value`

to`HeatTime`

.You may want to delete the rows that have a missing value for

`HeatTime`

:

Transformation Name `Delete rows`

Parameter: Condition `ISMISSING([value])`

You can now perform calculations on this column. The following transforms calculate minimum, average (mean), and maximum times for each racer:

Transformation Name `New formula`

Parameter: Formula type `Multiple row formula`

Parameter: Formula `MIN(HeatTime)`

Parameter: Group rows by `Racer`

Parameter: New column name `'BestTime'`

Transformation Name `New formula`

Parameter: Formula type `Multiple row formula`

Parameter: Formula `AVERAGE(HeatTime)`

Parameter: Group rows by `Racer`

Parameter: New column name `'AvgTime'`

Transformation Name `New formula`

Parameter: Formula type `Multiple row formula`

Parameter: Formula `MAX(HeatTime)`

Parameter: Group rows by `Racer`

Parameter: New column name `'WorstTime'`

To make the data look better, you might want to reformat the values in the

`AvgTime`

column to two decimal points:

Transformation Name `Edit column with formula`

Parameter: Columns `AvgTime`

Parameter: Formula `NUMFORMAT(AvgTime, '##.00')`

**Results:**

After you use the `move`

transform to re-organize your columns, the dataset should look like the following:

Racer | HeatNum | HeatTime | BestTime | WorstTime | AvgTime |
---|---|---|---|---|---|

Racer X | Heat1 | 37.22 | 37.22 | 38.22 | 37.68 |

Racer X | Heat2 | 38.22 | 37.22 | 38.22 | 37.68 |

Racer X | Heat3 | 37.61 | 37.22 | 38.22 | 37.68 |

Racer Y | Heat1 | 41.33 | 38.04 | 41.33 | 39.69 |

Racer Y | Heat3 | 38.04 | 38.04 | 41.33 | 39.69 |

Racer Z | Heat1 | 39.27 | 38.85 | 39.27 | 39.05 |

Racer Z | Heat2 | 39.04 | 38.85 | 39.27 | 39.05 |

Racer Z | Heat3 | 38.85 | 38.85 | 39.27 | 39.05 |

- task
- ui
- analyst
- data_analyst
- row
- how_to
- discover
- unpivot
- wrangle_transform_header
- wrangle_transform_replace
- wrangle_transform_unpivot
- wrangle_transform_delete
- wrangle_transform_derive
- wrangle_transform_set
- function
- min
- mean
- max
- numformat
- missing
- wrangle_function_min
- wrangle_function_max
- wrangle_function_numformat
- recipe
- average
- wrangle_function_average
- ismissing
- wrangle_function_ismissing

This page has no comments.