You can use a variety of mathematical and statistical functions to calculate metrics within a column.

- See Aggregate Functions.
- See Math Functions.

To calculate metrics across columns, you can use a generalized version of the following example.

**Source:**

Your dataset tracks swimmer performance across multiple heats in a race, and you would like to calculate best, worst, and average times in seconds across all three heats. Here's the data:

Racer | Heat1 | Heat2 | Heat3 |
---|---|---|---|

Racer X | 37.22 | 38.22 | 37.61 |

Racer Y | 41.33 | DQ | 38.04 |

Racer Z | 39.27 | 39.04 | 38.85 |

In the above data, Racer Y was disqualified (DQ) in Heat 2.

**Transform:**

To compute the metrics, you must bundle the data into an array, break out the array into separate rows, and then calculate your metrics by grouping. Here are the steps:

When the data is imported, you may need to create a header for each row:

`header`

- The columns containing heat time data may need to be retyped. From the drop-down next to each column name, select Decimal type.
The

`DQ`

value in the Heat2 column is invalid data for Decimal type. You can use the following transform to turn it into a missing value. For purposes of calculating averages, you may or may not want to turn invalid data into zeroes or blanks. In this case, replacing the data as`0.00`

causes improper calculations for the metrics.`replace col:Heat2 with:'' on:'DQ'`

Use the following to gather all of the heat data into two columns:

`unpivot col:Heat1,Heat2,Heat3`

You can now rename the two columns. Rename

`key`

to`HeatNum`

and`value`

to`HeatTime`

.You may want to delete the rows that have a missing value for

`HeatTime`

:`delete row: ISMISSING([value])`

You can now perform calculations on this column. The following transforms calculate minimum, average (mean), and maximum times for each racer:

`derive type:single value:MIN(HeatTime) group:Racer as:'BestTime'`

`derive type:single value:AVERAGE(HeatTime) group:Racer as:'AvgTime'`

`derive type:single value:MAX(HeatTime) group:Racer as:'WorstTime'`

To make the data look better, you might want to reformat the values in the

`AvgTime`

column to two decimal points:`set col:AvgTime value:NUMFORMAT(AvgTime, '##.00')`

**Results:**

After you use the `move`

transform to re-organize your columns, the dataset should look like the following:

Racer | HeatNum | HeatTime | BestTime | WorstTime | AvgTime |
---|---|---|---|---|---|

Racer X | Heat1 | 37.22 | 37.22 | 38.22 | 37.68 |

Racer X | Heat2 | 38.22 | 37.22 | 38.22 | 37.68 |

Racer X | Heat3 | 37.61 | 37.22 | 38.22 | 37.68 |

Racer Y | Heat1 | 41.33 | 38.04 | 41.33 | 39.69 |

Racer Y | Heat3 | 38.04 | 38.04 | 41.33 | 39.69 |

Racer Z | Heat1 | 39.27 | 38.85 | 39.27 | 39.05 |

Racer Z | Heat2 | 39.04 | 38.85 | 39.27 | 39.05 |

Racer Z | Heat3 | 38.85 | 38.85 | 39.27 | 39.05 |

- task
- ui
- analyst
- data_analyst
- row
- how_to
- discover
- unpivot
- wrangle_transform_header
- wrangle_transform_replace
- wrangle_transform_unpivot
- wrangle_transform_delete
- wrangle_transform_derive
- wrangle_transform_set
- function
- min
- mean
- max
- numformat
- missing
- wrangle_function_min
- wrangle_function_max
- wrangle_function_numformat
- recipe
- average
- wrangle_function_average
- ismissing
- wrangle_function_ismissing

This page has no comments.