##### Page tree

Contents:

Our documentation site is moving!

For up-to-date documentation of Dataprep, please visit us at https://help.alteryx.com/Dataprep/.

You can use a variety of mathematical and statistical functions to calculate metrics within a column.

To calculate metrics across columns, you can use a generalized version of the following example.

Source:

Your dataset tracks swimmer performance across multiple heats in a race, and you would like to calculate best, worst, and average times in seconds across all three heats. Here's the data:

RacerHeat1Heat2Heat3
Racer X37.2238.2237.61
Racer Y41.33DQ38.04
Racer Z39.2739.0438.85

In the above data, Racer Y was disqualified (DQ) in Heat 2.

Transformation:

To compute the metrics, you must bundle the data into an array, break out the array into separate rows, and then calculate your metrics by grouping. Here are the steps:

1. When the data is imported, you may need to create a header for each row:

Transformation Name `Rename columns with a row` `Use row as header` `1`

2. The columns containing heat time data may need to be retyped. From the drop-down next to each column name, select Decimal type.
3. The `DQ` value in the Heat2 column is invalid data for Decimal type. You can use the following transformation to turn it into a missing value. For purposes of calculating averages, you may or may not want to turn invalid data into zeroes or blanks. In this case, replacing the data as `0.00` causes improper calculations for the metrics.

Transformation Name `Replace text or patterns` `Heat2` `'DQ'` `''`

4. Use the following to gather all of the heat data into two columns:

Transformation Name `Unpivot columns` `Heat1,Heat2,Heat3` `1`

5. You can now rename the two columns. Rename `key` to `HeatNum` and `value` to `HeatTime`.

6. You may want to delete the rows that have a missing value for `HeatTime`:

Transformation Name `Delete rows` `ISMISSING([value])`

7. You can now perform calculations on this column. The following transformations calculate minimum, average (mean), and maximum times for each racer:

Transformation Name `New formula` `Multiple row formula` `MIN(HeatTime)` `Racer` `'BestTime'`

Transformation Name `New formula` `Multiple row formula` `AVERAGE(HeatTime)` `Racer` `'AvgTime'`

Transformation Name `New formula` `Multiple row formula` `MAX(HeatTime)` `Racer` `'WorstTime'`

8. To make the data look better, you might want to reformat the values in the `AvgTime` column to two decimal points:

Transformation Name `Edit column with formula` `AvgTime` `NUMFORMAT(AvgTime, '##.00')`

Results:

After you use the Move transformation to re-organize your columns, the dataset should look like the following:

RacerHeatNumHeatTimeBestTimeWorstTimeAvgTime
Racer XHeat137.2237.2238.2237.68
Racer XHeat238.2237.2238.2237.68
Racer XHeat337.6137.2238.2237.68
Racer YHeat141.3338.0441.3339.69
Racer YHeat338.0438.0441.3339.69
Racer ZHeat139.2738.8539.2739.05
Racer ZHeat239.0438.8539.2739.05
Racer ZHeat338.8538.8539.2739.05