Message-ID: <1578087455.1791.1560749121475.JavaMail.daemon@e613c95ee270> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_1790_337558628.1560749121475" ------=_Part_1790_337558628.1560749121475 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html Calculate Metrics across Columns

Calculate Metrics across Columns

You can use a variety of mathematical and statistical func= tions to calculate metrics within a column.

To calculate metrics across columns, you can use a generalized ver= sion of the following example.

Source:

Your dataset tracks swimmer performance across multiple heats in a= race, and you would like to calculate best, worst, and average times in se= conds across all three heats. Here's the data:

=20 =20 =20 =20 =20
Racer Heat1 Heat2 Heat3
Racer X 37.22 38.22 37.61
Racer Y 41.33 DQ 38.04
Racer Z 39.27 39.04 38.85

In the above data, Racer Y was disqualified (DQ) = in Heat 2.

Transform:

To compute the metrics, you must bundle the data into an array, break ou= t the array into separate rows, and then calculate your metrics by grouping= . Here are the steps:

1. When the data is imported, you may need to create a header for each = row:

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name `Rename columns with a row` `Use row as header` `1`

2. The columns containing heat time data may need to be retyped. From the = drop-down next to each column name, select Decimal type.
3. The `DQ` value in the Heat2 column is invalid data for De= cimal type. You can use the following transform to turn it into a missing v= alue. For purposes of calculating averages, you may or may not want t= o turn invalid data into zeroes or blanks. In this case, replacing the data= as `0.00` causes improper calculations for the m= etrics.

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name `Replace text or patterns` `Heat2` `'DQ'` `''`

4. Use the following to gather all of the heat data into two colu= mns:

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name `Unpivot columns` `Heat1,Heat2,Heat3` `1`

5. You can now rename the two columns. Rename `key` to= `HeatNum` and `value` to `HeatTime`.

6. You may want to delete the rows that have a missing value for = `HeatTime`:

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>Delete rows `ISMISSING([value])`

7. You can now perform calculations on this column. The following= transforms calculate minimum, average (mean), and maximum times for each r= acer:

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Multiple row formula` `MIN(HeatTime)` `Racer` `'BestTime'`

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Multiple row formula` `AVERAGE(HeatTime)` `Racer` `'AvgTime'`

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Multiple row formula` `MAX(HeatTime)` `Racer` `'WorstTime'`

8. To make the data look better, you might want to reformat the values = in the `AvgTime` column to two decimal points:

=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name `Edit column with formula` `AvgTime` `NUMFORMAT(AvgTime, '##.00')`

Results:

After you use the `move` transform to re-organize your = columns, the dataset should look like the following:

=20 =20 =20 =20 =20 =20 =20
Racer HeatNum HeatTime BestTime WorstTime AvgTime
Racer X Heat1 37.22 37.22 38.22 37.68
Racer X Heat2 38.22 37.22 38.22 37.68
Racer X Heat3 37.61 37.22 38.22 37.68
Racer Y Heat1 41.33 38.04 41.33 39.69
Racer Y Heat3 38.04 38.04 41.33 39.69
Racer Z Heat1 39.27 38.85 39.27 39.05
Racer Z Heat2 39.04 38.85 39.27 39.05
Racer Z Heat3 38.85 38.85 39.27 39.05
------=_Part_1790_337558628.1560749121475--