You can use a variety of mathematical and statistical functions to calculate metrics within a column.
To calculate metrics across columns, you can use a generalized version of the following example.
Your dataset tracks swimmer performance across multiple heats in a race, and you would like to calculate best, worst, and average times in seconds across all three heats. Here's the data:
In the above data, Racer Y was disqualified (DQ) in Heat 2.
To compute the metrics, you must bundle the data into an array, break out the array into separate rows, and then calculate your metrics by grouping. Here are the steps:
When the data is imported, you may need to create a header for each row:
D trans Type step p01Name Option p01Value Use row as header p02Name Row p02Value 1 SearchTerm Rename columns with a row
- The columns containing heat time data may need to be retyped. From the drop-down next to each column name, select Decimal type.
DQvalue in the Heat2 column is invalid data for Decimal type. You can use the following transform transformation to turn it into a missing valuemissing value. For purposes of calculating averages, you may or may not want to turn invalid data into zeroes or blanks. In this case, replacing the data as
0.00causes causes improper calculations for the metrics.
D trans p03Value '' Type step p01Name Column p01Value Heat2 p02Name Find p02Value 'DQ' p03Name Replace with SearchTerm Replace text or patterns
Use the following to gather all of the heat data into two columns:
D trans Type step p01Name Columns p01Value Heat1,Heat2,Heat3 p02Name Group size p02Value 1 SearchTerm Unpivot columns
You can now rename the two columns. Rename Rename
You may want to delete the rows that have a missing value for missing value for
D trans Type step p01Name Condition p01Value ISMISSING([value]) SearchTerm Delete rows
You can now perform calculations on this column. The following transforms transformations calculate minimum, average (mean), and maximum times for each racer:
D trans p03Value Racer Type step p01Name Formula type p01Value Multiple row formula p02Name Formula p02Value MIN(HeatTime) p03Name Group rows by p04Value 'BestTime' p04Name New column name SearchTerm New formula D trans p03Value Racer Type step p01Name Formula type p01Value Multiple row formula p02Name Formula p02Value AVERAGE(HeatTime) p03Name Group rows by p04Value 'AvgTime' p04Name New column name SearchTerm New formula D trans p03Value Racer Type step p01Name Formula type p01Value Multiple row formula p02Name Formula p02Value MAX(HeatTime) p03Name Group rows by p04Value 'WorstTime' p04Name New column name SearchTerm New formula
To make the data look better, you might want to reformat the values in the the
AvgTimecolumn to two decimal points:
D trans Type step p01Name Columns p01Value AvgTime p02Name Formula p02Value NUMFORMAT(AvgTime, '##.00') SearchTerm Edit column with formula
After you use the
move transform to the Move transformation to re-organize your columns, the dataset should look like the following: