You can use a variety of mathematical and statistical functions to calculate metrics within a column.

To calculate metrics across columns, you can use a generalized version of the following example.

Source:

Your dataset tracks swimmer performance across multiple heats in a race, and you would like to calculate best, worst, and average times in seconds across all three heats. Here's the data:

Racer    Heat1  Heat2  Heat3
Racer X  37.22  38.22  37.61
Racer Y  41.33  DQ     38.04
Racer Z  39.27  39.04  38.85


In the above data, Racer Y was disqualified (DQ) in Heat 2.

Transformation:

To compute the metrics, you must bundle the data into an array, break out the array into separate rows, and then calculate your metrics by grouping. Here are the steps:

  1. When the data is imported, you may need to use the first row as the column headers:

    Transformation Name    Rename columns with a row
    Parameter: Option      Use row as header
    Parameter: Row         1

  2. The columns containing heat time data may need to be retyped. From the drop-down next to each column name, select Decimal type.
  3. The DQ value in the Heat2 column is invalid data for Decimal type. You can use the following transformation to turn it into a missing value. Depending on the metric, you may or may not want to turn invalid data into zeroes or blanks. In this case, replacing DQ with 0.00 would count as a real heat time and distort the calculated metrics, so it is replaced with an empty value instead.

    Transformation Name        Replace text or patterns
    Parameter: Column          Heat2
    Parameter: Find            'DQ'
    Parameter: Replace with    ''

  4. Use the following to gather all of the heat data into two columns:

    Transformation Name      Unpivot columns
    Parameter: Columns       Heat1,Heat2,Heat3
    Parameter: Group size    1

  5. You can now rename the two columns. Rename key to HeatNum and value to HeatTime.

  6. You may want to delete the rows that have a missing value for HeatTime:

    Transformation Name     Delete rows
    Parameter: Condition    ISMISSING([HeatTime])

  7. You can now perform calculations on this column. The following transformations calculate minimum, average (mean), and maximum times for each racer:

    Transformation Name           New formula
    Parameter: Formula type       Multiple row formula
    Parameter: Formula            MIN(HeatTime)
    Parameter: Group rows by      Racer
    Parameter: New column name    'BestTime'

    Transformation Name           New formula
    Parameter: Formula type       Multiple row formula
    Parameter: Formula            AVERAGE(HeatTime)
    Parameter: Group rows by      Racer
    Parameter: New column name    'AvgTime'

    Transformation Name           New formula
    Parameter: Formula type       Multiple row formula
    Parameter: Formula            MAX(HeatTime)
    Parameter: Group rows by      Racer
    Parameter: New column name    'WorstTime'

  8. To make the data look better, you might want to reformat the values in the AvgTime column to two decimal places:

    Transformation Name    Edit column with formula
    Parameter: Columns     AvgTime
    Parameter: Formula     NUMFORMAT(AvgTime, '##.00')
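
For readers who want to check the logic outside the application, the steps above can be sketched in plain Python. This is an illustration only, not the product's implementation: the function names `unpivot` and `add_metrics` are hypothetical, `None` stands in for the missing value that replaces DQ, and half-up rounding to two places is an assumption about how the average is formatted.

```python
from decimal import Decimal, ROUND_HALF_UP

# Source data from the example. None stands in for the missing value
# that replaces 'DQ' in step 3.
source = [
    {"Racer": "Racer X", "Heat1": "37.22", "Heat2": "38.22", "Heat3": "37.61"},
    {"Racer": "Racer Y", "Heat1": "41.33", "Heat2": None, "Heat3": "38.04"},
    {"Racer": "Racer Z", "Heat1": "39.27", "Heat2": "39.04", "Heat3": "38.85"},
]
HEAT_COLUMNS = ["Heat1", "Heat2", "Heat3"]


def unpivot(rows, columns):
    """Steps 4-5: emit one row per (racer, heat) pair."""
    return [
        {"Racer": row["Racer"], "HeatNum": col, "HeatTime": row[col]}
        for row in rows
        for col in columns
    ]


def add_metrics(rows):
    """Steps 6-8: drop missing times, then add per-racer metrics."""
    # Step 6: keep only rows that have a heat time.
    kept = [
        dict(r, HeatTime=Decimal(r["HeatTime"]))
        for r in rows
        if r["HeatTime"] is not None
    ]
    # Step 7: collect each racer's times, then compute min, max, and mean.
    times = {}
    for r in kept:
        times.setdefault(r["Racer"], []).append(r["HeatTime"])
    for r in kept:
        t = times[r["Racer"]]
        r["BestTime"] = min(t)
        r["WorstTime"] = max(t)
        # Step 8: round the average to two decimal places (assumed half-up).
        r["AvgTime"] = (sum(t) / len(t)).quantize(
            Decimal("0.01"), rounding=ROUND_HALF_UP
        )
    return kept


result = add_metrics(unpivot(source, HEAT_COLUMNS))
for r in result:
    print(r["Racer"], r["HeatNum"], r["HeatTime"],
          r["BestTime"], r["WorstTime"], r["AvgTime"])
```

The sketch uses `Decimal` rather than binary floats so that an average such as 39.685 rounds predictably at the second decimal place.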

Results:

After you use the Move transformation to re-organize your columns, the dataset should look like the following:

Racer    HeatNum  HeatTime  BestTime  WorstTime  AvgTime
Racer X  Heat1    37.22     37.22     38.22      37.68
Racer X  Heat2    38.22     37.22     38.22      37.68
Racer X  Heat3    37.61     37.22     38.22      37.68
Racer Y  Heat1    41.33     38.04     41.33      39.69
Racer Y  Heat3    38.04     38.04     41.33      39.69
Racer Z  Heat1    39.27     38.85     39.27      39.05
Racer Z  Heat2    39.04     38.85     39.27      39.05
Racer Z  Heat3    38.85     38.85     39.27      39.05
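
Racer Y's metrics in these results depend on the DQ entry being treated as missing rather than as 0.00. The following minimal Python sketch compares the two choices for Racer Y only. The helper name `heat_metrics` is hypothetical, and the code illustrates the arithmetic, not how the application computes it.

```python
from statistics import mean


def heat_metrics(times):
    """Return (best, worst, average) for a list of heat times in seconds."""
    return min(times), max(times), mean(times)


# Racer Y with DQ treated as missing: only the two valid heats count.
as_missing = heat_metrics([41.33, 38.04])

# Racer Y with DQ replaced by 0.00: the zero is counted as a real heat time.
as_zero = heat_metrics([41.33, 0.00, 38.04])

print("DQ as missing:", as_missing)
print("DQ as 0.00:   ", as_zero)
```

With the zero included, the best time becomes 0.00 and the average drops to roughly 26.46 seconds, neither of which reflects a heat that Racer Y actually swam.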