You can apply aggregate functions to groups of values in one or more columns to generate aggregated data. Depending on how you configure the Group By transformation, the output is either a new table or one or more new columns in the current dataset.

Limitations

  • The Group By transformation does not support nested expressions. For example, you cannot wrap an aggregate function inside another function, as in ROUND(AVERAGE(TestScore),2).
  • The Group By transformation supports aggregation functions only. For more information, see Aggregate Functions.

Example Data

The following table contains test score data from a set of students for four separate tests, spread over two days:

Student   TestDate    TestNum  TestScore
Anna      09/08/2018  1        84
Ben       09/08/2018  1        71
Caleb     09/08/2018  1        76
Danielle  09/08/2018  1        87
Anna      09/08/2018  2        92
Ben       09/08/2018  2        86
Caleb     09/08/2018  2        99
Danielle  09/08/2018  2        73
Anna      09/15/2018  3        86
Ben       09/15/2018  3        99
Caleb     09/15/2018  3        86
Danielle  09/15/2018  3        80
Anna      09/15/2018  4        85
Ben       09/15/2018  4        87
Caleb     09/15/2018  4        79
Danielle  09/15/2018  4        93

Aggregate across all rows (no grouping)

You can perform basic computations across all rows of the dataset. For example, the following transformation creates a new column containing the average test score for all students:

Transformation Name         New formula
Parameter: Formula type     Single row formula
Parameter: Formula          ROUND(AVERAGE(TestScore),2)
Parameter: New column name  average_TestScore

This step creates a new column called average_TestScore, which contains the single value 85.19 in every row. This value is the average of all students' test scores, rounded to two decimal places.
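
As an illustration only, the same flat aggregation can be sketched in plain Python. This is not how the product computes the value; the variable names are hypothetical, and the scores are copied from the example table above.

```python
from statistics import mean

# TestScore values from the example table, in row order.
scores = [84, 71, 76, 87, 92, 86, 99, 73, 86, 99, 86, 80, 85, 87, 79, 93]

# Flat aggregation: one value computed across every row, with no grouping.
average_test_score = round(mean(scores), 2)

# The new column repeats that single value on every row of the dataset.
average_column = [average_test_score] * len(scores)

print(average_test_score)  # 85.19
```

The sketch reads every row to produce one value, which is why this kind of aggregation becomes expensive as the row count grows.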

NOTE: These types of aggregations are known as flat aggregations. In larger datasets, flat aggregations can be computationally intensive. Use caution when computing any aggregate function across a large number of rows.

Aggregate grouped-by rows

For the above example data, suppose you are interested in the average score for each student. In this case, you must compute the average (AVERAGE(TestScore)) for each student.

The previous step used the New Formula transformation. To compute aggregations across groups of values in a column, you must use the Group By transformation instead:

Transformation Name  Group By
Parameter: Group By  Student
Parameter: Values    AVERAGE(TestScore)
Parameter: Type      Group by as new column(s)

Note that the above transformation does not contain the rounding function. Nested expressions are not supported in the Group By transformation. To round the values, add the following transformation as the next step:

Transformation Name  Edit column with formula
Parameter: Columns   average_TestScore
Parameter: Formula   ROUND(average_TestScore,2)

You may wish to rename the newly generated column to something like average_TestScorePerStudent instead. See Rename Columns.

The output data should look like the following:

Student   TestDate    TestNum  TestScore  average_TestScorePerStudent  average_TestScore
Anna      09/08/2018  1        84         86.75                        85.19
Ben       09/08/2018  1        71         85.75                        85.19
Caleb     09/08/2018  1        76         85                           85.19
Danielle  09/08/2018  1        87         83.25                        85.19
Anna      09/08/2018  2        92         86.75                        85.19
Ben       09/08/2018  2        86         85.75                        85.19
Caleb     09/08/2018  2        99         85                           85.19
Danielle  09/08/2018  2        73         83.25                        85.19
Anna      09/15/2018  3        86         86.75                        85.19
Ben       09/15/2018  3        99         85.75                        85.19
Caleb     09/15/2018  3        86         85                           85.19
Danielle  09/15/2018  3        80         83.25                        85.19
Anna      09/15/2018  4        85         86.75                        85.19
Ben       09/15/2018  4        87         85.75                        85.19
Caleb     09/15/2018  4        79         85                           85.19
Danielle  09/15/2018  4        93         83.25                        85.19
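
The grouped aggregation above can be sketched in plain Python as a rough analogy, not as the product's implementation. All names below are hypothetical, and TestDate is left out of the rows for brevity. The sketch keeps aggregation and rounding as two separate steps, mirroring the two transformations in the recipe.

```python
from collections import defaultdict
from statistics import mean

# (Student, TestNum, TestScore) rows from the example table; TestDate omitted.
rows = [
    ("Anna", 1, 84), ("Ben", 1, 71), ("Caleb", 1, 76), ("Danielle", 1, 87),
    ("Anna", 2, 92), ("Ben", 2, 86), ("Caleb", 2, 99), ("Danielle", 2, 73),
    ("Anna", 3, 86), ("Ben", 3, 99), ("Caleb", 3, 86), ("Danielle", 3, 80),
    ("Anna", 4, 85), ("Ben", 4, 87), ("Caleb", 4, 79), ("Danielle", 4, 93),
]

# Step 1: group the scores by student, then average each group.
scores_by_student = defaultdict(list)
for student, _test_num, score in rows:
    scores_by_student[student].append(score)
average_by_student = {s: mean(v) for s, v in scores_by_student.items()}

# Step 2: round in a separate pass, as the recipe does in its next step.
rounded_by_student = {s: round(a, 2) for s, a in average_by_student.items()}

# "As new column(s)": every original row is kept and gains its group's value.
rows_with_average = [row + (rounded_by_student[row[0]],) for row in rows]

for row in rows_with_average[:4]:
    print(row)
```

Because each row receives the value for its own group, the row count is unchanged: the result still has 16 rows.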

Generate new aggregation table

Suppose you wish to calculate the minimum, maximum, and average scores for each test. In this case, it may be more useful to create a new table in which the student names have been removed:

Transformation Name  Group By
Parameter: Group By  TestNum
Parameter: Values1   MAX(TestScore)
Parameter: Values2   MIN(TestScore)
Parameter: Values3   AVERAGE(TestScore)
Parameter: Type      Group by as new table

The resulting data looks like the following:

TestNum  max_TestScore  min_TestScore  average_TestScore
1        87             71             79.5
2        99             73             87.5
3        99             80             87.75
4        93             79             86
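
For comparison, here is a minimal Python sketch of what "Group by as new table" produces, assuming the same example rows. It is an analogy rather than the product's implementation, and the variable names are hypothetical; the output keys reuse the column names shown in the table above.

```python
from collections import defaultdict
from statistics import mean

# (Student, TestNum, TestScore) rows from the example table; TestDate omitted.
rows = [
    ("Anna", 1, 84), ("Ben", 1, 71), ("Caleb", 1, 76), ("Danielle", 1, 87),
    ("Anna", 2, 92), ("Ben", 2, 86), ("Caleb", 2, 99), ("Danielle", 2, 73),
    ("Anna", 3, 86), ("Ben", 3, 99), ("Caleb", 3, 86), ("Danielle", 3, 80),
    ("Anna", 4, 85), ("Ben", 4, 87), ("Caleb", 4, 79), ("Danielle", 4, 93),
]

# Group the scores by test number.
scores_by_test = defaultdict(list)
for _student, test_num, score in rows:
    scores_by_test[test_num].append(score)

# "As new table": one output row per group. Columns that are neither the
# group key nor an aggregated value (such as Student) are not carried over.
aggregation_table = [
    {
        "TestNum": test_num,
        "max_TestScore": max(scores),
        "min_TestScore": min(scores),
        "average_TestScore": mean(scores),
    }
    for test_num, scores in sorted(scores_by_test.items())
]

for record in aggregation_table:
    print(record)
```

The 16 input rows collapse to 4 output rows, one per test.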

Tip: In this case, when you replace the existing table with a completely new table, data that is not included in the aggregation is lost. You can add columns to the list of values if you wish to bring forward untouched columns into the new table. You may also consider building aggregation tables in a recipe that is extended from the previous recipe, so that you can continue to work with the other columns in your dataset.