Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0710

This example illustrates how you can apply statistical functions to your dataset. Calculations include average (mean), max, min, standard deviation, and variance.

Source:

Students took a test and recorded the following scores. You want to perform some statistical analysis on them:

StudentScore
Anna84
Ben71
Caleb76
Danielle87
Evan85
Faith92
Gabe85
Hannah99
Ian73
Jane68

Transformation:

You can use the following transformations to calculate the average (mean), minimum, and maximum scores:

D trans
RawWrangletrue
p03Value'avgScore'
Typestep
WrangleTextderive type:single value:AVERAGE(Score) as:'avgScore'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueAVERAGE(Score)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'minScore'
Typestep
WrangleTextderive type:single value:MIN(Score) as:'minScore'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueMIN(Score)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'maxScore'
Typestep
WrangleTextderive type:single value:MAX(Score) as:'maxScore'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueMAX(Score)
p03NameNew column name
SearchTermNew formula

To apply statistical functions to your data, you can use the VAR and STDEV functions, which can be used as the basis for other statistical calculations.

D trans
RawWrangletrue
p03Valuevar_Score
Typestep
WrangleTextderive type:single value:VAR(Score) as:'var_Score'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueVAR(Score)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Valuestdev_Score
Typestep
WrangleTextderive type:single value:STDEV(Score) as:'stdev_Score'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueSTDEV(Score)
p03NameNew column name
SearchTermNew formula

For each score, you can now calculate the variation of each one from the average, using the following:

D trans
RawWrangletrue
p03Value'stDevs'
Typestep
WrangleTextderive type:single value:((Score - avg_Score) / stdev_Score) as:'stDevs'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Value((Score - avg_Score) / stdev_Score)
p03NameNew column name
SearchTermNew formula

Now, you want to apply grades based on a formula:

Gradestandard deviations from avg (stDevs)
AstDevs > 1
BstDevs > 0.5
C-1 <= stDevs <= 0.5
DstDevs < -1
FstDevs < -2

You can build the following transformation using the IF function to calculate grades.

D trans
RawWrangletrue
Typestep
WrangleTextderive type:single value:IF((stDevs > 1),'A',IF((stDevs < -2),'F',IF((stDevs < -1),'D',IF((stDevs > 0.5),'B','C'))))
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIF((stDevs > 1),'A',IF((stDevs < -2),'F',IF((stDevs < -1),'D',IF((stDevs > 0.5),'B','C'))))
SearchTermNew formula

For more information, see IF Function.

To clean up the content, you might want to apply some formatting to the score columns. The following reformats the stdev_Score and stDevs columns to display two decimal places:

D trans
RawWrangletrue
Typestep
WrangleTextset col:stdev_Score value:NUMFORMAT(stdev_Score, '##.00')
p01NameColumns
p01Valuestdev_Score
p02NameFormula
p02ValueNUMFORMAT(stdev_Score, '##.00')
SearchTermEdit column with formula

D trans
RawWrangletrue
Typestep
WrangleTextset col:stDevs value:NUMFORMAT(stDevs, '##.00')
p01NameColumns
p01ValuestDevs
p02NameFormula
p02ValueNUMFORMAT(stDevs, '##.00')
SearchTermEdit column with formula

D trans
RawWrangletrue
p03Value'modeScore'
Typestep
WrangleTextderive type:single value:MODE(Score) as:'modeScore'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueMODE(Score)
p03NameNew column name
SearchTermNew formula

Results:

StudentScoremodeScoreavgScoreminScoremaxScorevar_Scorestdev_ScorestDevsGrade
Anna8485826899

87.00000000000001

9.330.21C
Ben718582689987.000000000000019.33-1.18D
Caleb768582689987.000000000000019.33-0.64C
Danielle878582689987.000000000000019.330.54B
Evan858582689987.000000000000019.330.32C
Faith928582689987.000000000000019.331.07A
Gabe858582689987.000000000000019.330.32C
Hannah998582689987.000000000000019.331.82A
Ian738582689987.000000000000019.33-0.96C
Jane688582689987.000000000000019.33-1.50D