Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version r0821

This example illustrates how you can apply the following percentile-related functions to your transformations:

  • MEDIAN - Calculate the median value from a column of values. See MEDIAN Function.
  • PERCENTILE - Calculate a specified percentile for a column of values. See PERCENTILE Function.
  • QUARTILE - Calculate a specified quartile for a column of values. See QUARTILE Function.

The following functions use an approximation technique for calculating median, percentile, and quartiles. In some cases, these calculations can be computed faster across large datasets.

Source:

The following table lists each student's height in inches:

StudentHeight
164
265
363
464
562
666
766
865
969
1066
1173
1269
1369
1461
1564
1661
1771
1867
1973
2066

Transformation:

Use the following transformations to calculate the median height in inches, a specified percentile and the first quartile.

  • The first function uses a precise algorithm which can be slow to execute across large datasets.
  • The second function uses an appropriate approximation algorithm, which is much faster to execute across large datasets. 
    • These approximate functions can use an error boundary parameter, which is set to 0.4 (0.4%) across all functions.

Median: This transformation calculates the median value, which corresponds to the 50th percentile.

D trans
RawWrangletrue
p03Value'medianIn'
Typestep
WrangleTextderive type: single value: median(heightIn) as: 'medianIn'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valuemedian(heightIn)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'approxMedianIn'
Typestep
WrangleTextderive type: single value: approxmedian(heightIn, 0.4) as: 'approxMedianIn'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valueapproximatemedian(heightIn, 0.4)
p03NameNew column name
SearchTermNew formula

Percentile: This transformation calculates the 68th percentile.

D trans
RawWrangletrue
p03Value'percentile68In'
Typestep
WrangleTextderive type: single value: percentile(heightIn, 68, linear) as: 'percentile68In'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valuepercentile(heightIn, 68, linear)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'approxPercentile68In'
Typestep
WrangleTextderive type: single value: approximatepercentile(heightIn, 68, 0.4) as: 'approxpercentile68In'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valueapproximatepercentile(heightIn, 68, 0.4)
p03NameNew column name
SearchTermNew formula

Quartile: This transformation calculates the first quartile, which corresponds to the 25th percentile.

D trans
RawWrangletrue
p03Value'percentile25In'
Typestep
WrangleTextderive type: single value: quartile(heightIn, 1, linear) as: 'percentile25In'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valuequartile(heightIn, 1, linear)
p03NameNew column name
SearchTermNew formula

D trans
RawWrangletrue
p03Value'approxPercentile25In'
Typestep
WrangleTextderive type: single value: approximatequartile(heightIn, 1, 0.4) as: 'approxPercentile25In'
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02Valueapproximatequartile(heightIn, 1, 0.4)
p03NameNew column name
SearchTermNew formula

Results:

studentIdheightInapproxPercentile25Inpercentile25InapproxPercentile68Inpercentile68InapproxMedianInmedianIn
164646467.166.926666
265646467.166.926666
363646467.166.926666
464646467.166.926666
562646467.166.926666
666646467.166.926666
766646467.166.926666
865646467.166.926666
969646467.166.926666
1066646467.166.926666
1173646467.166.926666
1269646467.166.926666
1369646467.166.926666
1461646467.166.926666
1564646467.166.926666
1661646467.166.926666
1771646467.166.926666
1867646467.166.926666
1973646467.166.926666
2066646467.166.926666