Page tree

You are viewing an old version of this page. View the current version.

Version 2

Contents:

Computes an approximation for a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.
• If a row contains a missing or null value, it is not factored into the calculation. If the entire column contains no values, the function returns a null value.
• When used in a `pivot` transform, the function is computed for each instance of the value specified in the `group` parameter. See Pivot Transform.
• The approximate percentile functions utilize a different algorithm for efficiently estimating quantiles for streaming and distributed processing, depending on the running environment where the function is computed.

Tip: Approximation functions are suitable for larger datasets. As the number of rows increases, accuracy and calculation performance improves for these functions.

Quartiles are computed as follows:

QuartileDescription
0Minimum value
125th percentile
2Median value
375th percentile and higher

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

approximatequartile(myScores, 3)

Output: Computes the approximate value that is at the 3rd quartile across all values in the myScores column.

Syntax and Arguments

approximatequartile(function_col_ref,num_quartile) [group:group_col_ref] [limit:limit_count]

ArgumentRequired?Data TypeDescription
function_col_refYstringName of column to which to apply the function
num_quartileYintegerInteger value (0-3) of the quartile to compute
dec_error_boundNdecimalError factor for computing approximations. Decimal value represents error factor as a percentage (`0.4` is 0.4%).

For more information on the `group` and `limit` parameters, see Pivot Transform.

function_col_ref

Name of the column the values of which you want to calculate the quartile. Column must contain Integer or Decimal values.

• Literal values are not supported as inputs.
• Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference)```precipitationIn ```

num_quartile

Integer literal value indicating the quartile that you wish to compute. Input value must be between `0` and `3`, inclusive.

• Column or function references are not supported.
• Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesInteger`3`

dec_error_bound

As needed, you can insert an error boundary factor as a parameter into the computation of this approximate value.

• This value must be a Decimal literal value.
• This decimal value represents the percentage error factor. By default, this value is `0.5` (0.5%).

Usage Notes:

Required?Data TypeExample Value
NoDecimal (literal)```0.01 ```

Examples

Example - Percentile functions

This example illustrates how you can apply the following percentile-related functions to your transformations:
• `MEDIAN` - Calculate the median value from a column of values. See MEDIAN Function.
• `PERCENTILE` - Calculate a specified percentile for a column of values. See PERCENTILE Function.
• `QUARTILE` - Calculate a specified quartile for a column of values. See QUARTILE Function.

The following functions use an approximation technique for calculating median, percentile, and quartiles. In some cases, these calculations can be computed faster across large datasets.

• `APPROXIMATEMEDIAN` - Calculate a close approximation of the median value from a column of values. See APPROXIMATEMEDIAN Function.
• `APPROXIMATEPERCENTILE` - Calculate a close approximation of a specified percentile for a column of values. See APPROXIMATEPERCENTILE Function.
• `APPROXIMATEQUARTILE` - Calculate a close approximation of a specified quartile for a column of values. See APPROXIMATEQUARTILE Function.

Source:

The following table lists each student's height in inches:

StudentHeight
164
265
363
464
562
666
766
865
969
1066
1173
1269
1369
1461
1564
1661
1771
1867
1973
2066

Transformation:

Use the following transformations to calculate the median height in inches, a specified percentile and the first quartile.

• The first function uses a precise algorithm which can be slow to execute across large datasets.
• The second function uses an appropriate approximation algorithm, which is much faster to execute across large datasets.
• These approximate functions can use an error boundary parameter, which is set to `0.4` (0.4%) across all functions.

Median: This transformation calculates the median value, which corresponds to the 50th percentile.

Transformation Name `New formula` `Single row formula` `median(heightIn)` `'medianIn'`

Transformation Name `New formula` `Single row formula` `approximatemedian(heightIn, 0.4)` `'approxMedianIn'`

Percentile: This transformation calculates the 68th percentile.

Transformation Name `New formula` `Single row formula` `percentile(heightIn, 68, linear)` `'percentile68In'`

Transformation Name `New formula` `Single row formula` `approximatepercentile(heightIn, 68, 0.4)` `'approxPercentile68In'`

Quartile: This transformation calculates the first quartile, which corresponds to the 25th percentile.

Transformation Name `New formula` `Single row formula` `quartile(heightIn, 1, linear)` `'percentile25In'`

Transformation Name `New formula` `Single row formula` `approximatequartile(heightIn, 1, 0.4)` `'approxPercentile25In'`

Results:

studentIdheightInapproxPercentile25Inpercentile25InapproxPercentile68Inpercentile68InapproxMedianInmedianIn
164646467.166.926666
265646467.166.926666
363646467.166.926666
464646467.166.926666
562646467.166.926666
666646467.166.926666
766646467.166.926666
865646467.166.926666
969646467.166.926666
1066646467.166.926666
1173646467.166.926666
1269646467.166.926666
1369646467.166.926666
1461646467.166.926666
1564646467.166.926666
1661646467.166.926666
1771646467.166.926666
1867646467.166.926666
1973646467.166.926666
2066646467.166.926666

• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:
• Page:

• No labels