Page tree


Contents:

   

Contents:


Computes an approximation for a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.
  • If a row contains a missing or null value, it is not factored into the calculation. If the entire column contains no values, the function returns a null value.
  • When used in a pivot transform, the function is computed for each instance of the value specified in the group parameter. See Pivot Transform.
  • The approximate percentile functions utilize a different algorithm for efficiently estimating quantiles for streaming and distributed processing, depending on the running environment where the function is computed.

    Tip: Approximation functions are suitable for larger datasets. As the number of rows increases, accuracy and calculation performance improves for these functions. 

Quartiles are computed as follows:

QuartileDescription
0Minimum value
125th percentile
2Median value
375th percentile and higher

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

Basic Usage

<span>approximatequartile</span>(myScores, 3)

Output: Computes the approximate value that is at the 3rd quartile across all values in the myScores column.

Syntax and Arguments

approximatequartile(function_col_ref,num_quartile) [group:group_col_ref] [limit:limit_count]


ArgumentRequired?Data TypeDescription
function_col_refYstringName of column to which to apply the function
num_quartileYintegerInteger value (0-3) of the quartile to compute
dec_error_boundNdecimalError factor for computing approximations. Decimal value represents error factor as a percentage (0.4 is 0.4%).

For more information on the group and limit parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column the values of which you want to calculate the quartile. Column must contain Integer or Decimal values.

  • Literal values are not supported as inputs.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference)precipitationIn

num_quartile

Integer literal value indicating the quartile that you wish to compute. Input value must be between 0 and 3, inclusive.

  • Column or function references are not supported.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesInteger3

dec_error_bound

As needed, you can insert an error boundary factor as a parameter into the computation of this approximate value.


  • This value must be a Decimal literal value.
  • This decimal value represents the percentage error factor. By default, this value is 0.5 (0.5%). 

Usage Notes:

Required?Data TypeExample Value
NoDecimal (literal)0.01

Examples


Tip: For additional examples, see Common Tasks.

Example - Percentile functions

This example illustrates you to apply percentile functions.

Functions:

ItemDescription
MEDIAN Function Computes the median from all row values in a column or group. Input column can be of Integer or Decimal.
PERCENTILE Function Computes a specified percentile across all row values in a column or group. Input column can be of Integer or Decimal.
QUARTILE Function Computes a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.
APPROXIMATEMEDIAN Function Computes the approximate median from all row values in a column or group. Input column can be of Integer or Decimal.
APPROXIMATEPERCENTILE Function Computes an approximation for a specified percentile across all row values in a column or group. Input column can be of Integer or Decimal.
APPROXIMATEQUARTILE Function Computes an approximation for a specified quartile across all row values in a column or group. Input column can be of Integer or Decimal.

Source:

The following table lists each student's height in inches:

StudentHeight
164
265
363
464
562
666
766
865
969
1066
1173
1269
1369
1461
1564
1661
1771
1867
1973
2066

Transformation:

Use the following transformations to calculate the median height in inches, a specified percentile and the first quartile.

  • The first function uses a precise algorithm which can be slow to execute across large datasets.
  • The second function uses an appropriate approximation algorithm, which is much faster to execute across large datasets. 
    • These approximate functions can use an error boundary parameter, which is set to 0.4 (0.4%) across all functions.

Median: This transformation calculates the median value, which corresponds to the 50th percentile.

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula median(heightIn)
Parameter: New column name 'medianIn'

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula approximatemedian(heightIn, 0.4)
Parameter: New column name 'approxMedianIn'

Percentile: This transformation calculates the 68th percentile.

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula percentile(heightIn, 68, linear)
Parameter: New column name 'percentile68In'

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula approximatepercentile(heightIn, 68, 0.4)
Parameter: New column name 'approxPercentile68In'

Quartile: This transformation calculates the first quartile, which corresponds to the 25th percentile.

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula quartile(heightIn, 1, linear)
Parameter: New column name 'percentile25In'

Transformation Name New formula
Parameter: Formula type Single row formula
Parameter: Formula approximatequartile(heightIn, 1, 0.4)
Parameter: New column name 'approxPercentile25In'

Results:

studentIdheightInapproxPercentile25Inpercentile25InapproxPercentile68Inpercentile68InapproxMedianInmedianIn
164646467.166.926666
265646467.166.926666
363646467.166.926666
464646467.166.926666
562646467.166.926666
666646467.166.926666
766646467.166.926666
865646467.166.926666
969646467.166.926666
1066646467.166.926666
1173646467.166.926666
1269646467.166.926666
1369646467.166.926666
1461646467.166.926666
1564646467.166.926666
1661646467.166.926666
1771646467.166.926666
1867646467.166.926666
1973646467.166.926666
2066646467.166.926666

This page has no comments.