Date: Thu, 19 May 2022 21:03:53 +0000 (GMT) Message-ID: <1671606198.23526.1652994233747@c7585db71e40> Subject: Exported From Confluence MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_Part_23525_1800463532.1652994233746" ------=_Part_23525_1800463532.1652994233746 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Location: file:///C:/exported.html VARSAMPIF Function

# VARSAMPIF Function

Contents:

=20

=20
=20
=20

=20
=20

Generates the variance of values by group in a column that meet a sp= ecific condition=20 using the sample statistical method.

NOTE: When added to a transform, this function is appli= ed to the current sample. If you change your sample or run the job, the com= puted values for this function are updated. Transforms that change the numb= er of rows in subsequent recipe steps do not affect the values computed for= this step.

NOTE: This function applies to a sample of the entire p= opulation. More information is below.

Terms...
=20

Relevant terms:

=20
=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
TermDescription
PopulationPopulation statistical functions are computed fr= om all possible values. See https://en.wikip= edia.org/wiki/Statistical_population.
Sample=20
=20

Sample-based statistical functions are computed from a subset or sample = of all values. See https://en.wikipedia.org/w= iki/Sampling_(statistics).

=20

These function names include `SAMP` in their name.

=20
=20

NOTE: Statistical sampling has no relationship to the = samples taken within the product. When statistical functions are computed d= uring job execution, they are applied across the entire dataset. Sample met= hod calculations are computed at that time.

=20
=20
=20

• This function is calculated across a sample of all values.

• For more information on a population version of this function, see = ;VARIF Function.

Wrangle vs. SQL: This function is part of Wrangle , a proprietary data transformation language.= Wrangle is not SQL. For more information, se= e Wrangle Language.

## Basic Usage=

=20

varsampif(testScores, ((testScores > 0)= && (testScores < 90)))

Ou= tput: Returns the variance of the `testScores` column when the `testScores` value is between 0 and 90 using = the sample method of calculation.

## Sy= ntax and Arguments

=20

varsampif(col_ref, test_expre= ssion) [group:group_col_ref] [limit:limit_count]

Argument Required? Data Type Description
col_ref Y string Reference to the column you wish t= o evaluate.
test_expression Y string Expression that is evaluated. Must resolve to true or `false`

For more information on syntax standards, see Language Documentation Syntax Note= s.

For more information on the `group` and ```limit= ``` parameters, see Pivot Transform.

### col_ref

Name of the column whose values you wish to use in the calculation. Colu= mn must be a numeric (Integer or Decimal) type.

Usage Notes:

Required? Data Type Example Value
Yes String that corresponds to the name of the colum= n `myValues`

### test_expression

This parameter contains the expression to evaluate. This expression must= resolve to a Boolean (`true` or `false`) v= alue.

Usage Notes:

Required? Data Type Example Value
Yes String expression that evaluates to `true or false` ```(LastName =3D=3D 'Mouse' && FirstN= ame =3D=3D 'Mickey') ```

## Examples

### Exampl= e - Conditional Calculation Functions

This example shows some of the statistical functions that use the sample= method of computation. These include:

Source:

Students took tests on three consecutive Saturdays:

=20
Student Date Score
Andrew 11/9/19 81
Bella 11/9/19 84
Christina 11/9/19 79
David 11/9/19 64
Ellen 11/9/19 61
Fred 11/9/19 63
Andrew 11/16/19 73
Bella 11/16/19 88
Christina 11/16/19 78
David 11/16/19 67
Ellen 11/16/19 87
Fred 11/16/19 90
Andrew 11/23/19 76
Bella 11/23/19 93
Christina 11/23/19 81
David 11/23/19 97
Ellen 11/23/19 97
Fred 11/23/19 91

Transformation:

You can use the following transformations to calculate standard deviatio= n and variance across all dates using the sample method. Each computation h= as been rounded to three digits.

=20
=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Single row formula` `round(stdevsamp(Score), 3)` `'stdevSamp'`
=20

=20
=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Single row formula` `round(varsamp(Score), 3)` `'varSamp'`
=20

You can use the following to limit the previous statistical computations= to the last two Saturdays of testing:

=20
=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Single row formula` ```round(stdevsampif(Score, Date !=3D '11\/9\= /2019'), 3)``` `'stdevSampIf'`
=20

=20
=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
Transformation Name <= code>New formula `Single row formula` ```round(varsampif(Score, Date !=3D '11\/9\/2= 019'), 3)``` `'varSampIf'`
=20

Results:

Student Date Score varSampIf stdevSampIf varSamp stdevSamp
Andrew 11/9/19 81 94.515 9.722 131.673 11.475
Bella 11/9/19 84 94.515 9.722 131.673 11.475
Christina 11/9/19 79 94.515 9.722 131.673 11.475
David 11/9/19 64 94.515 9.722 131.673 11.475
Ellen 11/9/19 61 94.515 9.722 131.673 11.475
Fred 11/9/19 63 94.515 9.722 131.673 11.475
Andrew 11/16/19 73 94.515 9.722 131.673 11.475
Bella 11/16/19 88 94.515 9.722 131.673 11.475
Christina 11/16/19 78 94.515 9.722 131.673 11.475
David 11/16/19 67 94.515 9.722 131.673 11.475
Ellen 11/16/19 87 94.515 9.722 131.673 11.475
Fred 11/16/19 90 94.515 9.722 131.673 11.475
Andrew 11/23/19 76 94.515 9.722 131.673 11.475
Bella 11/23/19 93 94.515 9.722 131.673 11.475
Christina 11/23/19 81 94.515 9.722 131.673 11.475
David 11/23/19 97 94.515 9.722 131.673 11.475
Ellen 11/23/19 97 94.515 9.722 131.673 11.475
Fred 11/23/19 91 94.515 9.722 131.673 11.475

=20
=20