# COVARSAMP Function

Computes the covariance between two columns using the sample method. Source values can be of Integer or Decimal type.

Covariance measures the joint variation between two sets of values. The sign of the covariance tends to show the direction of the linear relationship between the two datasets: positive covariance indicates that the values tend to increase together, and negative covariance indicates that one tends to decrease as the other increases.

• The magnitude of the covariance is difficult to interpret, as it varies with the scale of the source values.
• The normalized version of covariance is the correlation coefficient, in which covariance is normalized between -1 and 1. For more information, see CORREL Function.
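The scale dependence described above can be illustrated outside the product. The following is a minimal Python sketch, not Wrangle; the helper names and the data are hypothetical. It expresses the same lengths in meters and in centimeters: the covariance changes by a factor of 100, while the correlation coefficient stays the same.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance: sum of co-deviations divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation: covariance normalized by both standard deviations."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

# Hypothetical data: the same lengths in two different units.
meters = [1.0, 2.0, 3.0, 4.0]
centimeters = [100 * m for m in meters]
price = [10.0, 14.0, 19.0, 25.0]

cov_m = sample_covariance(meters, price)        # covariance with lengths in meters
cov_cm = sample_covariance(centimeters, price)  # 100 times larger, same relationship
corr_m = correlation(meters, price)             # unchanged by the unit conversion
corr_cm = correlation(centimeters, price)
```

Because only the units differ, comparing `cov_m` with `cov_cm` says nothing about the strength of the relationship, which is why the correlation coefficient is usually easier to interpret.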

Relevant terms:

| Term | Description |
| --- | --- |
| Population | Population statistical functions are computed from all possible values. See https://en.wikipedia.org/wiki/Statistical_population. |
| Sample | Sample-based statistical functions are computed from a subset or sample of all values. See https://en.wikipedia.org/wiki/Sampling_(statistics). These function names include `SAMP` in their name. |

NOTE: Statistical sampling has no relationship to the samples taken within the product. When statistical functions are computed during job execution, they are applied across the entire dataset. Sample method calculations are computed at that time.
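To make the population and sample distinction concrete, here is a minimal Python sketch, not Wrangle. It assumes the standard statistical definitions, in which the population method divides by the number of values n and the sample method divides by n - 1. The function name, the `sample` flag, and the data are hypothetical.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equal-length sequences.

    sample=False divides by n (population method);
    sample=True divides by n - 1 (sample method).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("sequences must have the same length")
    if n < 2:
        raise ValueError("at least 2 values are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    divisor = n - 1 if sample else n
    return co_deviation / divisor

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

pop_cov = covariance(xs, ys)                # divides by n = 4
samp_cov = covariance(xs, ys, sample=True)  # divides by n - 1 = 3
```

Under these definitions the sample result is always the population result multiplied by n / (n - 1), so the two converge as the number of rows grows.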

Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

## Basic Usage

covarsamp(squareFootage,purchasePrice)

Output: Returns the covariance using the sample method between the values in the `squareFootage` column and the `purchasePrice` column.

## Syntax and Arguments

covarsamp(function_col_ref1,function_col_ref2) [group:group_col_ref] [limit:limit_count]

| Argument | Required? | Data Type | Description |
| --- | --- | --- | --- |
| function_col_ref1 | Y | string | Name of column that is the first input to the function |
| function_col_ref2 | Y | string | Name of column that is the second input to the function |

For more information on the `group` and `limit` parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

### function_col_ref1, function_col_ref2

Name of a column whose values are used to calculate the covariance. Each column must contain Integer or Decimal values.

• Literal values are not supported as inputs.
• Multiple columns and wildcards are not supported.

Usage Notes:

| Required? | Data Type | Example Value |
| --- | --- | --- |
| Yes | String (column reference) | `myInputs` |

## Examples

This example illustrates the following two-column statistical functions:

• `CORREL` - Calculates the correlation coefficient between two columns. See CORREL Function.
• `COVAR` - Calculates the covariance between two columns. See COVAR Function.
• `COVARSAMP` - Calculates the covariance between two columns using the sample method. See COVARSAMP Function.

Source:

The following table contains height in inches and weight in pounds for a set of students.

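| Student | heightIn | weightLbs |
| --- | --- | --- |
| 1 | 70 | 134 |
| 2 | 67 | 135 |
| 3 | 67 | 147 |
| 4 | 67 | 160 |
| 5 | 72 | 136 |
| 6 | 73 | 146 |
| 7 | 71 | 135 |
| 8 | 63 | 145 |
| 9 | 67 | 138 |
| 10 | 66 | 138 |
| 11 | 71 | 161 |
| 12 | 70 | 131 |
| 13 | 74 | 131 |
| 14 | 67 | 157 |
| 15 | 73 | 161 |
| 16 | 70 | 133 |
| 17 | 63 | 132 |
| 18 | 64 | 153 |
| 19 | 64 | 156 |
| 20 | 72 | 154 |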

Transformation:

You can use the following transformations to calculate the correlation coefficient, the covariance, and the sampling method covariance between the two data columns:

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(correl(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'corrHeightAndWeight'` |

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(covar(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'covarHeightAndWeight'` |

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(covarsamp(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'covarHeightAndWeight-Sample'` |

Results:

| Student | heightIn | weightLbs | covarHeightAndWeight-Sample | covarHeightAndWeight | corrHeightAndWeight |
| --- | --- | --- | --- | --- | --- |
| 1 | 70 | 134 | -2.876 | -2.732 | -0.074 |
| 2 | 67 | 135 | -2.876 | -2.732 | -0.074 |
| 3 | 67 | 147 | -2.876 | -2.732 | -0.074 |
| 4 | 67 | 160 | -2.876 | -2.732 | -0.074 |
| 5 | 72 | 136 | -2.876 | -2.732 | -0.074 |
| 6 | 73 | 146 | -2.876 | -2.732 | -0.074 |
| 7 | 71 | 135 | -2.876 | -2.732 | -0.074 |
| 8 | 63 | 145 | -2.876 | -2.732 | -0.074 |
| 9 | 67 | 138 | -2.876 | -2.732 | -0.074 |
| 10 | 66 | 138 | -2.876 | -2.732 | -0.074 |
| 11 | 71 | 161 | -2.876 | -2.732 | -0.074 |
| 12 | 70 | 131 | -2.876 | -2.732 | -0.074 |
| 13 | 74 | 131 | -2.876 | -2.732 | -0.074 |
| 14 | 67 | 157 | -2.876 | -2.732 | -0.074 |
| 15 | 73 | 161 | -2.876 | -2.732 | -0.074 |
| 16 | 70 | 133 | -2.876 | -2.732 | -0.074 |
| 17 | 63 | 132 | -2.876 | -2.732 | -0.074 |
| 18 | 64 | 153 | -2.876 | -2.732 | -0.074 |
| 19 | 64 | 156 | -2.876 | -2.732 | -0.074 |
| 20 | 72 | 154 | -2.876 | -2.732 | -0.074 |
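The results can be cross-checked outside the product. The following is a minimal Python sketch, not Wrangle, that recomputes the three statistics from the source table using the standard textbook definitions. The variable names are hypothetical, and the sketch assumes that `COVAR` divides by n and `COVARSAMP` divides by n - 1.

```python
from math import sqrt

# Values copied from the Source table above.
height_in = [70, 67, 67, 67, 72, 73, 71, 63, 67, 66,
             71, 70, 74, 67, 73, 70, 63, 64, 64, 72]
weight_lbs = [134, 135, 147, 160, 136, 146, 135, 145, 138, 138,
              161, 131, 131, 157, 161, 133, 132, 153, 156, 154]

n = len(height_in)
mean_h = sum(height_in) / n
mean_w = sum(weight_lbs) / n

# Sums of squared deviations and of co-deviations from the means.
s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(height_in, weight_lbs))
s_hh = sum((h - mean_h) ** 2 for h in height_in)
s_ww = sum((w - mean_w) ** 2 for w in weight_lbs)

covar_pop = s_hw / n              # population method (assumed to match COVAR)
covar_samp = s_hw / (n - 1)       # sample method (assumed to match COVARSAMP)
correl = s_hw / sqrt(s_hh * s_ww) # correlation coefficient

print(f"covar={covar_pop:.4f} covarsamp={covar_samp:.4f} correl={correl:.4f}")
```

Under these assumptions, the computed values agree with the Results table to within rounding. The sample covariance is larger in magnitude than the population covariance by the factor 20 / 19.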