Exported from Confluence on Tue, 24 Nov 2020.

# CORREL Function

Computes the correlation coefficient between two columns. Source values can be of Integer or Decimal type.

The correlation coefficient measures the strength and direction of the linear relationship between two sets of values. You can use it to measure how closely changes in one value are associated with changes in the other.

• Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation).
• Negative correlation means that the second number tends to decrease when the first number increases.
• Positive correlation means that the second number tends to increase when the first number increases.
• A correlation coefficient that is close to 0 indicates a weak or nonexistent correlation.
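
For reference, the standard Pearson product-moment correlation coefficient of two columns x and y, each with n rows and with means x̄ and ȳ, is defined below. This page does not state the exact formula that `CORREL` uses, so treat this as an assumption; it does reproduce the values in the example at the end of this page.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

By the Cauchy-Schwarz inequality, the numerator can never exceed the denominator in absolute value, which is why the coefficient always lies between -1 and +1.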

Relevant terms:

| Term | Description |
| --- | --- |
| Population | Population statistical functions are computed from all possible values. See https://en.wikipedia.org/wiki/Statistical_population. |
| Sample | Sample-based statistical functions are computed from a subset or sample of all values. See https://en.wikipedia.org/wiki/Sampling_(statistics). These functions include `SAMP` in their names. |

NOTE: Statistical sampling has no relationship to the samples taken within the product. When statistical functions are computed during job execution, they are applied across the entire dataset. Sample method calculations are computed at that time.
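
The difference between the population and sample methods comes down to the divisor. The following Python sketch assumes the standard textbook definitions: population covariance divides the sum of cross-deviations by n, and sample covariance divides the same sum by n - 1. The function names are hypothetical and are not part of Wrangle.

```python
from statistics import mean


def covar_population(xs: list[float], ys: list[float]) -> float:
    """Population covariance (assumed definition): divide by n."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def covar_sample(xs: list[float], ys: list[float]) -> float:
    """Sample covariance (assumed definition): divide by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Small hypothetical input: the cross-deviations sum to 11.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
print(covar_population(xs, ys))  # 11 / 4 = 2.75
print(covar_sample(xs, ys))      # 11 / 3, about 3.667
```

For the correlation coefficient itself, this choice cancels out, because the covariance and both standard deviations are divided by the same quantity.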


Wrangle vs. SQL: This function is part of Wrangle, a proprietary data transformation language. Wrangle is not SQL. For more information, see Wrangle Language.

## Basic Usage


`correl(initialInvestment,ROI)`

Output: Returns the correlation coefficient between the values in the `initialInvestment` column and the `ROI` column.

## Syntax and Arguments


`correl(function_col_ref1,function_col_ref2) [group:group_col_ref] [limit:limit_count]`

| Argument | Required? | Data Type | Description |
| --- | --- | --- | --- |
| function_col_ref1 | Y | string | Name of the column that is the first input to the function |
| function_col_ref2 | Y | string | Name of the column that is the second input to the function |

For more information on the `group` and `limit` parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.
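
To make the optional `group` parameter more concrete, the following Python sketch computes one correlation coefficient per distinct value of a grouping column. It assumes that `group` partitions the rows by the values of `group_col_ref` and computes a separate coefficient for each partition; the `region` column, the sample rows, and the function names are hypothetical, and `limit` is not modeled.

```python
import math
from collections import defaultdict


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # This sketch raises here; the product's behavior is not documented on this page.
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def correl_by_group(rows, x_col, y_col, group_col):
    """Collect (x, y) values per group key, then compute one coefficient per group."""
    buckets = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = buckets[row[group_col]]
        xs.append(row[x_col])
        ys.append(row[y_col])
    return {key: correl(xs, ys) for key, (xs, ys) in buckets.items()}


# Hypothetical rows: ROI rises with investment in "east" and falls in "west".
rows = [
    {"region": "east", "initialInvestment": 100, "ROI": 5},
    {"region": "east", "initialInvestment": 200, "ROI": 9},
    {"region": "east", "initialInvestment": 300, "ROI": 14},
    {"region": "west", "initialInvestment": 100, "ROI": 12},
    {"region": "west", "initialInvestment": 200, "ROI": 8},
    {"region": "west", "initialInvestment": 300, "ROI": 3},
]
print(correl_by_group(rows, "initialInvestment", "ROI", "region"))
```

Grouping can reveal relationships that cancel out across the whole dataset: here the two regions have strong correlations of opposite sign.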

### function_col_ref1, function_col_ref2

Name of the column whose values you want to use to calculate the correlation. The column must contain Integer or Decimal values.

• Literal values are not supported as inputs.
• Multiple columns and wildcards are not supported.

Usage Notes:

| Required? | Data Type | Example Value |
| --- | --- | --- |
| Yes | String (column reference) | `myInputs` |

## Examples

This example illustrates the following two-column statistical functions:

• `CORREL` - Calculates the correlation coefficient between two columns. See CORREL Function.
• `COVAR` - Calculates the covariance between two columns. See COVAR Function.
• `COVARSAMP` - Calculates the covariance between two columns using the sample method. See COVARSAMP Function.

Source:

The following table contains height in inches and weight in pounds for a set of students.

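| Student | heightIn | weightLbs |
| --- | --- | --- |
| 1 | 70 | 134 |
| 2 | 67 | 135 |
| 3 | 67 | 147 |
| 4 | 67 | 160 |
| 5 | 72 | 136 |
| 6 | 73 | 146 |
| 7 | 71 | 135 |
| 8 | 63 | 145 |
| 9 | 67 | 138 |
| 10 | 66 | 138 |
| 11 | 71 | 161 |
| 12 | 70 | 131 |
| 13 | 74 | 131 |
| 14 | 67 | 157 |
| 15 | 73 | 161 |
| 16 | 70 | 133 |
| 17 | 63 | 132 |
| 18 | 64 | 153 |
| 19 | 64 | 156 |
| 20 | 72 | 154 |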

Transformation:

You can use the following transformations to calculate the correlation coefficient, the covariance, and the sample covariance between the two data columns:

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(correl(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'corrHeightAndWeight'` |

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(covar(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'covarHeightAndWeight'` |

| Transformation Name | `New formula` |
| --- | --- |
| Parameter: Formula type | `Single row formula` |
| Parameter: Formula | `round(covarsamp(heightIn, weightLbs), 3)` |
| Parameter: New column name | `'covarHeightAndWeight-Sample'` |

Results:

| Student | heightIn | weightLbs | covarHeightAndWeight-Sample | covarHeightAndWeight | corrHeightAndWeight |
| --- | --- | --- | --- | --- | --- |
| 1 | 70 | 134 | -2.876 | -2.732 | -0.074 |
| 2 | 67 | 135 | -2.876 | -2.732 | -0.074 |
| 3 | 67 | 147 | -2.876 | -2.732 | -0.074 |
| 4 | 67 | 160 | -2.876 | -2.732 | -0.074 |
| 5 | 72 | 136 | -2.876 | -2.732 | -0.074 |
| 6 | 73 | 146 | -2.876 | -2.732 | -0.074 |
| 7 | 71 | 135 | -2.876 | -2.732 | -0.074 |
| 8 | 63 | 145 | -2.876 | -2.732 | -0.074 |
| 9 | 67 | 138 | -2.876 | -2.732 | -0.074 |
| 10 | 66 | 138 | -2.876 | -2.732 | -0.074 |
| 11 | 71 | 161 | -2.876 | -2.732 | -0.074 |
| 12 | 70 | 131 | -2.876 | -2.732 | -0.074 |
| 13 | 74 | 131 | -2.876 | -2.732 | -0.074 |
| 14 | 67 | 157 | -2.876 | -2.732 | -0.074 |
| 15 | 73 | 161 | -2.876 | -2.732 | -0.074 |
| 16 | 70 | 133 | -2.876 | -2.732 | -0.074 |
| 17 | 63 | 132 | -2.876 | -2.732 | -0.074 |
| 18 | 64 | 153 | -2.876 | -2.732 | -0.074 |
| 19 | 64 | 156 | -2.876 | -2.732 | -0.074 |
| 20 | 72 | 154 | -2.876 | -2.732 | -0.074 |
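
As a cross-check, the following Python sketch recomputes the three result columns from the Source table. It assumes the standard definitions (population covariance divides by n, sample covariance divides by n - 1, and the correlation coefficient is the Pearson formula). These are assumptions about what `CORREL`, `COVAR`, and `COVARSAMP` compute internally, but they reproduce the documented values.

```python
import math

# heightIn and weightLbs columns from the Source table above.
height_in = [70, 67, 67, 67, 72, 73, 71, 63, 67, 66,
             71, 70, 74, 67, 73, 70, 63, 64, 64, 72]
weight_lbs = [134, 135, 147, 160, 136, 146, 135, 145, 138, 138,
              161, 131, 131, 157, 161, 133, 132, 153, 156, 154]


def cross_deviation_sum(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y); with xs == ys, the sum of squared deviations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


n = len(height_in)
sxy = cross_deviation_sum(height_in, weight_lbs)
covar = sxy / n              # population covariance, compare with COVAR
covar_samp = sxy / (n - 1)   # sample covariance, compare with COVARSAMP
correl = sxy / math.sqrt(cross_deviation_sum(height_in, height_in)
                         * cross_deviation_sum(weight_lbs, weight_lbs))

print(round(correl, 3))      # -0.074
print(round(covar_samp, 3))  # -2.876
print(covar)                 # -2.7325 in exact arithmetic
```

A coefficient of about -0.074 is close to 0, so in this dataset height and weight show only a very weak negative correlation. The exact population covariance is -2.7325, a halfway case when rounding to three decimals; the documented -2.732 reflects how that tie was resolved, so the sketch prints the unrounded value instead.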