Pivot Transform
Note
Transforms are a part of the underlying language, which is not directly accessible to users. This content is maintained for reference purposes only. For more information on the user-accessible equivalent to transforms, see Transformation Reference.
When you pivot data, the values from the selected column or columns become new columns in the dataset, each of which contains a summary calculation that you specify.
This calculation can be based on all rows for totals across the dataset or based on group of rows you define in the transform.
When your data has been pivoted, the unused columns are dropped, leaving only the operative data for faster evaluation.
Transform accepts one or more columns as inputs to the pivot.
Tip
This transform is very useful in Preview mode for quick discovery and analysis.
Basic Usage
pivot col: Dates value:SUM(Sales) group: prodId
Output: Reshapes your dataset to include a ProdId
column and new columns for each distinct value in the Dates
column. Each distinct ProdId
value is represented by a separate row in the reshaped dataset. Within each row, the new columns contain a sum of sales for the product for each date in the dataset.
Syntax and Parameters
pivot col:column_ref value: FUNCTION(arg1,arg2) [group: group_col] [limit:int_num]
Token | Required? | Data Type | Description |
---|---|---|---|
pivot | Y | transform | Name of the transform |
col | N | string | Source column name or names |
value | N | string | Expression containing the aggregation function or functions used to pivot the source column data |
group | N | string | Column name or names containing the values by which to group for calculation |
limit | N | integer (positive) | Maximum number of unique values in a group. Default is 50. |
For more information on syntax standards, see Language Documentation Syntax Notes.
Identifies the column or columns to which to apply the transform.
Single-column example:
The specified column contains the values that become new columns in the dataset.
pivot col: autoBrand value:AVERAGE(autoPrice) group:State
Output: Reshapes the dataset to contain the State
column followed by columns for each distinct value in the autoBrand
column. Each value in those columns is the average value in the autoPrice
column, as grouped by State
value.
Multi-column example:
For multiple input columns, the transform generates new columns for each combination of the inputs. See the example below.
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
For this transform, the value
parameter contains the aggregation function and its parameters, which define the set of rows to which the function is applied.
Note
For the value
parameter, you can only use aggregation functions. Nested functions are not supported. For more information, see Aggregate Functions.
Usage Notes:
Required? | Data Type |
---|---|
No | String containing a list of aggregation functions each containing one column reference |
For this transform, this parameter specifies the column whose values are used to group the dataset prior to applying the specified function. You can specify multiple columns as comma-separated values.
Note
Transforms that use the group
parameter can result in non-deterministic re-ordering in the data grid. However, you should apply the group
parameter, particularly on larger datasets, or your job may run out of memory and fail. To enforce row ordering, you can use the sort
transform. For more information, see Sort Transform.
Usage Notes:
Required? | Data Type |
---|---|
No | String (column name) |
The limit
parameter defines the maximum number of unique values permitted in a group. If it is not specified, the default value for this parameter is 50
.
Note
Be careful setting this parameter too high. In some cases, the application can run out of memory generating the results, and your results can fail.
Usage Notes:
Required? | Data Type |
---|---|
No. Default value is | Integer (positive) |
Examples
Tip
For additional examples, see Common Tasks.
Source:
The following dataset contains information about sales across one weekend and two states for three different products.
Date | State | ProdId | Sales |
---|---|---|---|
3/9/16 | CA | Big Trike 9000 | 500 |
3/9/16 | NV | Fast GoKart 5000 | 200 |
3/9/16 | CA | SuperQuick Scooter | 700 |
3/9/16 | CA | Fast GoKart 5000 | 900 |
3/9/16 | NV | Big Trike 9000 | 300 |
3/9/16 | NV | SuperQuick Scooter | 250 |
3/10/16 | NV | Fast GoKart 5000 | 50 |
3/10/16 | NV | Big Trike 9000 | 400 |
3/10/16 | NV | SuperQuick Scooter | 150 |
3/10/16 | CA | Big Trike 9000 | 600 |
3/10/16 | CA | SuperQuick Scooter | 800 |
3/10/16 | CA | Fast GoKart 5000 | 1100 |
Transformation 1: Sum of sales by State for each Date
Apply this transformation in Preview only, just so you can see the results:
Transformation Name |
|
---|---|
Parameter: Column labels | Date |
Parameter: Row labels | State |
Parameter: Values | sum(Sales) |
Cancel the transformation update.
State | sum_Sales_03/09/2016 | sum_Sales_03/10/2016 |
---|---|---|
CA | 2100 | 2500 |
NV | 750 | 600 |
Transform 2: Sum of Sales by Date for each State
Transformation Name |
|
---|---|
Parameter: Column labels | State |
Parameter: Row labels | Date |
Parameter: Values | sum(Sales) |
Date | sum_Sales_CA | sum_Sales_NV |
---|---|---|
03/09/2016 | 2100 | 750 |
03/10/2016 | 2500 | 600 |
Cancel the transformation update again.
Transform 3: Sum of Sales by product ID for each State
Transformation Name |
|
---|---|
Parameter: Column labels | State |
Parameter: Row labels | ProdId |
Parameter: Values | sum(Sales) |
ProdId | sum_Sales_CA | sum_Sales_NV |
---|---|---|
Big Trike 9000 | 1100 | 700 |
Fast GoKart 5000 | 2000 | 500 |
SuperQuick Scooter | 1500 | 300 |
The Pivot Columns transformation supports the ability to use the values in multiple columns to specify the columns that are generated. When two or more columns are used in the pivot, columns containing values of all possible combinations from the source columns are generated.
In the following source data, sales data on individual products is organized by brand, product name, and month:
Brand | Product | Month | Sales |
---|---|---|---|
AAA | Towels | January | 95 |
AAA | Napkins | January | 113 |
B | Towels | January | 99 |
B | Tissues | January | 88 |
AAA | Towels | February | 108 |
AAA | Napkins | February | 91 |
B | Towels | February | 85 |
B | Tissues | February | 105 |
AAA | Towels | March | 81 |
AAA | Napkins | March | 92 |
B | Towels | March | 112 |
B | Tissues | March | 104 |
Transformation:
If you wanted to create summary sales information for each product by month, you might choose to create a pivot
on the Product
column. However, if you look at the column values, you might notice that the Paper towels
product is available for both the AAA
and B
brands. In this case, you must perform a multi-column pivot on these two columns, like the following:
Transformation Name |
|
---|---|
Parameter: Column labels | Brand,Product |
Parameter: Row labels | Month |
Parameter: Values | average(Sales) |
Results:
Month | avg_Sales_B_Towels | avg_Sales_B_Tissues | avg_Sales_AAA_Napkins | avg_Sales_AAA_Towels |
---|---|---|---|---|
January | 99 | 88 | 113 | 95 |
February | 85 | 105 | 91 | 108 |
March | 112 | 104 | 92 | 81 |
You can use the pivot
transform to perform aggregation calculations on values in a column.
Source:
In the following table, you can review test results from three different tests for 10 students:
StudentId | TestId | Score |
---|---|---|
s001 | t001 | 98 |
s001 | t002 | 98 |
s001 | t003 | 87 |
s002 | t001 | 92 |
s002 | t002 | 96 |
s002 | t003 | 79 |
s003 | t001 | 99 |
s003 | t002 | 76 |
s003 | t003 | 94 |
s004 | t001 | 93 |
s004 | t002 | 99 |
s004 | t003 | 80 |
s005 | t001 | 79 |
s005 | t002 | 80 |
s005 | t003 | 84 |
s006 | t001 | 93 |
s006 | t002 | 74 |
s006 | t003 | 89 |
s007 | t001 | 86 |
s007 | t002 | 81 |
s007 | t003 | 97 |
s008 | t001 | 82 |
s008 | t002 | 73 |
s008 | t003 | 79 |
s009 | t001 | 96 |
s009 | t002 | 97 |
s009 | t003 | 79 |
s010 | t001 | 80 |
s010 | t002 | 79 |
s010 | t003 | 75 |
Transformation:
For each student, you're interested in a student's average score and maximum score. You create the following pivot transform, which groups computations of average score and maximum score columns by studentId
value.
Transformation Name |
|
---|---|
Parameter: Row labels | StudentId |
Parameter: Values | average(Score),max(Score) |
Parameter: Max number of columns to create | 50 |
For the generated average column, you want results rounded to two decimal places:
Transformation Name |
|
---|---|
Parameter: Columns | average_Score |
Parameter: Formula | round(average_Score, 2) |
Results:
StudentId | average_Score | max_Score |
---|---|---|
s001 | 94.33 | 98 |
s002 | 89 | 96 |
s003 | 89.67 | 99 |
s004 | 90.67 | 99 |
s005 | 81 | 84 |
s006 | 85.33 | 93 |
s007 | 88 | 97 |
s008 | 78 | 82 |
s009 | 90.67 | 97 |
s010 | 78 | 80 |