This transform does not reference keys in the array. If your array data contains keys, use the unnest transform. See Unnest Transform.
Basic Usage
flatten col: myArray
Output: Generates a separate row for each value in the array. Values of other columns in generated rows are copied from the source.
Syntax and Parameters
flatten col: column_ref
Token | Required? | Data Type | Description |
---|---|---|---|
flatten | Y | transform | Name of the transform |
col | Y | string | Source column name |
For more information on syntax standards, see Language Documentation Syntax Notes.
col
Identifies the column to which to apply the transform. You can specify only one column.
Usage Notes:
Required? | Data Type |
---|---|
Yes | String (column name) |
Tip: For additional examples, see Common Tasks.
Examples
Example - Flatten an array
In this example, the source data includes an array of scores that need to be broken out into separate rows.
Source:
LastName | FirstName | Scores |
---|---|---|
Adams | Allen | [81,87,83,79] |
Burns | Bonnie | [98,94,92,85] |
Cannon | Chris | [88,81,85,78] |
Transform:
When the data is imported, you might have to re-type the Scores column as an array:
settype col: Scores type: 'Array'
You can now flatten the Scores column data into separate rows:
flatten col: Scores
Results:
LastName | FirstName | Scores |
---|---|---|
Adams | Allen | 81 |
Adams | Allen | 87 |
Adams | Allen | 83 |
Adams | Allen | 79 |
Burns | Bonnie | 98 |
Burns | Bonnie | 94 |
Burns | Bonnie | 92 |
Burns | Bonnie | 85 |
Cannon | Chris | 88 |
Cannon | Chris | 81 |
Cannon | Chris | 85 |
Cannon | Chris | 78 |
This example is extended below.
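The row-generation behavior shown above can be sketched outside of the product. The following Python is only an illustration of the semantics, not Wrangle and not how the transform is implemented; the `flatten` helper is a hypothetical name, and its handling of an empty array (no output rows) is an assumption that this page does not confirm.

```python
def flatten(rows, col):
    """Return one output row per element of row[col]; other columns are copied."""
    out = []
    for row in rows:
        for value in row[col]:
            new_row = dict(row)   # copy the other column values from the source row
            new_row[col] = value  # replace the array with a single element
            out.append(new_row)
    return out

source = [
    {"LastName": "Adams", "FirstName": "Allen", "Scores": [81, 87, 83, 79]},
    {"LastName": "Burns", "FirstName": "Bonnie", "Scores": [98, 94, 92, 85]},
    {"LastName": "Cannon", "FirstName": "Chris", "Scores": [88, 81, 85, 78]},
]

flat = flatten(source, "Scores")
print(len(flat))  # 12
print(flat[1])    # {'LastName': 'Adams', 'FirstName': 'Allen', 'Scores': 87}
```

Three source rows with four scores each yield twelve rows, matching the results table.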
Example - Flatten and unnest together
While the above example nicely flattens out your data, there are two potential problems with the results:
- There is no identifier for each test. For example, Allen Adams' score of 87 cannot be associated with the specific test on which he recorded the score.
- There is no unique identifier for each row.
The following example addresses both of these issues. It also demonstrates differences between the unnest and the flatten transform, including how you use unnest to flatten array data based on specified keys.
- For more information, see Unnest Transform.
You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals:
- One row for each student test
- Unique identifier for each student-score combination
LastName | FirstName | Scores |
---|---|---|
Adams | Allen | [81,87,83,79] |
Burns | Bonnie | [98,94,92,85] |
Cannon | Charles | [88,81,85,78] |
Transform:
When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column:
header
replace col:Scores with:'' on:`"` global:true
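You might first want to check that each student has the expected number of test scores. The following transform calculates the difference between the expected number of elements in the Scores array (4) and the actual number: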
derive type:single value: (4 - ARRAYLEN(Scores)) as: 'numMissingTests'
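As a rough illustration of what this check computes, here is a Python sketch (not Wrangle). The `num_missing_tests` helper and the two-element list are hypothetical, added only to show a non-zero result.

```python
EXPECTED_TESTS = 4  # expected number of elements in the Scores array

def num_missing_tests(scores, expected=EXPECTED_TESTS):
    """Difference between the expected and the actual number of scores."""
    return expected - len(scores)

print(num_missing_tests([81, 87, 83, 79]))  # 0
print(num_missing_tests([98, 94]))          # 2 (hypothetical incomplete record)
```

A value of 0 means the record is complete; a positive value counts the tests that are missing.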
Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset would contain duplicate rows. In the following transform, you create a parallel array called Tests, which contains an index array for the number of values in the Scores column. Index values start at 0:
derive type:single value:RANGE(0,ARRAYLEN(Scores)) as:'Tests'
You also need an identifier for the source row, which you can generate using the SOURCEROWNUMBER function:
derive type:single value:SOURCEROWNUMBER() as:'orderIndex'
LastName | FirstName | Scores | Tests | orderIndex |
---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 |
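The two derived columns in the table above can be mimicked in Python as a sketch. This is not Wrangle; it assumes that the CSV header occupied source row 1, which is inferred from the orderIndex values starting at 2, and `HEADER_ROWS` is a hypothetical name.

```python
rows = [
    {"LastName": "Adams", "FirstName": "Allen", "Scores": [81, 87, 83, 79]},
    {"LastName": "Burns", "FirstName": "Bonnie", "Scores": [98, 94, 92, 85]},
    {"LastName": "Cannon", "FirstName": "Charles", "Scores": [88, 81, 85, 78]},
]

HEADER_ROWS = 1  # assumption: the header line was source row 1

for position, row in enumerate(rows, start=1):
    # Index array with one entry per score, starting at 0
    row["Tests"] = list(range(0, len(row["Scores"])))
    # Row number in the original source, counting the header line
    row["orderIndex"] = position + HEADER_ROWS

print(rows[0]["Tests"], rows[0]["orderIndex"])  # [0, 1, 2, 3] 2
```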
Now, you want to bring together the Tests and Scores arrays into a single nested array using the ARRAYZIP function:
derive type:single value:ARRAYZIP([Tests,Scores])
LastName | FirstName | Scores | Tests | orderIndex | column1 |
---|---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 | [[0,81],[1,87],[2,83],[3,79]] |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 | [[0,98],[1,94],[2,92],[3,85]] |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 | [[0,88],[1,81],[2,85],[3,78]] |
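The pairing shown in column1 is the familiar zip operation. A minimal Python sketch follows, with `array_zip` as a hypothetical stand-in for the function used above. Python's `zip` stops at the shortest input; how ARRAYZIP treats arrays of unequal length is not covered on this page, so do not read that behavior into the sketch.

```python
def array_zip(*arrays):
    """Pair up elements by position and return a nested list."""
    return [list(group) for group in zip(*arrays)]

tests = [0, 1, 2, 3]
scores = [81, 87, 83, 79]

print(array_zip(tests, scores))  # [[0, 81], [1, 87], [2, 83], [3, 79]]
```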
With the flatten transform, you can unpack the nested array:
flatten col: column1
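Each test-score pair is now on its own row. To break the two values of each pair out into separate columns, use unnest: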
unnest col:column1 keys:'[0]','[1]'
After you delete column1, which is no longer needed, rename the two generated columns:
rename mapping:[column_0,'TestNum']
rename mapping:[column_1,'TestScore']
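Taken together, the flatten, unnest, and rename steps turn one student row into one row per test. The Python below sketches that combined effect for a single row; it illustrates the resulting shape only and is not the product's implementation.

```python
row = {
    "LastName": "Adams",
    "FirstName": "Allen",
    "orderIndex": 2,
    "column1": [[0, 81], [1, 87], [2, 83], [3, 79]],
}

result = []
for pair in row["column1"]:  # flatten: one row per nested [test, score] pair
    # Copy every other column and leave out column1
    new_row = {key: value for key, value in row.items() if key != "column1"}
    new_row["TestNum"] = pair[0]    # value at key [0], renamed from column_0
    new_row["TestScore"] = pair[1]  # value at key [1], renamed from column_1
    result.append(new_row)

print(result[1])
# {'LastName': 'Adams', 'FirstName': 'Allen', 'orderIndex': 2, 'TestNum': 1, 'TestScore': 87}
```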
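To create a unique identifier for each test, the following transforms use the original row identifier orderIndex as an identifier for the student and the TestNum value to create the TestId column value. The merge step then prepends the TestId00 prefix: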
derive type:single value: (orderIndex * 10) + TestNum as: 'TestId'
merge col:'TestId00','TestId'
Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. In this case, you cannot group by the LastName value alone, and when this recipe is run at scale, there might be collisions between first names. So, you might need to create a kind of primary key using the following:
merge col:'LastName','FirstName' with:'-' as:'studentId'
derive type:single value:AVERAGE(TestScore) group:studentId as:'avg_TestScore'
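The grouping step can be sketched in Python as follows. This is an illustration under the assumption that the grouped derive writes the group's average onto every row of that group, which is what the results table suggests; it is not Wrangle.

```python
from collections import defaultdict

scores_by_student = {
    ("Adams", "Allen"): [81, 87, 83, 79],
    ("Burns", "Bonnie"): [98, 94, 92, 85],
    ("Cannon", "Charles"): [88, 81, 85, 78],
}

# One row per test, as produced by the earlier flatten and unnest steps
rows = [
    {"LastName": last, "FirstName": first, "TestScore": score}
    for (last, first), scores in scores_by_student.items()
    for score in scores
]

# Build the composite key and collect the scores for each key
grouped = defaultdict(list)
for row in rows:
    row["studentId"] = f"{row['LastName']}-{row['FirstName']}"
    grouped[row["studentId"]].append(row["TestScore"])

# Attach the group average to every row in the group
for row in rows:
    values = grouped[row["studentId"]]
    row["avg_TestScore"] = sum(values) / len(values)

print(rows[0]["studentId"], rows[0]["avg_TestScore"])  # Adams-Allen 82.5
```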
Results:
After you drop unnecessary columns and move your columns around, the dataset should look like the following:
TestId | LastName | FirstName | TestNum | TestScore | studentId | avg_TestScore |
---|---|---|---|---|---|---|
TestId0020 | Adams | Allen | 0 | 81 | Adams-Allen | 82.5 |
TestId0021 | Adams | Allen | 1 | 87 | Adams-Allen | 82.5 |
TestId0022 | Adams | Allen | 2 | 83 | Adams-Allen | 82.5 |
TestId0023 | Adams | Allen | 3 | 79 | Adams-Allen | 82.5 |
TestId0030 | Burns | Bonnie | 0 | 98 | Burns-Bonnie | 92.25 |
TestId0031 | Burns | Bonnie | 1 | 94 | Burns-Bonnie | 92.25 |
TestId0032 | Burns | Bonnie | 2 | 92 | Burns-Bonnie | 92.25 |
TestId0033 | Burns | Bonnie | 3 | 85 | Burns-Bonnie | 92.25 |
TestId0040 | Cannon | Charles | 0 | 88 | Cannon-Charles | 83 |
TestId0041 | Cannon | Charles | 1 | 81 | Cannon-Charles | 83 |
TestId0042 | Cannon | Charles | 2 | 85 | Cannon-Charles | 83 |
TestId0043 | Cannon | Charles | 3 | 78 | Cannon-Charles | 83 |