If the arrays are of different lengths, then null values are inserted for combinations where one array is missing a corresponding value.
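The padding rule for arrays of different lengths can be modeled outside the product. The following Python sketch is an illustration only, not the product's implementation: `array_zip` is a hypothetical helper name, and Python's `None` stands in for a null value.

```python
from itertools import zip_longest


def array_zip(arrays):
    """Pair elements by position across all arrays.

    Shorter arrays are padded with None, which models the null values
    described above (an assumption about how nulls are represented).
    """
    return [list(group) for group in zip_longest(*arrays, fillvalue=None)]


# Arrays of equal length: every pair is complete.
equal = array_zip([["A", "B", "C"], ["1", "2", "3"]])

# Arrays of different lengths: the missing value becomes None.
uneven = array_zip([["A", "B", "C"], ["1", "2"]])

print(equal)
print(uneven)
```

In this model, the last pair of `uneven` holds `"C"` and a null, because the second array has no third element.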
Basic Usage
Array literal reference example:
derive type:single value:ARRAYZIP([["A","B","C"],["1","2","3"]])
Output: Generates a nested array that pairs corresponding elements of the two source arrays: [["A","1"],["B","2"],["C","3"]].
Column reference example:
derive type:single value:ARRAYZIP([array1,array2]) as:'zippedArray'
Output: Generates a new zippedArray column containing a single nested array that pairs the elements of the source arrays, in the order in which the arrays are listed.
Syntax and Arguments
derive type:single value:ARRAYZIP([array_ref1,array_ref2])
Argument | Required? | Data Type | Description |
---|---|---|---|
array_ref1 | Y | string or array | Name of first column or first array literal to apply to the function |
array_ref2 | Y | string or array | Name of second column or second array literal to apply to the function |
For more information on syntax standards, see Language Documentation Syntax Notes.
array_ref1, array_ref2
Array literal or name of the array column whose elements you want to combine together.
Usage Notes:
Required? | Data Type | Example Value |
---|---|---|
Yes | Array literal or column reference | myArray1, myArray2 |
Tip: For additional examples, see Common Tasks.
Examples
Example - Simple ARRAYZIP example
Source:
Item | Letters | Numerals |
---|---|---|
Item1 | ["A","B","C"] | ["1","2","3"] |
Item2 | ["D","E","F"] | ["4","5","6"] |
Item3 | ["G","H","I"] | ["7","8","9"] |
Transform:
derive type:single value:ARRAYZIP([Letters,Numerals]) as:'LettersAndNumerals'
Results:
Item | Letters | Numerals | LettersAndNumerals |
---|---|---|---|
Item1 | ["A","B","C"] | ["1","2","3"] | [["A","1"],["B","2"],["C","3"]] |
Item2 | ["D","E","F"] | ["4","5","6"] | [["D","4"],["E","5"],["F","6"]] |
Item3 | ["G","H","I"] | ["7","8","9"] | [["G","7"],["H","8"],["I","9"]] |
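To make the row-by-row behavior concrete, here is a minimal Python sketch that applies the same pairing to each row of the source table. It is a model of the transform, not the product's code: the list-of-dictionaries layout and the helper name `add_zipped_column` are assumptions made for illustration.

```python
from itertools import zip_longest

# Source rows from the example, modeled as dictionaries (an assumed layout).
rows = [
    {"Item": "Item1", "Letters": ["A", "B", "C"], "Numerals": ["1", "2", "3"]},
    {"Item": "Item2", "Letters": ["D", "E", "F"], "Numerals": ["4", "5", "6"]},
    {"Item": "Item3", "Letters": ["G", "H", "I"], "Numerals": ["7", "8", "9"]},
]


def add_zipped_column(rows, left, right, new_name):
    """Return new rows with an extra column pairing two array columns.

    The input rows are not modified. None models a null when one array
    is shorter than the other.
    """
    return [
        {**row, new_name: [list(p) for p in zip_longest(row[left], row[right])]}
        for row in rows
    ]


result = add_zipped_column(rows, "Letters", "Numerals", "LettersAndNumerals")
for row in result:
    print(row["Item"], row["LettersAndNumerals"])
```

Each row is processed independently, so the pairs in one row never mix with values from another row.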
Example - Unnest an array
Source: You have the following data on student test scores. Scores on individual tests are stored in the Scores array, and you need to be able to track each test on a uniquely identifiable row. This example has two goals: one row for each student test, and a unique identifier for each of those rows.
LastName | FirstName | Scores |
---|---|---|
Adams | Allen | [81,87,83,79] |
Burns | Bonnie | [98,94,92,85] |
Cannon | Charles | [88,81,85,78] |
Transformation: When the data is imported from CSV format, you must add a header transform and remove the quotes from the Scores column:
Transformation Name
Rename column with row(s)
Parameter: Option
Use row(s) as column names
Parameter: Type
Use a single row to name columns
Parameter: Row number
1
Transformation Name
Replace text or pattern
Parameter: Column
Scores
Parameter: Find
'\"'
Parameter: Replace with
''
Parameter: Match all occurrences
true
Validate test data: To begin, you might want to check whether you have the proper number of test scores for each student. You can use the following transform to calculate the difference between the expected number of elements in the Scores array (4) and the actual number:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
(4 - arraylen(Scores))
Parameter: New column name
'numMissingTests'
When the transform is previewed, you can see in the sample dataset that all tests are included. You might or might not want to include this column in the final dataset, as you might identify missing tests when the recipe is run at scale.
Unique row identifier: The Scores array must be broken out into individual rows for each test. However, there is no unique identifier for the row to track individual tests. In theory, you could use the combination of LastName-FirstName-Scores values to do so, but if a student recorded the same score twice, your dataset would have duplicate rows. In the following transform, you create a parallel array called Tests, which contains an index array for the number of values in the Scores column. Index values start at 0:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
range(0,arraylen(Scores))
Parameter: New column name
'Tests'
You also want to create an identifier for the source row using the sourcerownumber function:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
sourcerownumber()
Parameter: New column name
'orderIndex'
One row for each student test: Your data should look like the following:
LastName | FirstName | Scores | Tests | orderIndex |
---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 |
Now, you want to bring together the Tests and Scores arrays into a single nested array using the arrayzip function:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
arrayzip([Tests,Scores])
Your dataset has been changed:
LastName | FirstName | Scores | Tests | orderIndex | column1 |
---|---|---|---|---|---|
Adams | Allen | [81,87,83,79] | [0,1,2,3] | 2 | [[0,81],[1,87],[2,83],[3,79]] |
Burns | Bonnie | [98,94,92,85] | [0,1,2,3] | 3 | [[0,98],[1,94],[2,92],[3,85]] |
Cannon | Charles | [88,81,85,78] | [0,1,2,3] | 4 | [[0,88],[1,81],[2,85],[3,78]] |
Use the following to unpack the nested array:
Transformation Name
Expand arrays to rows
Parameter: Column
column1
Each test-score combination is now broken out into a separate row. The nested Test-Score combinations must be broken out into separate columns using the following:
Transformation Name
Unnest Objects into columns
Parameter: Column
column1
Parameter: Paths to elements
'[0]','[1]'
After you delete column1, which is no longer needed, you should rename the two generated columns:
Transformation Name
Rename columns
Parameter: Option
Manual rename
Parameter: Column
column_0
Parameter: New column name
'TestNum'
Transformation Name
Rename columns
Parameter: Option
Manual rename
Parameter: Column
column_1
Parameter: New column name
'TestScore'
Unique row identifier: You can do one more step to create unique test identifiers, which identify the specific test for each student. The following uses the original row identifier orderIndex as an identifier for the student and the TestNum value to create the TestId column value:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
(orderIndex * 10) + TestNum
Parameter: New column name
'TestId'
The above are integer values. To make your identifiers look prettier, you might add the following:
Transformation Name
Merge columns
Parameter: Columns
'TestId00','TestId'
Extending: You might want to generate some summary statistical information on this dataset. For example, you might be interested in calculating each student's average test score. This step requires figuring out how to properly group the test values. In this case, you cannot group by the LastName value alone, and when the recipe is run at scale, there might also be collisions between first names. So, you might need to create a kind of primary key using the following:
Transformation Name
Merge columns
Parameter: Columns
'LastName','FirstName'
Parameter: Separator
'-'
Parameter: New column name
'studentId'
You can now use this as a grouping parameter for your calculation:
Transformation Name
New formula
Parameter: Formula type
Single row formula
Parameter: Formula
average(TestScore)
Parameter: Group rows by
studentId
Parameter: New column name
'avg_TestScore'
Results: After you delete unnecessary columns and move your columns around, the dataset should look like the following:
TestId | LastName | FirstName | TestNum | TestScore | studentId | avg_TestScore |
---|---|---|---|---|---|---|
TestId0020 | Adams | Allen | 0 | 81 | Adams-Allen | 82.5 |
TestId0021 | Adams | Allen | 1 | 87 | Adams-Allen | 82.5 |
TestId0022 | Adams | Allen | 2 | 83 | Adams-Allen | 82.5 |
TestId0023 | Adams | Allen | 3 | 79 | Adams-Allen | 82.5 |
TestId0030 | Burns | Bonnie | 0 | 98 | Burns-Bonnie | 92.25 |
TestId0031 | Burns | Bonnie | 1 | 94 | Burns-Bonnie | 92.25 |
TestId0032 | Burns | Bonnie | 2 | 92 | Burns-Bonnie | 92.25 |
TestId0033 | Burns | Bonnie | 3 | 85 | Burns-Bonnie | 92.25 |
TestId0040 | Cannon | Charles | 0 | 88 | Cannon-Charles | 83 |
TestId0041 | Cannon | Charles | 1 | 81 | Cannon-Charles | 83 |
TestId0042 | Cannon | Charles | 2 | 85 | Cannon-Charles | 83 |
TestId0043 | Cannon | Charles | 3 | 78 | Cannon-Charles | 83 |
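For readers who want to check the logic of this example end to end, the following Python sketch walks through the same sequence: build an index array, zip it with the scores, expand to one row per test, derive the identifier, and average per student. It is a rough model under stated assumptions, not the product's implementation. The helper name `unnest_scores` is hypothetical, and the sketch assumes the first data row has source row number 2 because the header occupies row 1.

```python
from statistics import mean

# Source data from the example: (LastName, FirstName, Scores).
students = [
    ("Adams", "Allen", [81, 87, 83, 79]),
    ("Burns", "Bonnie", [98, 94, 92, 85]),
    ("Cannon", "Charles", [88, 81, 85, 78]),
]


def unnest_scores(students, first_row_number=2):
    """Build one output row per (student, test) pair."""
    out = []
    for offset, (last, first, scores) in enumerate(students):
        # Stand-in for sourcerownumber(); assumes the header is row 1.
        order_index = first_row_number + offset
        # Models range(0, arraylen(Scores)).
        tests = list(range(len(scores)))
        # Models merging LastName and FirstName with a '-' separator.
        student_id = f"{last}-{first}"
        # Models average(TestScore) grouped by studentId.
        average = mean(scores)
        # Zipping Tests with Scores, then emitting one row per pair.
        for test_num, score in zip(tests, scores):
            # Models (orderIndex * 10) + TestNum with the 'TestId00' prefix.
            test_id = f"TestId00{order_index * 10 + test_num}"
            out.append({
                "TestId": test_id,
                "LastName": last,
                "FirstName": first,
                "TestNum": test_num,
                "TestScore": score,
                "studentId": student_id,
                "avg_TestScore": average,
            })
    return out


for row in unnest_scores(students):
    print(row)
```

Note that the identifier formula only stays unique while each student has at most ten tests, because the test index occupies a single decimal digit.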