EXAMPLE - Nest JSON Records
This section illustrates a simple example of how to nest tabular data into JSON records.
Source:
The following source data is a mixture of homogeneous (Score1, Score2, Score3) and heterogeneous (Height-Ins, Weight-Lbs, ShoeSize-US) columns for each row (Student).
Student | Score1 | Score2 | Score3 | Height-Ins | Weight-Lbs | ShoeSize-US |
---|---|---|---|---|---|---|
StudentA | 84 | 89 | 92 | 68 | 140 | 8 |
StudentB | 98 | 94 | 96 | 63 | 105 | 6 |
StudentC | 81 | 83 | 82 | 65 | 114 | 7 |
Transformation:
You can use the following transformations to assemble the above data into a JSON object.
In this first transformation, you nest the test scores columns. Since this data has the same meaning across all three columns (homogeneous), an Array is the proper storage object:
Transformation Name | |
---|---|
Parameter: Columns1 | Score1 |
Parameter: Columns2 | Score2 |
Parameter: Columns3 | Score3 |
Parameter: Nest columns to | Array |
Parameter: New column name | scores |
Then, you can delete the three source columns, yielding the following:
Student | scores | Height-Ins | Weight-Lbs | ShoeSize-US |
---|---|---|---|---|
StudentA | ["84","89","92"] | 68 | 140 | 8 |
StudentB | ["98","94","96"] | 63 | 105 | 6 |
StudentC | ["81","83","82"] | 65 | 114 | 7 |
Next, you can nest the other three data columns. Since the data has different meanings in each column, you should carry forward the column names as keys in the nested data, which should be an Object type:
Transformation Name | |
---|---|
Parameter: Columns | {Height-Ins},{Weight-Lbs},{ShoeSize-Us} |
Parameter: Nest columns to | Object |
Parameter: New column name | 'measurements' |
After you delete the source columns, you should have the following:
Student | scores | measurements |
---|---|---|
StudentA | ["84","89","92"] | {"Height-Ins":"68","Weight-Lbs":"140","ShoeSize-US":"8"} |
StudentB | ["98","94","96"] | {"Height-Ins":"63","Weight-Lbs":"105","ShoeSize-US":"6"} |
StudentC | ["81","83","82"] | {"Height-Ins":"65","Weight-Lbs":"114","ShoeSize-US":"7"} |
Nota
You may have noticed that the column names for the nested columns start with a lower-case letter, which follows JSON formatting standards. The Student
column should be renamed to student
.
Now that you have the detail columns nested into an array and an object, you can nest these columns into an object for each student, as in the following transformation:
Transformation Name | |
---|---|
Parameter: Columns1 | student |
Parameter: Columns2 | scores |
Parameter: Columns3 | measurement |
Parameter: Nest columns to | Object |
Parameter: New column name | column1 |
After you delete the three source columns, you can rename column1
to student
.
Results:
student |
---|
{"student":"StudentA","scores":["84","89","92"],"measurements":{"ShoeSize-Us":"8","Weight-Lbs":"140","Height-Ins":"68"}} |
{"student":"StudentB","scores":["98","94","96"],"measurements":{"ShoeSize-Us":"6","Weight-Lbs":"105","Height-Ins":"63"}} |
{"student":"StudentC","scores":["81","83","82"],"measurements":{"ShoeSize-Us":"7","Weight-Lbs":"114","Height-Ins":"65"}} |
When you run the job, generate a JSON output to produce the above in JSONLines format.