Skip to main content

EXAMPLE - Nest JSON Records

This section illustrates a simple example of how to nest tabular data into JSON records.

Source:

The following source data is a mixture of homogeneous (Score1, Score2, Score3) and heterogeneous (Height-Ins, Weight-Lbs, ShoeSize-US) columns for each row (Student).

Student

Score1

Score2

Score3

Height-Ins

Weight-Lbs

ShoeSize-US

StudentA

84

89

92

68

140

8

StudentB

98

94

96

63

105

6

StudentC

81

83

82

65

114

7

Transformation:

You can use the following transformations to assemble the above data into a JSON object.

In this first transformation, you nest the test scores columns. Since this data has the same meaning across all three columns (homogeneous), an Array is the proper storage object:

Transformation Name

Nest columns into Objects

Parameter: Columns1

Score1

Parameter: Columns2

Score2

Parameter: Columns3

Score3

Parameter: Nest columns to

Array

Parameter: New column name

scores

Then, you can delete the three source columns, yielding the following:

Student

scores

Height-Ins

Weight-Lbs

ShoeSize-US

StudentA

["84","89","92"]

68

140

8

StudentB

["98","94","96"]

63

105

6

StudentC

["81","83","82"]

65

114

7

Next, you can nest the other three data columns. Since the data has different meanings in each column, you should carry forward the column names as keys in the nested data, which should be an Object type:

Transformation Name

Nest columns into Objects

Parameter: Columns

{Height-Ins},{Weight-Lbs},{ShoeSize-Us}

Parameter: Nest columns to

Object

Parameter: New column name

'measurements'

After you delete the source columns, you should have the following:

Student

scores

measurements

StudentA

["84","89","92"]

{"Height-Ins":"68","Weight-Lbs":"140","ShoeSize-US":"8"}

StudentB

["98","94","96"]

{"Height-Ins":"63","Weight-Lbs":"105","ShoeSize-US":"6"}

StudentC

["81","83","82"]

{"Height-Ins":"65","Weight-Lbs":"114","ShoeSize-US":"7"}

Note

You may have noticed that the column names for the nested columns start with a lower-case letter, which follows JSON formatting standards. The Student column should be renamed to student.

Now that you have the detail columns nested into an array and an object, you can nest these columns into an object for each student, as in the following transformation:

Transformation Name

Nest columns into Objects

Parameter: Columns1

student

Parameter: Columns2

scores

Parameter: Columns3

measurement

Parameter: Nest columns to

Object

Parameter: New column name

column1

After you delete the three source columns, you can rename column1 to student.

Results:

student

{"student":"StudentA","scores":["84","89","92"],"measurements":{"ShoeSize-Us":"8","Weight-Lbs":"140","Height-Ins":"68"}}

{"student":"StudentB","scores":["98","94","96"],"measurements":{"ShoeSize-Us":"6","Weight-Lbs":"105","Height-Ins":"63"}}

{"student":"StudentC","scores":["81","83","82"],"measurements":{"ShoeSize-Us":"7","Weight-Lbs":"114","Height-Ins":"65"}}

When you run the job, generate a JSON output to produce the above in JSONLines format.