Page tree

Trifacta SaaS



Contents:

   

This section illustrates a simple example of how to nest tabular data into JSON records.

Source:

The following source data is a mixture of homogeneous (Score1, Score2, Score3) and heterogeneous (Height-Ins, Weight-Lbs, ShoeSize-US) columns for each row (Student).

StudentScore1Score2Score3Height-InsWeight-LbsShoeSize-US
StudentA848992681408
StudentB989496631056
StudentC818382651147

Transformation: 

You can use the following transformations to assemble the above data into a JSON object. 

In this first transformation, you nest the test scores columns. Since this data has the same meaning across all three columns (homogeneous), an Array is the proper storage object:

Transformation Name Nest columns into Objects
Parameter: Columns1 Score1
Parameter: Columns2 Score2
Parameter: Columns3 Score3
Parameter: Nest columns to Array
Parameter: New column name scores

Then, you can delete the three source columns, yielding the following:

StudentscoresHeight-InsWeight-LbsShoeSize-US
StudentA["84","89","92"]681408
StudentB["98","94","96"]631056
StudentC["81","83","82"]651147

Next, you can nest the other three data columns. Since the data has different meanings in each column, you should carry forward the column names as keys in the nested data, which should be an Object type:

Transformation Name Nest columns into Objects
Parameter: Columns {Height-Ins},{Weight-Lbs},{ShoeSize-Us}
Parameter: Nest columns to Object
Parameter: New column name 'measurements'

After you delete the source columns, you should have the following:

Studentscoresmeasurements
StudentA["84","89","92"]{"Height-Ins":"68","Weight-Lbs":"140","ShoeSize-US":"8"}
StudentB["98","94","96"]{"Height-Ins":"63","Weight-Lbs":"105","ShoeSize-US":"6"}
StudentC["81","83","82"]{"Height-Ins":"65","Weight-Lbs":"114","ShoeSize-US":"7"}

NOTE: You may have noticed that the column names for the nested columns start with a lower-case letter, which follows JSON formatting standards. The Student column should be renamed to student.

Now that you have the detail columns nested into an array and an object, you can nest these columns into an object for each student, as in the following transformation:

Transformation Name Nest columns into Objects
Parameter: Columns1 student
Parameter: Columns2 scores
Parameter: Columns3 measurement
Parameter: Nest columns to Object
Parameter: New column name column1

After you delete the three source columns, you can rename column1 to student.

Results:

student
{"student":"StudentA","scores":["84","89","92"],"measurements":{"ShoeSize-Us":"8","Weight-Lbs":"140","Height-Ins":"68"}}
{"student":"StudentB","scores":["98","94","96"],"measurements":{"ShoeSize-Us":"6","Weight-Lbs":"105","Height-Ins":"63"}}
{"student":"StudentC","scores":["81","83","82"],"measurements":{"ShoeSize-Us":"7","Weight-Lbs":"114","Height-Ins":"65"}}

When you run the job, generate a JSON output to produce the above in JSONLines format.

This page has no comments.