Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • It is easier to work with JSON in which each row of the file is a record. When a record spans multiple rows, additional steps are required in the application to render it into tabular format. The example uses multi-row JSON records.
  • Each JSON record must be less than 1MB in size.

Output:

Info

NOTE: JSON-formatted files that are generated by

D s product
are rendered in JSON Lines format, which is a single line per-record variant of JSON. For more information, see http://jsonlines.org.

...

Tip

Tip: The easiest way to unnest is to select the column header for the column containing your Object data. Unnest should be one of the suggested options. If not, you can use the following process.

 


  1. In the Recipe panel, click New Step
  2. In the Search panel, enter unnest object elements
  3. Specify the following transformation. Substitute the Paths to elements values below with the top-level keys in your JSON records:

    D trans
    p03Valueauthor
    p06NamePath to elements5
    p09Valuetrue
    p01NameColumn
    p06Valueprice
    p03NamePath to elements2
    p07Valuepublish_date
    p04Valuetitle
    SearchTermUnnest object elements
    p07NamePath to elements6
    p09NameRemove elements from original
    Typeref
    p05NamePath to elements4
    p01Valuecolumn1
    p02NamePath to elements1
    p02Valueid
    p05Valuegenre
    p04NamePath to elements3
    p08Valuedescription
    p08NamePath to elements7

    1. In the above, each Paths to elements entry specifies a key in the JSON record. The key's associated value becomes the value in the new column, which is given the same name as the key. 
    2. So, this step breaks out the key-value pairs for the specified keys into separate columns in the dataset. 

      Tip

      Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested.


  4. Repeat the above process for the next level in the hierarchy. In the example, this step means unnesting the characteristics node:

    D trans
    p03Valuecharacteristics.paper_stock
    Typeref
    p05NameRemove elements from original
    p01NameColumn
    p01Valuecolumn1
    p02NamePath to elements1
    p02Valuecharacteristics.cover_color
    p05Valuetrue
    p03NamePath to elements2
    p04Valuecharacteristics.paper_source
    p04NamePath to elements3
    SearchTermUnnest object elements
  5. You can now delete column1. From the column menu to the right of column1, select Delete.
  6. You have now converted your JSON to tabular format.

    Tip

    Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files.


    Tip

    Tip: The unnesting process may have moved some columns into positions that are different from their order in the original JSON. Use the Move command from the column menu to reposition your columns.


...

Tip

Tip: The following steps reshape your data. You may wish to create a new recipe as an output of the previous recipe where you can add the following steps.

 


  1. When you re-nest, you want to nest from the lowest to top tier of the hierarchy. 
  2. In the example, the following columns should be nested together: characteristics.cover_colorcharacteristics.paper_stock, and characteristics.paper_source:

    D trans
    p03Valuecharacteristics.paper_source
    Typeref
    p05NameNew column name
    p01Namecolumn1
    p01Valuecharacteristics.cover_color
    p02Namecolumn2
    p02Valuecharacteristics.paper_stock
    p05Valuecharacteristics
    p03Namecolumn3
    p04ValueObject
    p04NameNest columns to
    SearchTermNest columns into Objects
  3. In the generated characteristics column, you can remove the characteristics. from the key value:

    D trans
    p03Value(empty)
    Typeref
    p01NameColumn
    p01Valuecharacteristics
    p02NameFind
    p02Value`characteristics.`
    p03NameReplace with
    SearchTermReplace text or patterns
  4. Now, delete the three source columns:

    D trans
    p03Valuecharacteristics.paper_source
    Typeref
    p01Namecolumn1
    p01Valuecharacteristics.cover_color
    p02Namecolumn2
    p02Valuecharacteristics.paper_stock
    p03Namecolumn3
    SearchTermDelete columns
  5. Repeat the above steps for the next level of the hierarchy in your dataset. 

    Info

    NOTE: Do not nest the columns at the top level of the hierarchy.


...

  1. Since the JSON output is newline delimited, your downstream system may need you to add commas at the end of each record but the last one. 
  2. If you have re-nested JSON hierarchies into your flat records, the exported JSON for secondary hierarchies appears as quoted strings, like the following:

    Code Block
    "characteristics":"{\"cover_color\":\"black\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}",
    "characteristics":"{\"cover_color\":\"white\",\"paper_stock\":\"15\",\"paper_source\":\"recycled\"}",
    "characteristics":"{\"cover_color\":\"blue\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}",

    The quoted strings can be fixed by simple search and replace.