
D toc

Excerpt

Version 2: This section describes how you can import JSON files into 

D s product
rtrue
, convert them to tabular format, wrangle them, and then export them back in the same JSON format.

The basic workflow is described by way of example. In the example task, the JSON file must be imported into 

D s product
rtrue
, a new column must be inserted into the JSON, and the resulting JSON must be exported in the same structure.

JSON Input and Output

Input: 

  • It is easier to work with JSON in which each row of the file is a record. When a record spans multiple rows, additional steps are required in the application to render it into tabular format. The example uses multi-row JSON records.

Output:

Info

NOTE: JSON-formatted files that are generated by

D s product
are rendered in JSON Lines format, a variant of JSON in which each record is written on a single line. For more information, see http://jsonlines.org.

  • D s product
     can generate a JSON file as an output for your job. Characteristics of generated JSON files:
    • Newline-delimited: The end of each record is the \n character. If your downstream system expects comma-delimited records (with no comma after the last one), additional work is required outside of the application.
    • Non-nested: Each record in the generated file is flat. 
      • For multi-level JSON hierarchies, you can nest columns together and leave the top level as a set of columns in the data grid. However, on output, the second and lower hierarchies appear as quoted string values in the output. Additional cleanup is required outside of the application. 
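If a downstream system needs a comma-delimited JSON array rather than JSON Lines, the conversion can be handled outside the application. A minimal Python sketch (the `jsonl_to_array` helper name is hypothetical, not part of the product):

```python
import json

def jsonl_to_array(jsonl_text: str) -> str:
    """Convert newline-delimited JSON records into one JSON array,
    adding the commas that some downstream systems expect."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(records, indent=2)

# Two JSON Lines records become a two-element array.
print(jsonl_to_array('{"id": "bk101"}\n{"id": "bk102"}'))
```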

Example

This example dataset contains information on books. In this case:

  • The data is submitted as one attribute per row. A single JSON record spans many rows.
  • The total number of books is three.
  • The JSON data has two hierarchies.
Code Block
"book": {
  "id": "bk101",
  "author": "Guy, Joe",
  "title": "Json Guide",
  "genre": "Computer",
  "price": "44.95",
  "publish_date": "2002-04-26",
  "characteristics": {
    "cover_color": "black",
    "paper_stock": "20",
    "paper_source": "new"
  },
  "description": "An in-depth look at creating applications."
},
"book": {
  "id": "bk102",
  "author": "Nelson, Rogers",
  "title": "When Doves Cry",
  "genre": "Biography",
  "price": "24.95",
  "publish_date": "2016-04-21",
  "characteristics": {
    "cover_color": "white",
    "paper_stock": "15",
    "paper_source": "recycled"
  },
  "description": "Biography of a prince."
},
"book": {
  "id": "bk103",
  "author": "Fitzgerald, F. Scott",
  "title": "The Great Gatsby",
  "genre": "Fiction",
  "price": "9.95",
  "publish_date": "1925-04-10",
  "characteristics": {
    "cover_color": "blue",
    "paper_stock": "20",
    "paper_source": "new"
  },
  "description": "Classic American novel."
}

JSON Workflow

...

Import the JSON file.

Info

NOTE: During import, you should deselect the Detect Structure option. You are likely to need to rebuild the initial parsing steps to consume the file properly. Details are provided later.

...

  1. Each level in the JSON hierarchy must be unnested in a separate step.

...

D s lang

...

Enable

This method of working with JSON is enabled by default.

Info

NOTE: When this feature is enabled, all JSON imported datasets created under the legacy method must be recreated to behave like v2 datasets with respect to conversion and schema management. Features developed in the future may not retroactively be supported in the v1 legacy mode. You should convert to using the v2 method.


...


You can choose to continue using the legacy method of working with JSON.

Info

NOTE: The legacy version of JSON import is required if you are working with compressed JSON files or exclusively with Newline JSON files.

You should migrate your flows to using the new version.

Info

NOTE: The legacy version of working with JSON is likely to be deprecated in a future release.

For more information on migrating to the new version, see Working with JSON v1.

Requirements

JSON input 

  • Recommended limit of 1 GB in source file size. Since conversion happens within the 
    D s node
    , this limit may vary depending on the memory of the 
    D s node
    .
  • Each JSON record must be less than 20 MB in size.

  • Filename extensions must be .json or .JSON.
  • Conversion of compressed JSON files is not supported. Compressed JSON files can be imported using the previous method. 
  • For best results, all keys and values should be quoted and imported as strings. 

    Info

    NOTE: Escape characters that make JSON invalid can cause your JSON file to fail to import.

    • You can escape quote values to treat them as literals in your strings using the backslash character. For example: \"

    • When the values are imported into the Transformer page, the 
      D s webapp
       re-infers the data type for each column.
JSON structure | Description | Supported?

Newline | The newline character (\n) denotes the end of a record. Each record can contain the keys (object or array) and values for the JSON object. | Supported

Tip

Tip: This version is supported through both versions of JSON import, but it performs better in v1. If you are using the Newline form of JSON exclusively, you should use v1.

Top-level object | Top-level row contains keys for mapping JSON objects. | Supported
Top-level array | Top-level row contains an array of objects. | Supported

JSON output

Info

NOTE: JSON-formatted files that are generated by

D s product
are rendered in JSON Lines format, a variant of JSON in which each record is written on a single line. For more information, see http://jsonlines.org.

  • D s product
     can generate a JSON file as an output for your job. Characteristics of generated JSON files:
    • Newline-delimited: The end of each record is the \n character. If your downstream system expects comma-delimited records (with no comma after the last one), additional work is required outside of the application.
    • Non-nested: Each record in the generated file is flat. 
      • For multi-level JSON hierarchies, you can nest columns together and leave the top level as a set of columns in the data grid. However, on output, the second and lower hierarchies appear as quoted string values in the output. Additional cleanup is required outside of the application. 

Example

Example 1 - Rows of JSON records

The following example contains records of images from a website:

Code Block
{"metrics": [{"rank": "1043", "score" : "9679"}], "caption": "Such a good boy!", "id": "9kt8ex", "url": "https://www.example.com/w285fpp11.jpg", "filename": "w285fpp11.jpg"}
{"metrics": [{"rank": "1042", "score" : "9681"}], "caption": "This sweet puppy has transformed our life!", "id": "9x2774", "url": "https://www.example.com/fmll0cy11.jpg", "filename": "fmll0cy11.jpg"}
{"metrics": [{"rank": "1041", "score" : "9683"}], "caption": "We sure love our fur babies.", "id": "a8guou", "url": "https://www.example.com/mljnmq521.jpg", "filename": "mljnmq521.jpg"}

Notes:

  • Each row is a complete JSON record containing keys and values.

    Tip

    Tip: Nested JSON, such as metrics above, can be inserted as part of a record. It can then be unnested within the application.

  • Each key's value must have a comma after it, except for the final key value in any row. 

    Info

    NOTE: The end of a JSON record is the right curly bracket (}). Commas are not added to the end of each line in this format.
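Because each row must parse as a standalone JSON record, a quick validation pass outside the application can catch malformed lines before import. A Python sketch (the `check_jsonl` helper name is illustrative only):

```python
import json

def check_jsonl(text: str) -> list:
    """Return the line numbers of rows that do not parse as complete,
    standalone JSON records."""
    bad = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(n)
    return bad

# The second row has a trailing comma, so it is not a standalone record.
print(check_jsonl('{"a": 1}\n{"b": 2},'))  # [2]
```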

Task

  1. Import the JSON file.

  2. Any nested data must be unnested within columns. Each level in the JSON hierarchy must be unnested in a separate step.
  3. When all of the JSON data is in tabular form, perform any 
    D s lang
     transformations. 
  4. If you need to rebuild the loose JSON hierarchy, you must nest the lower levels of the JSON hierarchy back into their original form.
    1. If it is acceptable to write out flat JSON records, you can export without nesting the data again.
  5. Run the job, generating a JSON output. 

Step - Import the file 

  1. Through the Import Data page, navigate and select your JSON file for import. 

  2. When the file has been loaded, click Edit settings for the dataset card in the right panel. In the Import Settings dialog, deselect the Detect Structure checkbox.
  3. Complete the rest of the import process. For more information, see Import Data Page
  4. Add the JSON-based imported dataset to a flow and create a recipe for it. For more information, see Flow View Page.
  5. Select the recipe, and click Edit Recipe...

Step - Convert to one JSON record per row

Info

NOTE: This step is required only if a single JSON record in your imported dataset spans multiple rows. If you have single-row JSON records in the Transformer page, please skip to the next section.

...

In the Transform Builder, the Split on value is probably the \n character.

...

Code Block
"book": 

...

Step - Convert JSON to Object type

The next step involves converting your JSON records to a column of Object type values. The Object data type is a means of rendering records into key-value pairs. However, its structure is a little different from JSON. The following steps convert your JSON to an Object data type.

  1. Since JSON uses character indentation to convey structure, you should remove these indentations if they appear in your dataset. For our two-layered example, you can use the following transformation:
    D transp03Value\"

    Info

    NOTE: File formats are detected based on the file extension. Please verify that your file extension is .json or .JSON, which ensures that it is passed through the conversion service.

    1. The file is passed through the conversion process, which reviews the JSON file and stores it on the base storage layer in a row-per-record format that can be easily ingested. This process happens within the Import Data page. You can track progress on the right side of the screen.

  2. After the file has been converted, click the Preview icon on the right side of the screen. In the Preview, you can review the first few rows of the imported file.

    1. If some rows are missing from the preview, you may have a syntax error in the first row after the last well-structured row. You should try to fix this in the source file and re-import.

    2. If all of the rows are problematic, your data is likely malformed.

  3. Complete the rest of the import process.

  4. In Flow View, add the JSON-based imported dataset to your flow and create a recipe for it.
    1. Select the recipe, and click Edit Recipe...

In the Transformer page, the example above should look like the following:

metrics | caption | id | url | filename
[{"rank":"1043","score":"9679"}] | Such a good boy! | 9kt8ex | https://www.example.com/w285fpp11.jpg | w285fpp11.jpg
[{"rank":"1042","score":"9681"}] | This sweet puppy has transformed our life! | 9x2774 | https://www.example.com/fmll0cy11.jpg | fmll0cy11.jpg
[{"rank":"1041","score":"9683"}] | We sure love our fur babies. | a8guou | https://www.example.com/mljnmq521.jpg | mljnmq521.jpg


Step - Unnest JSON records

Your JSON records are in tabular format. If you have nested JSON objects within your JSON records, the next step is to unnest your JSON records. 

Info

NOTE: For JSON records that have multiple levels in the hierarchy, you should unnest the top level of the hierarchy first, followed by each successive level.

Tip

Tip: The easiest way to unnest is to select the column header for the column containing your nested data. Unnest should be one of the suggested options, and the suggestion should include the specification for the paths to the key values. If not, you can use the following process.


...

D trans
Typeref
p01NameColumn
p01Valuecolumn1
p02NamePath to elements1
p02Valuecharacteristics.cover_color
p03NamePath to elements2
p03Valuecharacteristics.paper_stock
p04NamePath to elements3
p04Valuecharacteristics.paper_source
p05NameRemove elements from original
p05Valuetrue
SearchTermUnnest object elements

...

You have now converted your JSON to tabular format.

Tip

Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files.

...

Step - Wrangle your dataset

Your JSON data is ready for wrangling. 

In the following example, the discount column is created. If the publication date is before 01/01/2000, then the discount is 0.1 (10%):

D trans
p03Valuediscount
Typeref
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueIF(publish_date < DATE(2000, 1, 1), 0.1, 0)
p03NameNew column name
SearchTermNew formula
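The single-row formula logic can be sketched in Python outside the application; this hypothetical `discount` helper mirrors IF(publish_date < DATE(2000, 1, 1), 0.1, 0):

```python
from datetime import date

def discount(publish_date: date) -> float:
    """10% discount for titles published before 01/01/2000, matching
    IF(publish_date < DATE(2000, 1, 1), 0.1, 0)."""
    return 0.1 if publish_date < date(2000, 1, 1) else 0.0

print(discount(date(1925, 4, 10)))  # The Great Gatsby -> 0.1
print(discount(date(2002, 4, 26)))  # Json Guide -> 0.0
```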

Continue adding steps until you have transformed your data as needed and are ready to run a job on it.

Step - Nest the JSON records

Info

NOTE: If your desired JSON output does not include multiple hierarchies, you can skip this section. The generated JSON files are a single JSON record per row.

If a job is run using the recipe created so far on the example data, a newline-delimited JSON file that has no hierarchies in it can be generated by the application. However, the dataset is a two-level hierarchy, so the elements in the characteristics hierarchy are written out in the following manner:

Code Block
"characteristics.cover_color":"black","characteristics.paper_stock":20,"characteristics.paper_source":"new",
"characteristics.cover_color":"white","characteristics.paper_stock":15,"characteristics.paper_source":"recycled",
"characteristics.cover_color":"blue","characteristics.paper_stock":20,"characteristics.paper_source":"new",

You can take one of two approaches:

  1. Generate the JSON file with a flat hierarchy. Output looks like the above. Use an external tool to unnest the second and lower hierarchies appropriately.
  2. Re-nest the lower hierarchies until you have a single flat record containing some Object type columns that hold the underlying hierarchies. When the re-nested JSON records are exported, secondary hierarchies appear as escaped string values. More details are provided later.
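For the first approach, the external tool's job is to rebuild a hierarchy from the dotted column names. A Python sketch (the `nest_dotted_keys` helper is hypothetical):

```python
import json

def nest_dotted_keys(flat: dict) -> dict:
    """Rebuild a nested hierarchy from flattened keys such as
    'characteristics.cover_color'."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        target = nested
        for part in parts[:-1]:
            target = target.setdefault(part, {})  # descend, creating levels
        target[parts[-1]] = value
    return nested

record = {"id": "bk101", "characteristics.cover_color": "black",
          "characteristics.paper_stock": 20}
print(json.dumps(nest_dotted_keys(record)))
```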

If you are re-nesting the lower hierarchies, you can use the following approach. 

Tip

Tip: The following steps reshape your data. You may wish to create a new recipe as an output of the previous recipe where you can add the following steps.

 

...

D trans
Typeref
p01Namecolumn1
p01Valuecharacteristics.cover_color
p02Namecolumn2
p02Valuecharacteristics.paper_stock
p03Namecolumn3
p03Valuecharacteristics.paper_source
p04NameNest columns to
p04ValueObject
p05NameNew column name
p05Valuecharacteristics
SearchTermNest columns into Objects

...

D trans
p03Value(empty)
Typeref
p01NameColumn
p01Valuecharacteristics
p02NameFind
p02Value`characteristics.`
p03NameReplace with
SearchTermReplace text or patterns

...

D trans
p03Valuecharacteristics.paper_source
Typeref
p01Namecolumn1
p01Valuecharacteristics.cover_color
p02Namecolumn2
p02Valuecharacteristics.paper_stock
p03Namecolumn3
SearchTermDelete columns

...

Repeat the above steps for the next level of the hierarchy in your dataset. 

Info

NOTE: Do not nest the columns at the top level of the hierarchy.

Step - Generate JSON output

When you are ready, you can run the job. Create or modify a publishing action to generate a JSON file for output. See Run Job Page.

When the job completes, you can click the JSON link in the Output Destinations tab of the Job Details page to download your JSON file. See Job Details Page.

Step - Final Cleanup

Outside the application, you may need to do the following:

  1. Since the JSON output is newline-delimited, your downstream system may require you to add commas at the end of each record except the last one. 
  2. If you have re-nested JSON hierarchies into your flat records, the exported JSON for secondary hierarchies appears as quoted strings, like the following:

    Code Block
    "characteristics":"{\"cover_color\":\"black\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}",
    "characteristics":"{\"cover_color\":\"white\",\"paper_stock\":\"15\",\"paper_source\":\"recycled\"}",
    "characteristics":"{\"cover_color\":\"blue\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}",

    The quoted strings can be fixed by simple search and replace. 
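As an alternative to search and replace, the escaped strings can be parsed back into objects programmatically. A Python sketch, assuming JSON Lines output (the `inflate_quoted_objects` helper name is hypothetical):

```python
import json

def inflate_quoted_objects(line: str) -> str:
    """Parse a JSON Lines record and turn any value that is itself an
    escaped JSON string (e.g. a re-nested hierarchy) back into an object."""
    record = json.loads(line)
    for key, value in record.items():
        if isinstance(value, str) and value.startswith("{"):
            try:
                record[key] = json.loads(value)
            except json.JSONDecodeError:
                pass  # leave ordinary strings untouched
    return json.dumps(record)

line = '{"id":"bk101","characteristics":"{\\"cover_color\\":\\"black\\"}"}'
print(inflate_quoted_objects(line))
```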

...

  1. In the Recipe panel, click New Step
  2. In the Search panel, enter unnest values into new columns
  3. Specify the following transformation. Substitute the Paths to elements values below with the top-level keys in your JSON records:

    D trans
    Typeref
    p01NameColumn
    p01Valuemetrics
    p02NamePath to elements1
    p02Value[0]
    SearchTermUnnest values into new columns

    Tip

    Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested.

  4. In the above transformation, the bracketing array around the set of values has been broken down into raw JSON. This value may now be interpreted as a String data type. From the column drop-down, you can select Object data type.
  5. Specify the following transformation to remove the line break and indentation that precede each key:
    D trans
    Typeref
    p01NameColumn
    p01Valuecolumn1
    p02NameFind
    p02Value/\n\s*"/
    p03NameReplace with
    p03Value"
    p04NameMatch all occurrences
    p04Valuetrue
    SearchTermReplace text or patterns
    1. In the above, the key term is the Find pattern, which is a regular expression:

      Code Block
      /\n\s*"/
    2. The two forward slashes at the ends define the pattern as a regular expression.
    3. The content in the middle matches on the pattern of a newline character, an arbitrary number of spaces, and a double quote.
    4. This pattern is replaced with just the double-quote, removing the preceding part of the pattern from the dataset.
    5. For more information on matching patterns, see Text Matching.
    In standard JSON, a comma is used to demarcate the end of a line or a record, except for the last one in a set. 
  6. In the above example, the first two records have commas at the end of them. Here is a snippet of their ends:

    Code Block
    ... "description":"An in-depth look at creating applications."},
    ... "description":"Biography of a prince."},
    ... "description":"Classic American novel."}
  7. To convert these records to Object type, the commas at the end of the first two rows must be removed:
    D trans
    p03Value}
    Typeref
    p01NameColumn
    p01Valuecolumn1
    p02NameFind
    p02Value`\n\},\n{end}`
    p03NameReplace with
    p04Valuetrue
    p04NameMatch all occurrences
    SearchTermReplace text or patterns
    The above transformation is similar to the previous one. However, in this one, the Find pattern uses a 
    D s lang
    itempattern
     to indicate that the pattern should only be matched at the end of a record:
    Code Block
    {end}
  8. This token in the pattern prevents it from matching if there are other instances of the pattern nested within the record.
  9. For more information, see Text Matching.
  10. Individual records should look similar to the following:

    D s property overflow

    Code Block
    {"id": "bk101","author": "Guy, Joe","title": "Json Guide","genre": "Computer", \
    "price": "44.95","publish_date": "2002-04-26",{"cover_color": "black", \
    "paper_stock": "20","paper_source": "new"}, \
    "description": "An in-depth look at creating applications."}
  11. These records are suitable for conversion to Object data type. 
  12. To change the data type for the column, click the icon to the left of the column header. Select Object.
  13. The column data type is changed to Object. The step to change data type is added to your recipe, too.
  14. The column histogram may now display some mismatched records. If it does: 
    1. Review those records to determine what is malformed.
    2. Delete the recipe step that changes the data type to Object.
    3. Make fixes as necessary.
    4. Switch back to Object data type. Iterate as needed until all records are valid when the column is converted to Object type.
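The collapse of a multi-row record into a single line, as performed by the recipe steps above, can be approximated outside the application. A Python sketch using the same /\n\s*"/ idea (the `collapse_record` helper and the trailing-comma handling are illustrative assumptions):

```python
import re

def collapse_record(raw: str) -> str:
    """Collapse an indented multi-row JSON record to a single line
    and drop the trailing comma after the closing brace."""
    one_line = re.sub(r'\n\s*"', '"', raw)       # newline + indent before a key
    one_line = re.sub(r'\n\s*\}', '}', one_line)  # newline + indent before }
    return re.sub(r'\},\s*$', '}', one_line)      # trailing record comma

raw = '{"id": "bk101",\n  "author": "Guy, Joe",\n  "genre": "Computer"\n},'
print(collapse_record(raw))
```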

Step - Unnest JSON records

The next step is to convert your JSON records to tabular format. 

Info

NOTE: For JSON records that have multiple levels in the hierarchy, you should unnest the top level of the hierarchy first, followed by each successive level.

Tip

Tip: The easiest way to unnest is to select the column header for the column containing your Object data. Unnest should be one of the suggested options. If not, you can use the following process.

 

...

D trans
Typeref
p01NameColumn
p01Valuecolumn1
p02NamePath to elements1
p02Valueid
p03NamePath to elements2
p03Valueauthor
p04NamePath to elements3
p04Valuetitle
p05NamePath to elements4
p05Valuegenre
p06NamePath to elements5
p06Valueprice
p07NamePath to elements6
p07Valuepublish_date
p08NamePath to elements7
p08Valuedescription
p09NameRemove elements from original
p09Valuetrue
SearchTermUnnest object elements
  1. In the above, each Paths to elements entry specifies a key in the JSON record. The key's associated value becomes the value in the new column, which is given the same name as the key. 
  2. So, this step breaks out the key-value pairs for the specified keys into separate columns in the dataset. 

    Tip

    Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested.

  3. Repeat the above process for the next level in the hierarchy.
  4. You can now delete the source columns. In the example, these source columns are named metrics and 0.

    Tip

    Tip: SHIFT + click these columns and then select Delete columns from the right panel. Click Add.

  5. Repeat the above steps for each nested JSON object.

    Tip

    Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files.

    Tip

    Tip: The unnesting process may have moved some columns into positions that are different from their order in the original JSON. Use the Move command from the column menu to reposition your columns.

Step - Wrangle your dataset

Your JSON data is ready for wrangling. Continue adding steps until you have transformed your data as needed and are ready to run a job on it.

Step - Nest the JSON records

Info

NOTE: If your desired JSON output does not include multiple hierarchies, you can skip this section. The generated JSON files are a single JSON record per row.

If you ran a job on the example dataset, the output would look like the following:

Code Block
{"rank":1043,"score":9679,"caption":"Such a good boy!","id":"9kt8ex","url":"https://www.example.com/w285fpp11.jpg","filename":"w285fpp11.jpg"}
{"rank":1042,"score":9681,"caption":"This sweet puppy has transformed our life!","id":"9x2774","url":"https://www.example.com/fmll0cy11.jpg","filename":"fmll0cy11.jpg"}
{"rank":1041,"score":9683,"caption":"We sure love our fur babies.","id":"a8guou","url":"https://www.example.com/mljnmq521.jpg","filename":"mljnmq521.jpg"}

Suppose you want to nest the url and filename columns into a nested array called resources.

Re-nest the lower hierarchies until you have a single flat record containing some Object type columns that hold the underlying hierarchies. When the re-nested JSON records are exported, secondary hierarchies appear as escaped string values. More details are provided later.

Tip

Tip: The following steps reshape your data. You may wish to create a new recipe as an output of the previous recipe where you can add the following steps.

Steps:

  1. SHIFT + click the url and filename columns. Then, select Nest columns in the right-hand panel. This transformation should look like the following:

    D trans
    p03ValueObject
    Typeref
    p01Namecolumn1
    p01Valueurl
    p02Namecolumn2
    p02Valuefilename
    p03NameNest columns to
    p04Valuecolumn1
    p04NameNew column name
    SearchTermNest columns into Objects
  2. column1 now contains an Object mapping of the two columns. You can now nest this column again into an Array:
    D trans
    p03Valueresources
    Typeref
    p01NameColumns
    p01Valuecolumn1
    p02NameNest columns to
    p02ValueArray
    p03NameNew column name
    SearchTermNest columns into Objects
  3. Delete column1.
  4. Continue nesting other columns in a similar fashion. Repeat the above steps for the next level of the hierarchy in your dataset.
  5. You must re-nest from the bottom of the target hierarchy to the top. 

    Info

    NOTE: Do not nest the columns at the top level of the hierarchy.

  6. When the column names contain all of the keys that you wish to generate in the top-level JSON output, you can run the job.

Step - Generate JSON output

When you are ready, you can run the job. Create or modify a publishing action to generate a JSON file for output. See Run Job Page.

When the job completes, you can click the JSON link in the Output Destinations tab of the Job Details page to download your JSON file. See Job Details Page.

The output file for the above example should look like the following:

Code Block
{"rank":1043,"score":9679,"caption":"Such a good boy!","id":"9kt8ex","url":"https://www.example.com/w285fpp11.jpg","filename":"w285fpp11.jpg","resources":[{"url":"https://www.example.com/w285fpp11.jpg","filename":"w285fpp11.jpg"}]}
{"rank":1042,"score":9681,"caption":"This sweet puppy has transformed our life!","id":"9x2774","url":"https://www.example.com/fmll0cy11.jpg","filename":"fmll0cy11.jpg","resources":[{"url":"https://www.example.com/fmll0cy11.jpg","filename":"fmll0cy11.jpg"}]}
{"rank":1041,"score":9683,"caption":"We sure love our fur babies.","id":"a8guou","url":"https://www.example.com/mljnmq521.jpg","filename":"mljnmq521.jpg","resources":[{"url":"https://www.example.com/mljnmq521.jpg","filename":"mljnmq521.jpg"}]}

Example 2 - Top-level array of JSON records

Your JSON may be formatted as a single top-level object containing an array of JSON records. The following example contains records of messages about individual diet and exercise achievements: 

Code Block
{
  "object": [
    {
      "score": 19669,
      "title": "M/07/1'3\" [23lbs > 13lbs = 10lbs] Still a bit to go, but my owner no longer refers to me as his chunky boy!",
      "ups": 19669,
      "id": "9kt8ex",
      "url": "https://i.redd.it/bzygw285fpp11.jpg",
      "short": "bzygw285fpp11.jpg"
    },
    {
      "score": 19171,
      "title": "M/29/5'11\" [605 pounds > 375 pounds = 230 pounds lost] (14 months) Still considered super morbidly obese but I've made some good progress.",
      "ups": 19171,
      "id": "9x2774",
      "url": "https://i.redd.it/wbbufmll0cy11.jpg",
      "short": "wbbufmll0cy11.jpg"
    },
    {
      "score": 16778,
      "title": "F/28/5\u20197\u201d [233lbs to 130lbs] Got tired of being obese and took control of my life!",
      "ups": 16778,
      "id": "a8guou",
      "url": "https://i.redd.it/3t0kmljnmq521.jpg",
      "short": "3t0kmljnmq521.jpg"
    },
    {
      "score": 16743,
      "title": "M/22/5'11\" [99lbs > 150lbs = 51lbs] Anorexia my recovery",
      "ups": 16743,
      "id": "atla3n",
      "url": "https://i.redd.it/9t6tvsjs16i21.jpg",
      "short": "9t6tvsjs16i21.jpg"
    }
  ]
}

The outer JSON is a single key-value pair:

  • key: object
  • value: array of JSON records

When source JSON records structured in this manner are imported, each JSON record in the object is imported into a separate row.  You can unnest this data by applying an Unnest values transformation. 

Info

NOTE: The object can contain only one nested array of JSON data. If the object contains multiple nested arrays, it is not broken into separate rows. All unnesting must be performed in your recipe steps.
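The row-splitting behavior described here can be approximated outside the application. A Python sketch (the `explode_top_level_array` helper is hypothetical and assumes exactly one key whose value is the array):

```python
import json

def explode_top_level_array(text: str) -> list:
    """Split a single top-level object holding one array of records
    into one JSON Lines record per array element."""
    outer = json.loads(text)
    (key, records), = outer.items()  # exactly one key-value pair expected
    return [json.dumps(r) for r in records]

src = '{"object": [{"id": "9kt8ex"}, {"id": "9x2774"}]}'
for row in explode_top_level_array(src):
    print(row)
```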


Suppose you want to compute the average of all workout scores. First, you must unnest the JSON records and then apply the AVERAGE function.

Steps:

Tip

Tip: The easiest way to unnest is to select the column header for the column containing your data. After you select the column header, you are provided with suggestions to Unnest Values into new columns. You can use the Unnest suggestion and click Add. The following steps illustrate how to create this transformation manually.

  1. In the Recipe panel, click  New Step
  2. In the Search panel, enter unnest values into new columns
  3. Specify the following transformation. Substitute the Paths to elements values below with the top-level keys in your JSON records:

    D trans
    Typeref
    p01NameColumn
    p01Valueobject
    p02NamePath to elements
    p02Valueid
    p03NamePath to elements
    p03Valuescore
    p04NamePath to elements
    p04Valueshort
    p05NamePath to elements
    p05Valuetitle
    p06NamePath to elements
    p06Valueups
    p07NamePath to elements
    p07Valueurl
    SearchTermUnnest values into new columns

  4. The above step breaks out the key-value pairs for the specified keys into separate columns in the dataset. Each Paths to elements entry specifies a key in the JSON record, which is used to create a new column of the same name. The key's associated value becomes a cell value in the new column. 

  5. You can now delete the source column. In the example, the source column is object

    Tip

    Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested. When you're done unnesting a column and have removed data from the original, you should have an empty column.

Results:

id | score | short | title | ups | url
9kt8ex | 19669 | bzygw285fpp11.jpg | M/07/1'3" [23lbs > 13lbs = 10lbs] Still a bit to go, but my owner no longer refers to me as his chunky boy! | 19669 | https://i.redd.it/bzygw285fpp11.jpg
9x2774 | 19171 | wbbufmll0cy11.jpg | M/29/5'11" [605 pounds > 375 pounds = 230 pounds lost] (14 months) Still considered super morbidly obese but I've made some good progress. | 19171 | https://i.redd.it/wbbufmll0cy11.jpg
a8guou | 16778 | 3t0kmljnmq521.jpg | F/28/5’7” [233lbs to 130lbs] Got tired of being obese and took control of my life! | 16778 | https://i.redd.it/3t0kmljnmq521.jpg
atla3n | 16743 | 9t6tvsjs16i21.jpg | M/22/5'11" [99lbs > 150lbs = 51lbs] Anorexia my recovery | 16743 | https://i.redd.it/9t6tvsjs16i21.jpg

Now you can find the average score by applying the AVERAGE function. 

D trans
p03ValueAverage_score
Typeref
p01NameFormula type
p01ValueSingle row formula
p02NameFormula
p02ValueAVERAGE(score)
p03NameNew column name
SearchTermNew formula
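The AVERAGE(score) result can be verified by hand from the four score values:

```python
# Scores from the four unnested records in the example.
scores = [19669, 19171, 16778, 16743]
average = sum(scores) / len(scores)
print(average)  # 18090.25
```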

Results:

id | score | short | title | ups | url | Average_score
9kt8ex | 19669 | bzygw285fpp11.jpg | M/07/1'3" [23lbs > 13lbs = 10lbs] Still a bit to go, but my owner no longer refers to me as his chunky boy! | 19669 | https://i.redd.it/bzygw285fpp11.jpg | 18090.25
9x2774 | 19171 | wbbufmll0cy11.jpg | M/29/5'11" [605 pounds > 375 pounds = 230 pounds lost] (14 months) Still considered super morbidly obese but I've made some good progress. | 19171 | https://i.redd.it/wbbufmll0cy11.jpg | 18090.25
a8guou | 16778 | 3t0kmljnmq521.jpg | F/28/5’7” [233lbs to 130lbs] Got tired of being obese and took control of my life! | 16778 | https://i.redd.it/3t0kmljnmq521.jpg | 18090.25
atla3n | 16743 | 9t6tvsjs16i21.jpg | M/22/5'11" [99lbs > 150lbs = 51lbs] Anorexia my recovery | 16743 | https://i.redd.it/9t6tvsjs16i21.jpg | 18090.25

D s also
inCQLtrue
label((label = "structuring_tasks") OR (label = "json"))