
...

Excerpt

Version 2: This section describes how you can import JSON files into 

D s product
rtrue
, convert them to tabular format, wrangle them, and then export them back in the same JSON format.

The basic task is described by way of example. In the example task, the JSON file must be imported into 

D s product
rtrue
, a new column must be inserted into the JSON, and the resulting JSON must be exported in the same structure.

...

This method of working with JSON is enabled by default.

Info

NOTE: When this feature is enabled, all imported datasets created from JSON files under the legacy method must be recreated to behave like v2 datasets with respect to conversion and schema management. Future features may not be supported retroactively in the v1 legacy mode. You should convert to using the v2 method.


...


You can choose to continue using the legacy method of working with JSON.

...

  • Recommended limit of 1 GB in source file size. Since conversion happens within the 
    D s node
    , this limit may vary depending on the memory of the 
    D s node
    .
  • Each JSON record must be less than 20 MB in size.

  • Filename extensions must be .json or .JSON.
  • Conversion of compressed JSON files is not supported. Compressed JSON files can be imported using the previous method. See Working with JSON v1. 
  • For best results, all keys and values should be quoted and imported as strings. 

    Info

    NOTE: Invalid escape characters can cause your JSON file to fail to import.

    • You can escape quote values to treat them as literals in your strings using the backslash character. For example: \"

    • When the values are imported into the Transformer page, the 
      D s webapp
       re-infers the data type for each column.
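The escaping rule above can be illustrated with Python's standard `json` module. This is a minimal sketch; the `id` and `title` keys are hypothetical and follow the example dataset:

```python
import json

# A record whose "title" value contains literal double quotes,
# escaped with the backslash character so the JSON stays valid.
record = '{"id": "9kt8ex", "title": "my owner\'s \\"chunky boy\\""}'

# The value is imported as a string; the quotes become literals.
parsed = json.loads(record)
```

After import, the application re-infers a data type for each column, so values quoted as strings in the source may still surface as Integer or Decimal columns.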

...

  • Each row is a complete JSON record containing keys and values.

    Tip

    Tip: Nested JSON, such as metrics above, can be inserted as part of a record. It can then be unnested within the application.

  • Each key's value must have a comma after it, except for the final key value in any row. 

    Info

    NOTE: The end of a JSON record is the right curly bracket (}). Commas are not added to the end of each line in this format.
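A small sketch of the row-per-record format described above, assuming hypothetical keys modeled on the example dataset: each line is one complete JSON record, the record ends with a right curly bracket rather than a comma, and nested JSON (here, a `metrics` array) rides along inside the record:

```python
import json

# Two rows; each line is one complete JSON record ending in "}".
# The nested "metrics" value can be unnested later in the application.
raw = (
    '{"id": "9kt8ex", "title": "first post", "metrics": [{"rank": 1, "score": 19669}]}\n'
    '{"id": "9x2774", "title": "second post", "metrics": [{"rank": 2, "score": 19171}]}\n'
)

# Parse one record per non-empty line.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]
```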

...

Task

  1. Import the JSON file.

  2. Any nested data must be unnested within columns. Each level in the JSON hierarchy must be unnested in a separate step.
  3. When all of the JSON data is in tabular form, perform any 
    D s lang
     transformations. 
  4. If you need to rebuild the original JSON hierarchy, you must nest the lower levels of the hierarchy back into their original form.
    1. If it is acceptable to write out flat JSON records, you can export without nesting the data again.
  5. Run the job, generating a JSON output. 
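Outside the application, the five steps above can be sketched in Python. This is only an illustration of the task, not the product's implementation; the single-record input and the key names (`id`, `metrics`, `rank`, `score`, `Average_score`) are hypothetical and follow the example dataset:

```python
import json

# Hypothetical input: one record per line with a nested "metrics" array.
lines = ['{"id": "9kt8ex", "metrics": [{"rank": 1, "score": 19669}]}']

out = []
for line in lines:
    rec = json.loads(line)                      # 1. import the JSON file
    flat = {"id": rec["id"],                    # 2. unnest each level
            "rank": rec["metrics"][0]["rank"],
            "score": rec["metrics"][0]["score"]}
    flat["Average_score"] = flat["score"]       # 3. transform: insert a new column
    rebuilt = {"id": flat["id"],                # 4. re-nest to the original shape
               "metrics": [{"rank": flat["rank"],
                            "score": flat["score"],
                            "Average_score": flat["Average_score"]}]}
    out.append(json.dumps(rebuilt))             # 5. export in the same structure
```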

...

  1. Through the Import Data page, navigate and select your JSON file for import. 

    Info

    NOTE: File formats are detected based on the file extension. Please verify that your file extension is .json or .JSON, which ensures that it is passed through the conversion service.

    1. The file is passed through the conversion process, which reviews the JSON file and stores it on the base storage layer in a row-per-record format that can be easily ingested. This process happens within the Import Data page. You can track progress on the right side of the screen.

  2. After the file has been converted, click the Preview icon on the right side of the screen. In the Preview, you can review the first few rows of the imported file.

    1. If some rows are missing from the preview, then you may have a syntax error in the first row after the last well-structured row. You should try to fix this in the source file and re-import.

    2. If all of the rows are problematic, your data is likely malformed.

  3. Complete the rest of the import process. For more information, see Import Data Page.

  4. In Flow View, add the JSON-based imported dataset to your flow and create a recipe for it. For more information, see Flow View Page.
    1. Select the recipe, and click Edit Recipe...
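When preview rows are missing, it can help to locate the first record that fails to parse before re-importing. A minimal sketch, assuming newline-delimited records (the sample lines are hypothetical):

```python
import json

# The second line is malformed: it is missing its closing brace.
lines = [
    '{"id": "a8guou", "score": 16778}',
    '{"id": "atla3n", "score": 16743',
]

# Report the first line number that fails to parse, if any.
first_bad = None
for number, line in enumerate(lines, start=1):
    try:
        json.loads(line)
    except json.JSONDecodeError:
        first_bad = number
        break
```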

...

Tip

Tip: The easiest way to unnest is to select the column header for the column containing your nested data. Unnest should be one of the suggested options, and the suggestion should include the specification for the paths to the key values. If not, you can use the following process.

 


  1. In the Recipe panel, click New Step
  2. In the Search panel, enter unnest values into new columns
  3. Specify the following transformation. Substitute the Paths to elements values below with the top-level keys in your JSON records:

    D trans
    Typeref
    p01NameColumn
    p01Valuemetrics
    p02NamePath to elements1
    p02Value[0]
    SearchTermUnnest values into new columns

    Tip

    Tip: You can choose whether to remove the original column from the source. In deeper or wider JSON files, removing it can help to identify what remains to be unnested.

  4. In the above transformation, the bracketing array around the set of values has been broken down into raw JSON. This value may now be interpreted as a String data type. From the column drop-down, you can select Object data type.
  5. Click the column head again, or specify the following transformation to unnest the Object column:
    D trans
    p03Valuescore
    Typeref
    p01NameColumn
    p01Value0
    p02NamePath to elements1
    p02Valuerank
    p03NamePath to elements2
    SearchTermUnnest Objects into columns

    1. In the above, each Paths to elements entry specifies a key in the JSON record. The key's associated value becomes the value in the new column, which is given the same name as the key. 
    2. So, this step breaks out the key-value pairs for the specified keys into separate columns in the dataset. 

  6. Repeat the above process for the next level in the hierarchy. 
  7. You can now delete the source columns. In the example, these source columns are named metrics and 0.

    Tip

    Tip: SHIFT + click these columns and then select Delete columns from the right panel. Click Add.

  8. Repeat the above steps for each nested JSON object.

    Tip

    Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files.

    Tip

    Tip: The unnesting process may have moved some columns into positions that are different from their order in the original JSON. Use the Move command from the column menu to reposition your columns.
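The level-by-level unnesting above can be sketched as a generic Python helper. This is an illustration of the concept, not the product's transform; the dotted/bracketed column names are one hypothetical naming convention:

```python
def unnest(record, parent_key=""):
    """Flatten nested objects and arrays into one level of columns."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Each nested object level is unnested in its own pass.
            flat.update(unnest(value, name))
        elif isinstance(value, list):
            # Array elements become indexed columns, e.g. metrics[0].
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(unnest(item, f"{name}[{i}]"))
                else:
                    flat[f"{name}[{i}]"] = item
        else:
            flat[name] = value
    return flat
```

For example, a record with a nested `metrics` array flattens into `metrics[0].rank` and `metrics[0].score` columns alongside the top-level keys.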

...

  1. SHIFT + click the url and filename columns. Then, select Nest columns in the right-hand panel. This transformation should look like the following:

    D trans
    p03ValueObject
    Typeref
    p01Namecolumn1
    p01Valueurl
    p02Namecolumn2
    p02Valuefilename
    p03NameNest columns to
    p04Valuecolumn1
    p04NameNew column name
    SearchTermNest columns into Objects
  2. column1 now contains an Object mapping of the two columns. You can now nest this column again into an Array:
    D trans
    p03Valueresources
    Typeref
    p01NameColumns
    p01Valuecolumn1
    p02NameNest columns to
    p02ValueArray
    p03NameNew column name
    SearchTermNest columns into Objects
  3. Delete column1.
  4. Continue nesting other columns in a similar fashion. Repeat the above steps for the next level of the hierarchy in your dataset.
  5. You must re-nest from the bottom of the target hierarchy to the top. 

    Info

    NOTE: Do not nest the columns at the top level of the hierarchy.

  6. When the column names contain all of the keys that you wish to generate in the top-level JSON output, you can run the job.
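The two nesting steps above can be sketched in Python. This mirrors the example only conceptually; the row values are hypothetical and taken from the example dataset:

```python
# A flat row with "url" and "filename" columns, as in the steps above.
row = {"id": "9kt8ex",
       "url": "https://i.redd.it/bzygw285fpp11.jpg",
       "filename": "bzygw285fpp11.jpg"}

# Step 1: nest the two columns into an Object (column1).
column1 = {"url": row["url"], "filename": row["filename"]}

# Step 2: nest the Object into an Array, then delete column1 and the
# original columns. Top-level keys such as "id" are not nested.
nested = {"id": row["id"], "resources": [column1]}
```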

...

| id | score | short | title | ups | url | Average_score |
|---|---|---|---|---|---|---|
| 9kt8ex | 19669 | bzygw285fpp11.jpg | M/07/1'3" [23lbs > 13lbs = 10lbs] Still a bit to go, but my owner no longer refers to me as his chunky boy! | 19669 | https://i.redd.it/bzygw285fpp11.jpg | 18090.25 |
| 9x2774 | 19171 | wbbufmll0cy11.jpg | M/29/5'11" [605 pounds > 375 pounds = 230 pounds lost] (14 months) Still considered super morbidly obese but I've made some good progress. | 19171 | https://i.redd.it/wbbufmll0cy11.jpg | 18090.25 |
| a8guou | 16778 | 3t0kmljnmq521.jpg | F/28/5’7” [233lbs to 130lbs] Got tired of being obese and took control of my life! | 16778 | https://i.redd.it/3t0kmljnmq521.jpg | 18090.25 |
| atla3n | 16743 | 9t6tvsjs16i21.jpg | M/22/5'11" [99lbs > 150lbs = 51lbs] Anorexia my recovery | 16743 | https://i.redd.it/9t6tvsjs16i21.jpg | 18090.25 |

D s also
inCQLtrue
label((label = "structuring_tasks") OR (label = "json"))