This section describes how you can import JSON files into , convert them to tabular format, wrangle them, and then export them back in the same JSON format.
The basic workflow is described by way of example. In the example workflow, the JSON file must be imported into , a new column must be inserted into the JSON, and the resulting JSON must be exported in the same structure.
Input:
Output:
NOTE: JSON-formatted files that are generated by |
\n
character. If your downstream system is expecting comma-delineated records except for the last one, additional work is required outside of the application.This example dataset contains information on books. In this case:
"book": { "id": "bk101", "author": "Guy, Joe", "title": "Json Guide", "genre": "Computer", "price": "44.95", "publish_date": "2002-04-26", "characteristics": { "cover_color": "black", "paper_stock": "20", "paper_source": "new" }, "description": "An in-depth look at creating applications." }, "book": { "id": "bk102", "author": "Nelson, Rogers", "title": "When Doves Cry", "genre": "Biography", "price": "24.95", "publish_date": "2016-04-21", "characteristics": { "cover_color": "white", "paper_stock": "15", "paper_source": "recycled" }, "description": "Biography of a prince." }, "book": { "id": "bk103", "author": "Fitzgerald, F. Scott", "title": "The Great Gatsby", "genre": "Fiction", "price": "9.95", "publish_date": "1925-04-10", "characteristics": { "cover_color": "blue", "paper_stock": "20", "paper_source": "new" }, "description": "Classic American novel." } |
Import the JSON file.
NOTE: During import, you should deselect the Detect Structure option. You are likely to need to rebuild the initial parsing steps to consume the file properly. Details are provided later. |
NOTE: This step is required only if a single JSON record in your imported dataset spans multiple rows. If you have single-row JSON records in the Transformer page, please skip to the next section. |
In the Transform Builder, the Split on value is probably the \n
character.
The above signals to the application to break up the data into individual rows on the newline (\n
) character. This transformation then breaks up your loose JSON on every single attribute. You must modify the Split on value so that it captures only the first attribute of each JSON record. For the above dataset, the Split on value must be the following, noting the space after the colon:
"book": |
\n
in front of the first reference to the above string. In the column histogram, select the black bar, which selects the empty row. In the Suggestions panel, locate the Delete rows suggest, and click Add. The row is removed.The next step involves converting your JSON records to a column of Object type values. The Object data type is a means of rendering records into key-value pairs. However, its structure is a little different from JSON. The following steps convert your JSON to an Object data type.
In the above, the key term is the Find pattern, which is a regular expression:
/\n\s*"/ |
In the above example, the first two records have commas at the end of them. Here is a snippet of their ends:
... "description":"An in-depth look at creating applications."}, ... "description":"Biography of a prince."}, ... "description":"Classic American novel."} |
The above transformation is similar to the previous one. However, in this one, the Find pattern uses a to indicate that the pattern should only be matched at the end of a record:
{end} |
Individual records should look similar to the following:
{"id": "bk101","author": "Guy, Joe","title": "Json Guide","genre": "Computer", \ "price": "44.95","publish_date": "2002-04-26",{"cover_color": "black", \ "paper_stock": "20","paper_source": "new"}, \ "description": "An in-depth look at creating applications."} |
The next step is to convert your JSON records to tabular format.
NOTE: For JSON records that have multiple levels in the hierarchy, you should unnest the top level of the hierarchy first, followed by each successive level. |
Tip: The easiest way to unnest is to select the column header for the column containing your Object data. Unnest should be one of the suggested options. If not, you can use the following process. |
unnest object elements
. So, this step breaks out the key-value pairs for the specified keys into separate columns in the dataset.
Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested. |
characteristics
node:column1
. From the column menu to the right of column1
, select Delete.You have now converted your JSON to tabular format.
Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files. |
Tip: The unnesting process may have moved some columns into positions that are different from their order in the original JSON. Use the Move command from the column menu to reposition your columns. |
Your JSON data is ready for wrangling.
In the following example, the discount
column is created. If the publication date is before 01/01/2000, then the discount is 0.1
(10%):
Continue adding steps until you have transformed your data as needed and are ready to run a job on it.
NOTE: If your desired JSON output does not include multiple hierarchies, you can skip this section. The generated JSON files are a single JSON record per row. |
If a job is run using the recipe created so far on the example data, a newline-delimited JSON file that has no hierarchies in it can be generated by the application. However, the dataset is a two-level hierarchy, so the elements in the characteristics
hierarchy are written out in the following manner:
"characteristics.cover_color":"black","characteristics.paper_stock":20,"characteristics.paper_source":"new", "characteristics.cover_color":"white","characteristics.paper_stock":15,"characteristics.paper_source":"recycled", "characteristics.cover_color":"blue","characteristics.paper_stock":20,"characteristics.paper_source":"new", |
You can take one of two approaches:
Re-nest the lower hierarchies until have you have a single flat record, containing some Object type columns that hold the underlying hierarchies. When the re-nested JSON records are exported, secondary hierarchies appear as escaped string values. More details later.
If you are re-nesting the lower hierarchies, you can use the following approach.
Tip: The following steps reshape your data. You may wish to create a new recipe as an output of the previous recipe where you can add the following steps. |
characteristics.cover_color
, characteristics.paper_stock
, and characteristics.paper_source
:characteristics
column, you can remove the characteristics.
from the key value:Repeat the above steps for the next level of the hierarchy in your dataset.
NOTE: Do not nest the columns at the top level of the hierarchy. |
When you are ready, you can run the job. Create or modify a publishing action to generate a JSON file for output. See Run Job Page.
When the job completes, you can click the JSON link in the Output Destinations tab of the Job Details page to download your JSON file. See Job Details Page.
Outside the application, you may need to do the following:
If you have re-nested JSON hierarchies into your flat records, the exported JSON for secondary hierarchies appears as quoted strings, like the following:
"characteristics":"{\"cover_color\":\"black\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}", "characteristics":"{\"cover_color\":\"white\",\"paper_stock\":\"15\",\"paper_source\":\"recycled\"}", "characteristics":"{\"cover_color\":\"blue\",\"paper_stock\":\"20\",\"paper_source\":\"new\"}", |
The quoted strings can be fixed by simple search and replace.