...
This migration process creates new versions of these imported datasets and fixes recipes accordingly.
- Through the Library for Data page, locate the imported datasets that are based on JSON files.
- You may be able to just search for
json
.
- You may be able to just search for
- For each JSON imported dataset:
- Click the link.
- In the Dataset Details page, copy the value for the Location. Paste it into a text file.
In the Dataset Details page, locate flow or flows where the dataset is in use.
Tip Tip: If you copy the link address of the flow and paste it into a text file, you can paste that later into a browser and jump directly to the flow.
Repeat the above steps for each JSON-based imported dataset.
- You should now have a list of links to the source data and the flows where your JSON imported datasets are in use.
- In the Library page, create a new version of each imported dataset:
- Click Import Data.
- Click the appropriate connection.
- Paste the link to the Location where the source is stored.
The data is ingested through the conversion service.
Tip Tip: Click the icon for the dataset in the right panel. All rows in the Preview panel should be properly structured. Nested data may not be broken out into separate columns at this time.
Rename the dataset as needed.
Tip Tip: You should give each new version of the imported dataset a consistent prefix or suffix tag, such as
-v2
. Later, you can locate these new imported datasets easily through search in the Library for Data page.- Click Continue.
- Repeat the above steps for each imported dataset that you are updating to v2.
- For each of these flows:
- Navigate to it.
- Locate the v1 imported dataset in it. You might copy the name.
- Click Add Datasets. Search for the v2 imported dataset. Add it to the flow.
- In Flow View:
- Click the recipe that is in use with the v1 version of the imported dataset. In the context menu in the right panel, select Make a copy > without inputs.
- Select the copied recipe.
- In the context menu in the right panel, select Change input. Select the v2 imported dataset.
- Your v2 imported dataset is now connected to a version of your recipe.
Select the recipe object. In the right panel, you should see a preview of the recipe steps.
Info NOTE: In the recipe, the steps where you modified the imported dataset into tabular format are likely to be be broken. This is ok.
- Click Edit recipe.
- In the Transformer page:
- Disable recipe step 1.
- Review the state of the data grid to see if the data is organized in tabular form.
- If not, repeat the above steps for the next step in your recipe.
- Continue until the data is in tabular form.
- After some additional tweaking, your recipe should contain no broken steps, and your data should appear in tabular form.
- You may wish to run a job or download your sample data to compare it to outputs from your v1 imported dataset and steps. You may need to create an output object first.
- You can now integrate these changes in either of the following ways:
Apply to existing recipe: Change the input on the existing to the v2 imported dataset. Apply any disabling of steps and other tweaks to the recipe's connected to the v1 imported dataset.
Info NOTE: Before applying the above changes, you might want to download the v1 recipe through the Recipe panel.
- Use v2 recipe in the flow: You could simply switch over to using the new recipe. Caveats:
- You must recreate any outputs and schedules from the v1 recipe.
- Internal identifiers for the new recipe and its outputs are different from the v1 recipe. These new identifiers may impact API-based automation.
- Other application objects that reference the v1 recipes, such as flow tasks in your plans, must be fixed to use the new recipe or output objects.
- Run a production job to verify that your flow is producing consistent data with the v2 imported dataset.
- Repeat as needed for other flows.
...
Tip |
---|
Tip: The easiest way to unnest is to select the column header for the column containing your Object data. Unnest should be one of the suggested options. If not, you can use the following process. |
- In the Recipe panel, click New Step.
- In the Search panel, enter
unnest object elements
. - Specify the following transformation. Substitute the Paths to elements values below with the top-level keys in your JSON records:
D trans p03Value author p06Name Path to elements5 p09Value true p01Name Column p06Value price p03Name Path to elements2 p07Value publish_date p04Value title SearchTerm Unnest object elements p07Name Path to elements6 p09Name Remove elements from original Type ref p05Name Path to elements4 p01Value column1 p02Name Path to elements1 p02Value id p05Value genre p04Name Path to elements3 p08Value description p08Name Path to elements7 - In the above, each Paths to elements entry specifies a key in the JSON record. The key's associated value becomes the value in the new column, which is given the same name as the key.
So, this step breaks out the key-value pairs for the specified keys into separate columns in the dataset.
Tip Tip: You can choose to remove the original from the source or not. In deeper or wider JSON files, removing can help to identify what remains to be unnested.
- Repeat the above process for the next level in the hierarchy. In the example, this step means unnesting the
characteristics
node:D trans p03Value characteristics.paper_stock Type ref p05Name Remove elements from original p01Name Column p01Value column1 p02Name Path to elements1 p02Value characteristics.cover_color p05Value true p03Name Path to elements2 p04Value characteristics.paper_source p04Name Path to elements3 SearchTerm Unnest object elements - You can now delete
column1
. From the column menu to the right ofcolumn1
, select Delete. You have now converted your JSON to tabular format.
Tip Tip: If the above set of steps needs to be applied to multiple files, you might consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in another recipe, the first one can be used for doing initial processing of your JSON files, separate from any wrangling that you may do for individual files.
Tip Tip: The unnesting process may have moved some columns into positions that are different from their order in the original JSON. Use the Move command from the column menu to reposition your columns.
...
Tip |
---|
Tip: The following steps reshape your data. You may wish to create a new recipe as an output of the previous recipe where you can add the following steps. |
...
- When you re-nest, you want to nest from the lowest to top tier of the hierarchy.
- In the example, the following columns should be nested together:
characteristics.cover_color
,characteristics.paper_stock
, andcharacteristics.paper_source
:D trans p03Value characteristics.paper_source Type ref p05Name New column name p01Name column1 p01Value characteristics.cover_color p02Name column2 p02Value characteristics.paper_stock p05Value characteristics p03Name column3 p04Value Object p04Name Nest columns to SearchTerm Nest columns into Objects - In the generated
characteristics
column, you can remove thecharacteristics.
from the key value:D trans p03Value (empty) Type ref p01Name Column p01Value characteristics p02Name Find p02Value `characteristics.` p03Name Replace with SearchTerm Replace text or patterns - Now, delete the three source columns:
D trans p03Value characteristics.paper_source Type ref p01Name column1 p01Value characteristics.cover_color p02Name column2 p02Value characteristics.paper_stock p03Name column3 SearchTerm Delete columns Repeat the above steps for the next level of the hierarchy in your dataset.
Info NOTE: Do not nest the columns at the top level of the hierarchy.
...