...

This migration process creates new versions of these imported datasets and fixes recipes accordingly.

  1. Through the Library for Data page, locate the imported datasets that are based on JSON files.
    1. You may be able to locate them simply by searching for json.
  2. For each JSON imported dataset:
    1. Click the link.
    2. In the Dataset Details page, copy the value for the Location. Paste it into a text file.
    3. In the Dataset Details page, locate the flow or flows in which the dataset is in use.

      Tip: If you copy the link address of the flow and paste it into a text file, you can later paste it into a browser to jump directly to the flow.

    4.  Repeat the above steps for each JSON-based imported dataset.

  3. You should now have a list of links to the source data and the flows where your JSON imported datasets are in use. 
  4. In the Library page, create a new version of each imported dataset:
    1. Click Import Data.
    2. Click the appropriate connection. 
    3. Paste the link to the Location where the source is stored.
    4. The data is ingested through the conversion service.

      Tip: Click the icon for the dataset in the right panel. All rows in the Preview panel should be properly structured. Nested data may not be broken out into separate columns at this time.

    5. Rename the dataset as needed.

      Tip: Give each new version of the imported dataset a consistent prefix or suffix tag, such as -v2. Later, you can easily locate these new imported datasets through search in the Library for Data page.

    6. Click Continue.
  5. Repeat the above steps for each imported dataset that you are updating to v2.
  6. For each of these flows:
    1. Navigate to it. 
    2. Locate the v1 imported dataset in it. You may want to copy its name. 
    3. Click Add Datasets. Search for the v2 imported dataset. Add it to the flow.
  7. In Flow View:
    1. Click the recipe that is in use with the v1 version of the imported dataset. In the context menu in the right panel, select Make a copy > without inputs.
    2. Select the copied recipe. 
    3. In the context menu in the right panel, select Change input. Select the v2 imported dataset.
    4. Your v2 imported dataset is now connected to a version of your recipe. 
    5. Select the recipe object. In the right panel, you should see a preview of the recipe steps. 

      NOTE: In the recipe, the steps where you modified the imported dataset into tabular format are likely to be broken. This is expected.

  8. Click Edit recipe.
  9. In the Transformer page:
    1. Disable recipe step 1.
    2. Review the data grid to see whether the data is organized in tabular form. 
    3. If it is not, disable the next step in your recipe.
    4. Continue until the data appears in tabular form.
  10. After some additional tweaking, your recipe should contain no broken steps, and your data should appear in tabular form. 
  11. You may wish to run a job or download your sample data to compare it to outputs from your v1 imported dataset and steps. You may need to create an output object first.
  12. You can now integrate these changes in either of the following ways:
    1. Apply to existing recipe: Change the input on the existing recipe to the v2 imported dataset. Apply any disabling of steps and other tweaks to the recipe connected to the v1 imported dataset.

      NOTE: Before applying the above changes, you might want to download the v1 recipe through the Recipe panel.

    2. Use v2 recipe in the flow: You could simply switch over to using the new recipe. Caveats:
      1. You must recreate any outputs and schedules from the v1 recipe.  
      2. Internal identifiers for the new recipe and its outputs are different from the v1 recipe. These new identifiers may impact API-based automation.
      3. Other application objects that reference the v1 recipes, such as flow tasks in your plans, must be fixed to use the new recipe or output objects.
  13. Run a production job to verify that your flow produces consistent data with the v2 imported dataset. 
  14. Repeat as needed for other flows.
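For step 11, the comparison of v1 and v2 outputs can be scripted. The following is a minimal sketch in Python, assuming you have downloaded both results as CSV files; the file names and helper names are hypothetical, not part of the product:

```python
import csv

def load_rows(path):
    """Read a CSV export into a list of dicts keyed by column header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def diff_outputs(v1_path, v2_path):
    """Return human-readable differences between the v1 and v2 exports."""
    v1, v2 = load_rows(v1_path), load_rows(v2_path)
    diffs = []
    if len(v1) != len(v2):
        diffs.append(f"row counts differ: {len(v1)} vs {len(v2)}")
    for i, (a, b) in enumerate(zip(v1, v2)):
        # Compare cell by cell across the union of column names.
        for key in sorted(a.keys() | b.keys()):
            if a.get(key) != b.get(key):
                diffs.append(f"row {i}, column {key}: "
                             f"{a.get(key)!r} != {b.get(key)!r}")
    return diffs
```

An empty result from diff_outputs means the v2 flow reproduces the v1 output.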

...

Tip: The easiest way to unnest is to select the column header for the column containing your Object data. Unnest should be one of the suggested options. If not, you can use the following process.

 


  1. In the Recipe panel, click New Step.
  2. In the Search panel, enter unnest object elements.
  3. Specify the following transformation. Substitute the Path to elements values below with the top-level keys in your JSON records:

    Transformation Name: Unnest object elements
    Parameter: Column: column1
    Parameter: Path to elements1: id
    Parameter: Path to elements2: author
    Parameter: Path to elements3: title
    Parameter: Path to elements4: genre
    Parameter: Path to elements5: price
    Parameter: Path to elements6: publish_date
    Parameter: Path to elements7: description
    Parameter: Remove elements from original: true

    1. In the above transformation, each Path to elements entry specifies a key in the JSON record. The key's associated value becomes the value in a new column, which is given the same name as the key. 
    2. This step breaks out the key-value pairs for the specified keys into separate columns in the dataset. 

      Tip: You can choose whether to remove the original elements from the source. In deeper or wider JSON files, removing them can help you identify what remains to be unnested.

  4. Repeat the above process for the next level in the hierarchy. In the example, this step means unnesting the characteristics node:

    Transformation Name: Unnest object elements
    Parameter: Column: column1
    Parameter: Path to elements1: characteristics.cover_color
    Parameter: Path to elements2: characteristics.paper_stock
    Parameter: Path to elements3: characteristics.paper_source
    Parameter: Remove elements from original: true
  5. You can now delete column1. From the column menu to the right of column1, select Delete.
  6. You have now converted your JSON to tabular format.

    Tip: If the above set of steps must be applied to multiple files, consider stopping your work and returning to Flow View. Select this recipe and click Add New Recipe. If you add successive steps in a second recipe, the first recipe can be used for the initial processing of your JSON files, separate from any wrangling that you do for individual files.

    Tip: The unnesting process may have moved some columns into positions that differ from their order in the original JSON. Use the Move command from the column menu to reposition your columns.
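Outside the application, the effect of the two Unnest object elements steps above can be sketched in Python. The keys match the example transformations; the record's values are hypothetical sample data, and the unnest function is an illustration, not the product's implementation:

```python
import json

def unnest(record, paths):
    """Break out the listed keys of a JSON object into flat key/value
    pairs, one new column per path (a sketch of Unnest object elements).
    Dotted paths such as "characteristics.cover_color" descend one level."""
    out = {}
    for path in paths:
        parts = path.split(".")
        node = record
        for part in parts[:-1]:
            node = node.get(part, {})
        out[path] = node.get(parts[-1])
    return out

# Hypothetical JSON record using the keys from the example above.
record = json.loads("""{
  "id": "bk101", "author": "Gambardella, Matthew",
  "title": "XML Developer's Guide", "genre": "Computer", "price": 44.95,
  "publish_date": "2000-10-01", "description": "An in-depth look.",
  "characteristics": {"cover_color": "red", "paper_stock": 20,
                      "paper_source": "new"}
}""")

# First pass: break out the top-level keys.
row = unnest(record, ["id", "author", "title", "genre", "price",
                      "publish_date", "description"])
# Second pass: the next level of the hierarchy, the characteristics node.
row.update(unnest(record, ["characteristics.cover_color",
                           "characteristics.paper_stock",
                           "characteristics.paper_source"]))
```

After both passes, row holds one flat column per key, named after the path, which mirrors the tabular result in the data grid.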

...

Tip: The following steps reshape your data. You may want to create a new recipe as an output of the previous recipe, in which you can add the following steps.

...


  1. When you re-nest, nest from the lowest tier of the hierarchy up to the top. 
  2. In the example, the following columns should be nested together: characteristics.cover_color, characteristics.paper_stock, and characteristics.paper_source:

    Transformation Name: Nest columns into Objects
    Parameter: column1: characteristics.cover_color
    Parameter: column2: characteristics.paper_stock
    Parameter: column3: characteristics.paper_source
    Parameter: Nest columns to: Object
    Parameter: New column name: characteristics
  3. In the generated characteristics column, you can remove the characteristics. prefix from the key names:

    Transformation Name: Replace text or patterns
    Parameter: Column: characteristics
    Parameter: Find: `characteristics.`
    Parameter: Replace with: (empty)
  4. Now, delete the three source columns:

    Transformation Name: Delete columns
    Parameter: column1: characteristics.cover_color
    Parameter: column2: characteristics.paper_stock
    Parameter: column3: characteristics.paper_source
  5. Repeat the above steps for the next level of the hierarchy in your dataset. 

    NOTE: Do not nest the columns at the top level of the hierarchy.
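The nest, replace, and delete steps above can be sketched together in Python. This is an illustration only; the row values are hypothetical, and nest_columns is not a product API:

```python
def nest_columns(row, columns, new_column, strip_prefix=""):
    """Collect the named columns into a single Object-valued column,
    stripping an optional prefix from the keys (the Replace step) and
    removing the source columns (the Delete step)."""
    nested = {}
    for col in columns:
        key = col[len(strip_prefix):] if col.startswith(strip_prefix) else col
        nested[key] = row.pop(col)
    row[new_column] = nested
    return row

# Hypothetical flattened row, as produced by the earlier unnesting steps.
row = {"id": "bk101",
       "characteristics.cover_color": "red",
       "characteristics.paper_stock": 20,
       "characteristics.paper_source": "new"}

nest_columns(row,
             ["characteristics.cover_color",
              "characteristics.paper_stock",
              "characteristics.paper_source"],
             new_column="characteristics",
             strip_prefix="characteristics.")
# row["characteristics"] now holds an object with the keys
# cover_color, paper_stock, and paper_source.
```

As in the procedure, the lowest tier is nested first; repeat the call for each higher tier, but leave the columns at the top level of the hierarchy unnested.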

...