Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Published by Scroll Versions from space DEV and version next

...

Metadata is data about your data. For example, you might decide that one or more of the following types of information about your dataset should be tracked:

  • Source system(s)
  • Source path and (if applicable) filenameSource creation date
  • Date of import
  • Date of wrangling
  • Name of person who performed the wrangling

This section provides some methods for how to insert metadata into your dataset.

Insert filepath

For file-based data sources, you can insert the path to the source file in your dataset using the $filepath reference.

...

.

...

 

In your recipe, insert the following transformation:

D transp03ValuesourceDatasetPathTypestepp01NameFormula typep01ValueSingle row formulap02NameFormulap02Value$filepathp03NameNew column nameSearchTermNew formula

For more information, see Source Metadata References.

Insert source row number

You can insert the row number in the source file from which rows in your dataset are sourced, using the $sourcerownumber reference.

...

  1. Use your CSV file as the source for a new dataset within the flow containing the associated dataset.
  2. In the data grid, make sure that the first line of data is treated as the header. If not, add a header transform to your recipe.
  3. Open the other (source) dataset in the Transformer page.
  4. In the recipe panel of the Transformer page, add a new step. In the Transformation textbox, enter union.

  5. Create a union:
    1. Include all columns from both datasets.
    2. Configure the step to perform the union by name, instead of by position.
    3. See Union Page.
  6. Add this step to your recipe.
  7. You should see one row in the union recipe that contains the new data.
  8. Sort your data by a key value (e.g. business_id).

  9. Determine an appropriate grouping parameter. This step is necessary to simplify the filling process when the job runs at scale. Ideally, you should choose a grouping column that contains a relative few number of values in it (e.g. region).

  10. Fill values in the data rows with metadata column values. For each metadata column, add the following transformation, done here for the source_system column of metadata.

    D trans
    p03Valuebusiness_id
    Typestep
    p01NameFormula
    p01ValueFILL(source_system)
    p02NameGroup by
    p02Valueregion
    p03NameOrder by
    SearchTermWindow

  11. Repeat the above step for each metadata column you want to insert.

  12. Drop Delete the source metadata columns.
  13. Rename the window columns to use a more appropriate name.
  14. Delete the row containing the original metadata values.