Comment: Published by Scroll Versions from space DEV and version r097

...

This section describes how to create datasets and replace segments by parameterizing the input paths to your data in the platform.

Structuring Your Data

Each file that is included in a dataset with parameters should have an identical structure:

  • Matching file formats
  • Matching column order, naming, and data type
  • Matching column headers. In every row that is used as part of a column header in a dataset with parameters, each column should have a valid value that is consistent with the corresponding values across all files in the dataset.

    NOTE: If your files have missing or empty values in rows that are used as headers in your recipe, these rows may be treated as data rows during the import process, which may result in unexpected or missing column values.

  • Within each column, the data format should be consistent.
    • For example, if the date format changes between files in the source system, your recipe may not be able to manage the differences, and data may be missing from the output.

NOTE: Avoid creating datasets with parameters where individual files or tables have differing schemas. Either import these sources separately and then correct in the application before performing a union on the datasets, or make corrections in the source application to standardize the schemas.
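
As a rough pre-import check for the requirements above, the following Python sketch compares the header row of several CSV files and reports any file whose column names or column order differ from the first file. It runs outside the product and does not use any product API. The file names, file contents, and function names are hypothetical and are for illustration only.

```python
import csv
import io
from typing import Dict, List, Optional


def read_header(text: str) -> List[str]:
    """Return the first row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(text))
    return next(reader, [])


def find_schema_mismatches(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare every file's header against the header of the first file.

    Returns a mapping of file name -> header for each file whose column
    names or column order differ from the reference (first) file.
    """
    mismatches: Dict[str, List[str]] = {}
    reference: Optional[List[str]] = None
    for name, text in files.items():
        header = read_header(text)
        if reference is None:
            reference = header  # the first file defines the expected schema
        elif header != reference:
            mismatches[name] = header
    return mismatches


# Hypothetical file contents, keyed by hypothetical file names.
sample_files = {
    "sales-01.csv": "date,region,amount\n2020-01-01,east,10\n",
    "sales-02.csv": "date,region,amount\n2020-02-01,west,20\n",
    "sales-03.csv": "region,date,amount\neast,2020-03-01,30\n",
}

mismatched = find_schema_mismatches(sample_files)
print(mismatched)
```

In this sample, only the third file is reported, because its first two columns are swapped. A check like this covers column naming and order only. Data types and value formats within each column still need to be reviewed separately.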

 


If you expect the underlying datasets in a dataset with parameters to be less than fully consistent with each other, the following steps may be useful:

  • Recreate the dataset with parameters, but deselect the Detect Structure option during the import step.
  • In the Transformer page, if possible, collect a Random sample using a full scan. This step attempts to gather data from multiple individual files, which may reveal problems across the data.

...

NOTE: If multiple datasets within the same flow share the same variable name, they are treated as the same variable.
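
The effect of a shared variable name can be illustrated with a small Python sketch. It is a conceptual model only and does not reflect how the product stores or resolves variables. The dataset names, path templates, and the `region` variable are hypothetical.

```python
from string import Template
from typing import Dict

# Hypothetical path templates for two datasets in the same flow.
# Both reference a variable with the same name: region.
dataset_paths: Dict[str, Template] = {
    "orders": Template("/data/$region/orders.csv"),
    "returns": Template("/data/$region/returns.csv"),
}


def resolve_paths(templates: Dict[str, Template],
                  variables: Dict[str, str]) -> Dict[str, str]:
    """Apply one set of variable values to every dataset path in the flow."""
    return {name: tpl.substitute(variables) for name, tpl in templates.items()}


# A single value for region is applied to both datasets.
resolved = resolve_paths(dataset_paths, {"region": "emea"})
print(resolved)
```

Because both templates use the same name, one value changes both paths. If the two datasets need different values, the variables should be given different names.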

...

  • Parameterized bucket names are very useful when you are moving flow assets between workspaces or projects. When the flow asset is imported into a new workspace, the environment parameter references the appropriate bucket name in the new workspace.
  • If you change source buckets or move data to a new storage bucket, updating the paths to your objects can be as simple as changing the value of the environment parameter where your data is stored.
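
The two points above can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not the product's parameter syntax: the `s3://` scheme, the `env_source_bucket` parameter name, the workspace names, and the bucket names are all hypothetical.

```python
from string import Template
from typing import Dict

# Hypothetical path that references the bucket through an environment parameter
# instead of a hard-coded bucket name.
PATH_TEMPLATE = Template("s3://${env_source_bucket}/landing/orders.csv")

# Hypothetical environment parameter values, one set per workspace.
workspace_env: Dict[str, Dict[str, str]] = {
    "dev": {"env_source_bucket": "example-dev-bucket"},
    "prod": {"env_source_bucket": "example-prod-bucket"},
}


def resolve_for_workspace(workspace: str) -> str:
    """Resolve the same path template with the given workspace's parameter values."""
    return PATH_TEMPLATE.substitute(workspace_env[workspace])


dev_path = resolve_for_workspace("dev")
prod_path = resolve_for_workspace("prod")
print(dev_path)
print(prod_path)

# Moving data to a new bucket only requires changing the parameter value.
workspace_env["prod"]["env_source_bucket"] = "example-archive-bucket"
moved_path = resolve_for_workspace("prod")
print(moved_path)
```

The path template itself never changes. Only the value of the environment parameter differs between workspaces or after a move.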

...

This simpler syntax is easier to parse and performs the same match as the regular expression version. For more information on patterns, see Text Matching.



...