Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Info

NOTE: Avoid creating datasets with parameters where individual files or tables have differing schemas. Either import these sources separately and then correct in the application before performing a union on the datasets, or make corrections in the source application to standardize the schemas.

 


When working with datasets with parameters, it may be useful to do the following if you expect the underlying datasets to be less than 100% consistent with each other. 

  • Recreate the dataset with parameters, except deselect the Detect Structure option during the import step. 
  • In the Transformer page, collect a Random Sample using a full scan. This step attempts to gather data from multiple individual files, which may illuminate problems across the data.
Info

NOTE: For parameterized datasets sourced from

D s storage
, only the first 100,000 files are read.


Tip

Tip: If you suspect that there is a problem with a specific file or rows of data (e.g. from a specific date), you can create a static dataset from the file in question.

...

...

NOTE: For parameterized datasets sourced from

D s storage
, only the first 100,000 files are read.

Steps

Info

NOTE: Matching file path patterns in a large directory can be slow. Where possible, avoid using multiple patterns to match a file pattern or scanning directories with a large number of files. To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

...

  1. In the Import Data page, navigate your environment to locate one of the files or tables that you wish to parameterize.  
  2. Click Create Dataset with Parameters

     

    D caption
    Create Dataset with Parameters


  3. Within the Define Parameterized Path, select a segment of text. Then select one of the following options:

    Tip

    Tip: For best results when parameterizing directories in your file path, include the trailing slash (/ ) as part of your parameterized value.


    1. Add Datetime Parameter
    2. Add Variable
    3. Add Pattern Parameter - wildcards and patterns
    4. For more information on limitations, see Overview of Parameterization.
    5. If you need to navigate elsewhere, select Browse.
  4. Specify the parameter. Click Save.
  5. Click Update matches. Verify that all of your preferred datasets are matching.

    Info

    NOTE: If you are matching with more datasets than you wish, you should review your parameters.


  6. Click Create.

  7. The parameterized dataset is loaded. See Import Data Page.

A flow containing a dataset with parameters has additional options for managing them. See Flow View Page.

Add Datetime Parameter

Datetime parameters require the following elements:

...

Info

NOTE: If multiple datasets within the same flow share the same variable name, they are treated as the same variable.

Tip

Tip: Type env. to see the environment parameters that can be applied. These parameters are available for use by each user in the environment.

Default Value: If the variable value is not overridden at execution time, this value is inserted in the variable location in the path.

Info

NOTE: When you edit an imported dataset, if a variable is renamed, a new variable is created using the new name. Any override values assigned under the old variable name for the dataset must be re-applied. Instances of the variable and override values used in other imported datasets remain unchanged.

Parameterize bucket names

...

.

...

info

NOTE: A workspace administrator or project owner can create environment parameters.

Uses:

  • Parameterized bucket names are very useful when you are moving flows between workspaces or projects. When the flow is imported into a new workspace, the environment parameter references the appropriate bucket name in the new workspace. 
  • If you change source buckets or move data to a new storage bucket, updating the paths to your objects can be as simple as changing the value of the environment parameter where your data is stored.

For example, suppose you have two environments: Dev and Prod. You can create an environment parameter called env.sourceBucketName to store the name of the bucket from which all data in the workspace or project is imported.

...

Code Block
$env.sourceBucketName = 'MyCo_Dev'

...

Code Block
$env.sourceBucketName = 'MyCo_Prod'

For more information, see Environment Parameters Page.

Add Pattern Parameter

In the screen above, you can see an example of pattern-based parameterization. In this case, you are trying to parameterize the two digits after the value: POS-r.

...

  • If disabled, the scan stops when the next slash (/) in the path is encountered. Folders are not matched. 
  • If enabled, the scan continues to any depth of folders.

    Info

    NOTE: A high number of files and folders to scan can significantly increase the time required to load your dataset with parameters. 


Example 1: all text files

Suppose your file and folder structure look like the following:

...

Info

NOTE: If regular expressions are poorly specified, they can create unexpected matches and results. Use them with care. For a list of limitations of regular expressions for parameterization, see Overview of Parameterization.

The following regular expression matches the same two sources in the previous screen:

...

D s item
itempattern
 is a platform-specific mechanism for specifying patterns, which is much simpler to use than regular expressions. These simple patterns can cover much of the same range of pattern expression as regular expressions without the same risks of expression and sometimes ugly syntax. For more information on 
D s item
itempatterns
, see Text Matching.

D s item
patterns
patterns
 are specified between back-ticks, as in the following:

...

This simpler syntax is easier to parse and performs the same match as the regular expression version.

For more information on 

D s item
itempatterns
, see Text Matching.

D s alsoinCQLtruelabel ((label = "dataset") OR (label = "parameter") OR (label = "import_ui"))