NOTE: Avoid creating datasets with parameters where individual files or tables have differing schemas. Either import these sources separately and correct them in the application before performing a union on the datasets, or standardize the schemas in the source application.
When working with datasets with parameters, it may be useful to do the following if you expect the underlying datasets to be less than 100% consistent with each other.
- Recreate the dataset with parameters, except deselect the Detect Structure option during the import step.
- In the Transformer page, collect a Random Sample using a full scan. This step attempts to gather data from multiple individual files, which may illuminate problems across the data.
Tip: If you suspect that there is a problem with a specific file or rows of data (e.g. from a specific date), you can create a static dataset from the file in question.
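One way to spot inconsistent schemas before building a dataset with parameters is to compare the header row of each matched file outside the application. The following is a minimal sketch, assuming CSV sources; the file names, file contents, and the helper functions `header_of` and `group_by_header` are hypothetical and are not part of the product.

```python
import csv
import io

def header_of(text):
    """Return the first CSV row (the header) of a file's text content."""
    return next(csv.reader(io.StringIO(text)), [])

def group_by_header(files):
    """Map each distinct header (as a tuple) to the file names that use it."""
    groups = {}
    for name, text in files.items():
        groups.setdefault(tuple(header_of(text)), []).append(name)
    return groups

# Hypothetical file contents, standing in for files matched by a parameter.
files = {
    "sales-01.csv": "id,date,amount\n1,2024-01-01,10\n",
    "sales-02.csv": "id,date,amount\n2,2024-01-02,12\n",
    "sales-03.csv": "id,amount\n3,15\n",
}

groups = group_by_header(files)
for header, names in groups.items():
    print(header, names)
```

If more than one group is reported, the files in the smaller groups are candidates for a separate import or for a static dataset, as described in the tip above.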
NOTE: For parameterized datasets sourced from
NOTE: Matching file path patterns in a large directory can be slow. Where possible, avoid using multiple patterns to match a single file path, and avoid scanning directories that contain a large number of files. To increase matching speed, avoid wildcards in top-level directories and make your wildcards and patterns as specific as possible.
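The effect of wildcard placement can be illustrated with a simplified model. This sketch assumes that a scanner can skip any path that does not start with the literal text before the first wildcard; that assumption, the sample paths, and the helpers `literal_prefix` and `scan` are illustrative only and do not describe how the product scans storage.

```python
from fnmatch import fnmatchcase

# Hypothetical object paths in a storage location.
paths = [
    "data/2024/sales/orders-01.csv",
    "data/2024/sales/orders-02.csv",
    "data/2024/hr/staff.csv",
    "data/2023/sales/orders-01.csv",
    "logs/2024/app.log",
]

def literal_prefix(pattern):
    """Return the part of the pattern before its first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern

def scan(pattern, paths):
    """Return (number of candidates examined, matching paths).

    Only paths that start with the pattern's literal prefix are examined.
    """
    prefix = literal_prefix(pattern)
    candidates = [p for p in paths if p.startswith(prefix)]
    matches = [p for p in candidates if fnmatchcase(p, pattern)]
    return len(candidates), matches

broad = scan("*/2024/*/orders-*.csv", paths)            # wildcard at top level
specific = scan("data/2024/sales/orders-*.csv", paths)  # wildcard only in file name

print("broad examined:", broad[0], "specific examined:", specific[0])
```

Both patterns select the same two files, but under this model the pattern that starts with a wildcard has to examine every path, while the specific one examines only the paths under `data/2024/sales/`.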
- In the Import Data page, navigate your environment to locate one of the files or tables that you wish to parameterize.
Click Create Dataset with Parameters.
Within the Define Parameterized Path, select a segment of text. Then select one of the following options:
Tip: For best results when parameterizing directories in your file path, include the trailing slash (/) as part of your parameterized value.
- Add Datetime Parameter
- Add Variable
- Add Pattern Parameter - wildcards and patterns
- For more information on limitations, see Overview of Parameterization.
- If you need to navigate elsewhere, select Browse.
- Specify the parameter. Click Save.
Click Update matches. Verify that all of your preferred datasets are matching.
NOTE: If your parameters match more datasets than you intended, review your parameter definitions.
The parameterized dataset is loaded. See Import Data Page.
Add Datetime Parameter
Datetime parameters require the following elements:
NOTE: If multiple datasets within the same flow share the same variable name, they are treated as the same variable.
Default Value: If the variable value is not overridden at execution time, this value is inserted in the variable location in the path.
NOTE: When you edit an imported dataset, if a variable is renamed, a new variable is created using the new name. Any override values assigned under the old variable name for the dataset must be re-applied. Instances of the variable and override values used in other imported datasets remain unchanged.
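The default and override behavior described above can be sketched as a simple substitution. This is only an illustration: the `{region}` placeholder syntax, the paths, and the function `resolve_path` are assumptions made for the example, not the application's actual syntax or API.

```python
def resolve_path(template, defaults, overrides=None):
    """Fill variables in a path, preferring override values over defaults."""
    values = dict(defaults)
    values.update(overrides or {})
    return template.format(**values)

# Two hypothetical datasets in the same flow that share the variable "region".
orders = "/data/{region}/orders.csv"
returns = "/data/{region}/returns.csv"
defaults = {"region": "us-east"}

# No override at execution time: the default value is inserted.
print(resolve_path(orders, defaults))

# One override applies to every dataset that uses the same variable name.
override = {"region": "eu-west"}
print(resolve_path(orders, defaults, override))
print(resolve_path(returns, defaults, override))
```

In this model, renaming `region` in one template would introduce a new key, so an override supplied under the old name would no longer reach that dataset, which mirrors the renaming note above.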
Parameterize bucket names
You can create environment parameters to specify your bucket names. An environment parameter is a variable name and String value that can be referenced by all users of the environment.
NOTE: A workspace administrator or project owner can create environment parameters.
- Parameterized bucket names are very useful when you are moving flows between workspaces or projects. When the flow is imported into a new workspace, the environment parameter references the appropriate bucket name in the new workspace.
- If you change source buckets or move data to a new storage bucket, updating the paths to your objects can be as simple as changing the value of the environment parameter where your data is stored.
For example, suppose you have two environments: Dev and Prod. You can create an environment parameter called env.sourceBucketName to store the name of the bucket from which all data in the workspace or project is imported.
| Environment Name | Source Bucket Name | Environment Parameter Value |
For more information, see Environment Parameters Page.
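The Dev and Prod scenario can be sketched as a lookup that swaps the bucket name into a path. The bucket names, the `{env.sourceBucketName}` placeholder syntax, and the `resolve` helper below are hypothetical and are used only to show the idea; they are not taken from the product.

```python
# Hypothetical environment parameter values for two environments.
ENVIRONMENTS = {
    "Dev": {"env.sourceBucketName": "example-dev-bucket"},
    "Prod": {"env.sourceBucketName": "example-prod-bucket"},
}

def resolve(path, environment):
    """Replace each environment parameter reference with its value."""
    for name, value in ENVIRONMENTS[environment].items():
        path = path.replace("{" + name + "}", value)
    return path

# The same parameterized path is stored in the flow for both environments.
path = "{env.sourceBucketName}/sales/orders.csv"

print(resolve(path, "Dev"))
print(resolve(path, "Prod"))
```

Because the flow stores only the parameter reference, moving it between environments or changing buckets means changing one value rather than every path.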
Add Pattern Parameter
In the screen above, you can see an example of pattern-based parameterization. In this case, you are trying to parameterize the two digits after the value:
- If disabled, the scan stops when the next slash (/) in the path is encountered. Folders are not matched.
If enabled, the scan continues to any depth of folders.
NOTE: A high number of files and folders to scan can significantly increase the time required to load your dataset with parameters.
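The difference between the two settings can be modeled with regular expressions: a wildcard that stops at the next slash behaves like `[^/]*`, while one that continues to any folder depth behaves like `.*`. This is a sketch under that assumption; the paths and the helpers `wildcard_to_regex` and `match` are hypothetical.

```python
import re

def wildcard_to_regex(prefix, suffix, include_nested):
    """Build a regex for prefix + wildcard + suffix.

    When include_nested is False the wildcard cannot cross a slash.
    """
    middle = ".*" if include_nested else "[^/]*"
    return re.compile(re.escape(prefix) + middle + re.escape(suffix))

def match(paths, include_nested):
    """Return the .txt paths under in/ that the wildcard matches."""
    rx = wildcard_to_regex("in/", ".txt", include_nested)
    return [p for p in paths if rx.fullmatch(p)]

# Hypothetical folder contents.
paths = ["in/a.txt", "in/sub/b.txt", "in/sub/deep/c.txt", "in/d.csv"]

flat = match(paths, include_nested=False)
nested = match(paths, include_nested=True)
print("disabled:", flat)
print("enabled: ", nested)
```

With the option disabled, only the file directly under `in/` is matched. With it enabled, files at every depth are matched, which is also why the scan has more to examine.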
Example 1: all text files
Suppose your file and folder structure look like the following:
NOTE: If regular expressions are poorly specified, they can create unexpected matches and results. Use them with care. For a list of limitations of regular expressions for parameterization, see Overview of Parameterization.
The following regular expression matches the same two sources in the previous screen:
This simpler syntax is easier to parse and performs the same match as the regular expression version.
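As an illustration of matching two digits in a file name with a regular expression, here is a minimal sketch in Python. The file names and the `POS-r` prefix are hypothetical stand-ins, since the original example values are not shown here, and the expression uses Python's `re` syntax, which may differ from the syntax the application accepts.

```python
import re

# Hypothetical file names; only the first two have exactly two digits.
names = ["POS-r01.txt", "POS-r02.txt", "POS-r3.txt", "POS-rAB.txt"]

# \d{2} matches exactly two digits; the dot before "txt" is escaped.
two_digits = re.compile(r"POS-r(\d{2})\.txt")

matched = [n for n in names if two_digits.fullmatch(n)]
print(matched)
```

Using `fullmatch` anchors the expression to the whole name, which helps avoid the unexpected matches that loosely specified expressions can produce.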
For more information on