...

docdevnotes

Content has been disabled from publication with the AAC docs. Platform Features remains an unclear topic, as these features overlap between platform and Designer Cloud Powered by Trifacta. For Firefly, some of these features may not apply at all. 

Content has been moved back under Feature Overviews. In Platform, it would live under Parameterization.

d-toc

Excerpt

In 
D s product
, parameterization enables you to apply dynamic values to the data that you import and that you generate as part of job execution.

Parameter types:

  • Environment Parameters: A workspace administrator or project owner can specify parameters that are available across the environment, including default values for them.
  • Dataset Parameters: You can parameterize the paths to inputs for your imported datasets, creating datasets with parameters. For file-based imported datasets, you can parameterize the bucket where the source is stored.
  • Flow Parameters: You can create parameters of multiple types at the flow level, which can be referenced in any recipe in the flow.
  • Output Parameters: When you run a job, you can create parameters for the output paths for file- or table-based outputs.

These parameters can be defined by timestamp, patterns, wildcards, or variable values that you specify at runtime.

Environment Parameters

Project owners or workspace administrators can define parameters that apply across the project or workspace environment. These parameters can be referenced by any user in the environment, but only a user with admin access can define, modify, or delete these parameters.

...

  • Names of environment parameters must begin with the env. prefix.

Limitations

  • Environment parameter names are unique within the environment.
  • You cannot use environment parameters in recipes.
  • You cannot use environment parameters in plans.
  • You cannot use environment parameters in schedule triggers.

Example - parameterized bucket names

...

When you later export your flows from Dev and move them to Test and Prod, the imported flows automatically connect to the correct bucket for the target environment, since the bucket name is referenced by an environment parameter.


Example - parameterized database names

In the previous example, bucket names differed between the development and production workspaces. Similarly, there may be different values for the database to use between development and production. 

Tip

Tip: If you are creating datasets using custom SQL, this method can be used for migrating flows between workspaces. If the flows reference imported datasets that use custom SQL, you only need to update the value of the environment parameter in the target workspace and reset the connection identifier in order to reconnect the imported flow to its data sources. 


Environment Name    Database Name
Dev                 myco-DB-dev
Test                myco-DB-test
Prod                myco-DB-prod

In your Dev workspace, you can create the environment parameter as follows, setting its value to the database name for the Dev environment (myco-DB-dev):

Code Block
env.db-source

Export and Import

You can export environment parameters from one environment and import them to another. For example, you may be building your flows in a Dev workspace before they are exported and imported into a Prod workspace. If your flows make use of environment parameters from the Dev space, you may want to export the parameters and their values from the Dev workspace for migration to the Prod workspace. 

Info

NOTE: As part of the import process, you must reconcile name conflicts between imported environment parameters and the parameters that already exist in the workspace.

For more information, see Manage Environment Parameters.

Datasets with Parameters

In some cases, you may need to be able to execute a recipe across multiple instances of identical datasets. For example, if your source dataset is refreshed each week under a parallel directory with a different timestamp, you can create a variable to replace the parts of the file path that change with each refresh. This variable can be modified as needed at job runtime.

Example

Suppose you have imported data from a file system source, which has the following source path to weekly transactions:

Code Block
<file_system>:///source/transactions/2018/01/29/transactions.csv

In the above, you can infer a date pattern in the form of 2018/01/29, which suggests that there may be a pattern of paths to transaction files. Based on this pattern, it would be useful to be able to do the following:

  • Import data from parallel paths for other weeks' data.
  • Sample across all of the available datasets.
  • Execute jobs based on runtime variables that you set for other transaction sets fitting the pattern.

In this case, you would want to parameterize the date values in the path, such that the dynamic path would look like the following:

...

Tip

Tip: If there have been changes to the schema of the sources of your dataset with parameters, you can edit the dataset and update the parameters. See Library for Data Page.

Parameters in paths for imported datasets are rendered as regular expressions. Depending on the number of parameters and their depth within the parameterized path, the work of performing all pattern checks can grow large, impacting import performance.

...


Tip

Tip: When specifying an imported dataset with parameters, you should attempt to be as specific as possible in your parameter definitions.
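
For illustration only, the following Python sketch (not the product's matching code; the paths and pattern are hypothetical) shows why specific patterns are cheaper to evaluate: the parameterized parts of a path behave like a regular expression tested against every candidate path.

Code Block
import re

# Hypothetical parameterized path: the date portions of
# /source/transactions/2018/01/29/transactions.csv are treated as patterns.
pattern = re.compile(r"^/source/transactions/\d{4}/\d{2}/\d{2}/transactions\.csv$")

candidates = [
    "/source/transactions/2018/01/29/transactions.csv",
    "/source/transactions/2018/02/05/transactions.csv",
    "/source/archive/2018/01/29/transactions.csv",   # does not match
]

print([p for p in candidates if pattern.match(p)])

# A looser pattern such as r"^/source/.*/transactions\.csv$" would also match,
# but it forces every path under /source to be tested, which is why narrow,
# specific patterns keep import-time matching fast.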

Info

NOTE: When importing one or more Excel files as a parameterized dataset, you select worksheets to include from the first file. If there are worksheets in other Excel files that match the names of the worksheets that you selected, those worksheets are also imported. All worksheets are unioned together into a single imported dataset with parameters. Pattern-based parameters are not supported for import of Excel worksheets.

Mismatched Schemas

D s product
 expects that all datasets imported using a single parameter have schemas that match exactly. The schema for the entire dataset is taken from the first dataset that matches for import.
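
As a hedged, non-product illustration of this rule, the following Python sketch treats the header of the first matching CSV file as the schema and flags later files whose headers differ (the file paths are hypothetical):

Code Block
import csv

def check_schemas(paths):
    # The first matching file defines the expected schema; later files are
    # compared against it and mismatches are reported.
    expected = None
    for path in paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f))
        if expected is None:
            expected = header
        elif header != expected:
            print(f"schema mismatch in {path}: {header} != {expected}")
    return expected

check_schemas([
    "/source/transactions/2018/01/29/transactions.csv",
    "/source/transactions/2018/02/05/transactions.csv",
])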

...

  • You cannot create datasets with parameters from uploaded data.
  • You cannot create datasets with parameters from multiple file types.
    • File extensions can be parameterized. Mixing of file types (e.g. TXT and CSV) works only if they are processed in an identical manner, which is rare.
    • You cannot create parameters across text and binary file types.

  • For datasources that require conversion, such as Excel, PDF, or JSON files, you can create a dataset with parameters from a maximum of 500 converted files.
  • Parameter and variable names can be up to 255 characters in length.
  • For regular expressions, the following constructs are not supported due to the length of time required to evaluate them:
    • Backreferences. The following example matches axa, bxb, and cxc, yet generates an error (an alternation-based alternative is sketched after this list):

      Code Block
      ([a-c])x\1
    • Lookahead assertions: The following example matches on a, but only when it is part of an ab pattern. It generates an error:

      Code Block
      a(?=b)
  • For some source file types, such as Parquet, the schemas between source files must match exactly.
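
As a hedged illustration (standard Python regular expressions, not the product's matcher), the backreference pattern above can often be rewritten as a plain alternation, which avoids the unsupported constructs entirely:

Code Block
import re

# The backreference form is valid in Python but is listed above as unsupported
# for dataset parameters.
backref = re.compile(r"([a-c])x\1")      # matches axa, bxb, cxc

# The same matches expressed as alternation, with no backreference needed.
alternation = re.compile(r"axa|bxb|cxc")

for s in ["axa", "bxb", "cxc", "axb"]:
    print(s, bool(backref.fullmatch(s)), bool(alternation.fullmatch(s)))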

...

Info

NOTE: Matching file path patterns in a large directory can be slow. Where possible, avoid using multiple patterns to match a file pattern or scanning directories with a large number of files. To increase matching speed, avoid wildcards in top-level directories and be as specific as possible with your wildcards and patterns.

Tip

Tip: For best results when parameterizing directories in your file path, include the trailing slash (/) as part of your parameterized value.

Options:

  • You can choose to search nested folders for files that match your specified pattern.

Tip

Tip: If your imported dataset is stored in a bucket, you can parameterize the bucket name, which can be useful if you are migrating flows between environments or must change the bucket at some point in the future.

For more information, see Create Dataset with Parameters.

...

You can specify flow parameters and their default permitted values, which can be invoked in the recipe steps of your flow. Wherever the flow parameter is invoked, it is replaced by the value you set for the parameter.

Uses:

  • Dynamically affect recipe steps.
  • Improve flow usability; build fewer flows and recipes to maintain.
  • Parameters are evaluated at design time in the Transformer page and at runtime during job execution.
  • All parameter values can be overridden as needed.

Flow parameter types:

  • Variable type flow parameters:
    • Literal values: These values are always of String data type.

      Tip

      Tip: You can wrap flow parameter references in your transformations with one of the PARSE functions. For more information, see Create Flow Parameter.

      Info

      NOTE: Wildcards are not supported.

    • D s lang
      itempatterns
      . For more information, see Text Matching.
    • Regular expressions.
  • Selector type flow parameters:
    • These parameters are specified as a list of permitted values. 
    • The first value is the default value.
    • The parameter values are treated as String values. 

Limitations

  • Flow parameters are converted to constants in macros. Use of the macro in other recipes results in the constant value being applied.
  • A flow parameter cannot be used in some transformation steps or fields. 

Example - variable type

Suppose you need to process your flow across several regions of your country. These regions are identified using a region ID value: pacific, mountain, central, or eastern.

From the Flow View context menu, you select Manage parameters. In the Parameters tab, you specify the parameter name:

Code Block
paramRegion

You must specify a default value. To verify that this critical parameter is properly specified before job execution, you set the default value to:

Code Block
##UNSPECIFIED##

The above setting implies two things:

...

provide variable date ranges to your transformations. For example, you may need to be able to transform the data for the preceding 30, 60, 90, or custom days. 

You can create a Variable parameter called:

Code Block
$paramDaysBack

To verify that this critical parameter is properly specified before job execution, you set the default value to:

Code Block
##UNSPECIFIED##

The above setting implies two things:

  • If the above value appears in the output, then an override value for the parameter was not specified when the job was executed; the conspicuous default prevents a plausible-looking value from being used erroneously.
  • Before the job is executed, you must specify an override value. You can specify an override:
    • At the flow level to assist in recipe development.
    • At run time to insert the proper DaysBack value for the job run. 
  • Since all values are treated as String values, you must interpret the numeric values within your transformation steps as Integer data types (see the sketch below).
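
As a non-authoritative sketch of both points, the following Python snippet uses the same ##UNSPECIFIED## sentinel and shows the string-to-integer conversion; the function name is hypothetical and is not part of the product.

Code Block
UNSPECIFIED = "##UNSPECIFIED##"

def resolve_days_back(param_value: str) -> int:
    # Flow parameter values arrive as strings, so the number of days must be
    # parsed before it can be used in any date arithmetic.
    if param_value == UNSPECIFIED:
        # The conspicuous default survived to run time: no override was set.
        raise ValueError("paramDaysBack was not overridden for this job run")
    return int(param_value)

print(resolve_days_back("30"))   # runtime override of 30 days -> 30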

For more examples, see Create Flow Parameter.

Example - selector type

Suppose you need to process your flow across several regions of your country. These regions are identified using a region ID value. 

From the Flow View context menu, you select Manage parameters. In the Parameters tab, you specify Selector for the Type.

Specify the parameter name:

Code Block
paramRegion

You can then add Choice values for the permitted regions:

Code Block
WEST
MOUNTAIN
CENTRAL
EAST
OTHER

After the flow parameter has been created, you can invoke it in a transformation step using the following syntax.

...

Where the parameter is referenced, the default or applicable override value is applied.  

You can test for the parameter by using the following conditional in your transformation step and selecting from one of the listed values:

Code Block
$paramRegion == <selected_value>

For more examples, see Create Flow Parameter.

...

Tip

Tip: These types of parameters can be applied to file or table paths. An output path can contain multiple parameters.

Example

Suppose you are generating a JSON file as the result of job execution. 

...

In this case, you would create an output parameter of timestamp type as part of the write settings for the job you are scheduling. 
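
As an illustration only (the directory and timestamp format are assumptions, not product defaults), a timestamp-type output parameter resolves to a distinct file for each scheduled run, roughly as in this Python sketch:

Code Block
from datetime import datetime, timezone

def resolved_output_path(run_time: datetime) -> str:
    # The timestamp parameter in the output path is replaced with the job's
    # run time, so each scheduled run writes its own JSON file instead of
    # overwriting the previous one.
    stamp = run_time.strftime("%Y%m%d-%H%M%S")
    return f"/outputs/transactions-{stamp}.json"

print(resolved_output_path(datetime.now(timezone.utc)))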

Creating output parameters

When you are creating or editing a publishing action in the Run Jobs page, you can click the Parameterize destination link that appears in the right panel.

Tip

Tip: For outputs that are stored in buckets, you can parameterize the name of the bucket.

For more information, see Create Outputs.

Using output parameters

Whenever you execute a job using the specified publishing action, the output parameters are applied. 

After specifying variable parameters, you can insert new values for them at the time of job execution in the Run Job page.

For more information, see Run Job Page.

Bucket Name Parameters

In addition to parameterizing the paths to imported datasets or outputs, you can also apply parameters to the buckets where these assets are stored. For example, if you are developing flows in one workspace and deploying them into a production workspace, it may be useful to create a parameter for the name of the bucket where outputs are written for the workspace.

Bucket names can be parameterized for the buckets in the following datastores:

  • S3
Info

NOTE: Bucket names for

D s tfs
cannot be parameterized.

...

For more information:

Parameter Overrides

For each of the following types of parameter, you can apply override values as needed.

  • Dataset parameters: When you run a job, you can apply override values to variables for your imported datasets. See Run Job Page.
  • Flow parameters: At the flow level, you can apply override values to flow parameters. These values are passed into the recipe and the rest of the flow for evaluation during recipe development and job execution.

    Info

    NOTE: Overrides applied at the flow level are passed into all recipes and other objects in the flow. Wherever there is a case-sensitive match between the name of the overridden parameter and a parameter name in the flow, the override value is applied. These values can be overridden by ad-hoc values. See "Order of Precedence" below.

    When flows are referenced in a plan task, their parameters and default values are available for override.
  • Output parameters: When you define your output objects in Flow View, you can apply override values to the parameterized output paths on an as-needed basis when you specify your job settings. See Run Job Page.
  • Environment parameters: Environment parameters cannot be overridden.

Triggers

In flow and plan triggers, you can apply overrides to parameter values that are inherited from the object being triggered. 

Info

NOTE: In a trigger, displayed parameter values may be inherited from the plan or flow that is being triggered. To ensure that the proper value is used, you should set a specific value for the override in the trigger. This is a known issue.

Order of Parameter Evaluation

...

Info

NOTE: The following does not apply to environment parameters, which cannot be overridden.

  1. Runtime or trigger-based overrides: Parameter values specified at runtime for ad-hoc or scheduled jobs. (The full ordering is sketched after this list.)

    Info

    NOTE: The override value is applied to all subsequent operations in the platform. When a job is submitted to the job queue, any overrides are applied at that time. Changes to override values do not affect jobs that are already in flight.

    Info

    NOTE: You can specify run-time override values when executing jobs through the APIs. See API Workflow Task - Run Job.

    1. For more information on runtime overrides, see Run Job Page.
    2. For more information on trigger-based overrides for flows, see Add Schedule Dialog.
    3. For more information on trigger-based overrides for plans, see Plan View for Triggers.
  2. Flow- or plan-level overrides:  At the flow or plan level, you can specify override values, which are passed into the flow's objects. These values can be overridden by overrides set in the above locations. See Manage Parameters Dialog.
  3. Default values: If no overrides are specified, the default values are applied:
    1. Imported datasets: See Create Dataset with Parameters.
    2. Flow parameters: See Manage Parameters Dialog.
    3. Output parameters: See Run Job Page.
  4. Inherited (upstream) values: Any parameter values that are passed into a flow can be overridden by any matching override specified within the downstream flow.
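
The following Python sketch is a hedged restatement of this ordering (the names and structure are illustrative, not the platform's implementation): sources are applied from lowest to highest precedence, and the last source that defines a case-sensitively matching name wins.

Code Block
def resolve(name, inherited=None, defaults=None, flow_overrides=None, runtime_overrides=None):
    # Lowest to highest precedence, mirroring the numbered list above read in
    # reverse: inherited values, then defaults, then flow- or plan-level
    # overrides, then runtime or trigger-time overrides.
    value = None
    for source in (inherited or {}, defaults or {}, flow_overrides or {}, runtime_overrides or {}):
        if name in source:
            value = source[name]
    return value

print(resolve(
    "paramRegion",
    inherited={"paramRegion": "WEST"},
    defaults={"paramRegion": "##UNSPECIFIED##"},
    flow_overrides={"paramRegion": "CENTRAL"},
    runtime_overrides={"paramRegion": "EAST"},
))   # -> EAST: the runtime override takes precedence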

...

  • Data sources tab: For file-based parameterized datasets, you can review the files that were matched at runtime for the specified parameters.
  • Parameters tab: View the parameter names and values that were used as part of the job, including the list of matching datasets. 

See Job Details Page.

D s also
labelparameterization