
Trifacta Dataprep




An output defines the set of files or tables, their formats, and the locations where results are written after a job on the recipe completes. To run a job from a flow, you must create an output object that defines where results are delivered after a job is successfully executed.

Every flow requires an output in order to publish results. An output object is composed of one or more publishing actions. A publishing action defines the output type, format, location, and other settings that determine where and how the results of a recipe are delivered.

You can create publishing actions in multiple formats and publish results to different databases and file storage systems. The following are the output types:

  • File-based outputs, such as files written to an HDFS connection.
  • Table-based outputs, such as Oracle or PostgreSQL tables. For more information, see Connection Types.
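To make the relationship between an output object and its publishing actions concrete, here is a minimal sketch of the structure described above. The class and field names (`OutputObject`, `PublishingAction`, `output_type`, `location`, `settings`) are hypothetical and chosen for illustration; they are not the Dataprep API or its storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublishingAction:
    """One destination for recipe results (hypothetical model)."""
    output_type: str               # "file" or "table"
    location: str                  # file path or table name
    format: Optional[str] = None   # e.g. "csv" for file outputs; unused for tables
    settings: dict = field(default_factory=dict)  # e.g. delimiter, compression


@dataclass
class OutputObject:
    """Attached to one recipe; groups one or more publishing actions."""
    recipe: str
    actions: List[PublishingAction] = field(default_factory=list)

    def targets(self) -> List[str]:
        # Each publishing action receives the same recipe results.
        if not self.actions:
            raise ValueError("an output needs at least one publishing action")
        return [f"{a.output_type}:{a.location}" for a in self.actions]


output = OutputObject(recipe="clean_orders")
output.actions.append(
    PublishingAction("file", "/output/orders.csv", "csv", {"delimiter": ","}))
output.actions.append(PublishingAction("table", "public.orders"))
print(output.targets())
```

Raising an error for an output with no publishing actions is a modeling choice in this sketch that mirrors "one or more publishing actions"; it is not a claim about how the product validates outputs.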

Create an Output

You can use either of the following methods to create an output object and its related publishing action:

From Flow View:

In Flow View, an output object extends from a recipe, indicating the results of the recipe are delivered to the output object. 

  1. Open your flow in Flow View.
  2. In Flow View, do one of the following:
    1. Right-click a recipe and select Add Output to run.
    2. If an output already exists, select it.
  3. The output is displayed in the Details panel on the right side.
  4. In the Details panel, under Manual Settings, click Edit.
  5. In the Publishing Settings page, click Add Publishing Action.

Tip: For scheduled runs of your flow, you must specify Scheduled Settings to automatically generate the output when the flow is executed by a schedule. For more information on scheduling, see Overview of Automator.

From the Run Job page:

You can also create outputs from the Run Job page.

  1. In Flow View, click Run Job.
  2. In the Run Job page, you can edit existing outputs or add new ones. To create a new one, click Add Publishing Action.

For more information on Flow View, see Flow View Page.

For more information on creating output objects, see View for Outputs.

Create a File-Based Output 

You can create a file-based output by performing the following steps.

For more information on creating an output from Flow View or the Run Job page, see the sections above.

Steps:

  1. In the left panel of the Publishing action page, select the connection where you want to write the file. In the following example, the HDFS connection is selected:

    Figure: Publishing action page for file output

  2. Select the file. You can select an existing file from the search list or click Create a new file in the right panel.
    1. Enter a file name in the Create a new file field.
  3. To create output parameters, click the Parameterize destination link. See "Create an Output with Parameters" below.

  4. From the Data Storage Format drop-down list, select the output format for the file.

  5. The available publishing actions vary based on the options you selected. Select the required publishing actions below the drop-down list. For more information, see Run Job Page.

  6. Update the Delimiter field, if required.

  7. You can choose to generate the file as a Single File or as Multiple Files.
  8. To apply compression to the file, select the compression type from the Compression drop-down list.
  9. Click Add.
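To illustrate what the Delimiter, Single File or Multiple Files, and Compression settings change in the written output, here is a minimal local sketch using Python's standard library. It is an assumption-laden analogy, not how Dataprep writes files: the function `write_output` and its `rows_per_file` knob are hypothetical (real multi-file output is split by the job engine, not by a fixed row count), and gzip stands in for whichever compression type you select.

```python
import csv
import gzip
import os
import tempfile


def write_output(rows, out_dir, base_name, delimiter=",", single_file=True,
                 rows_per_file=1000, compress=False):
    """Write rows as delimited text (hypothetical sketch of the file settings)."""
    # Single File: everything in one file. Multiple Files: split into parts.
    # rows_per_file is an illustrative knob, not a Dataprep setting.
    if single_file:
        chunks = [rows]
    else:
        chunks = [rows[i:i + rows_per_file]
                  for i in range(0, len(rows), rows_per_file)]
    paths = []
    for n, chunk in enumerate(chunks):
        name = base_name if single_file else f"{base_name}_part{n}"
        name += ".csv.gz" if compress else ".csv"
        path = os.path.join(out_dir, name)
        # gzip stands in for whichever Compression option is selected.
        opener = gzip.open if compress else open
        with opener(path, "wt", newline="") as f:
            csv.writer(f, delimiter=delimiter).writerows(chunk)
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = write_output([["1", "Oslo"], ["2", "Lima"], ["3", "Pune"]], out_dir,
                     "orders", delimiter="|", single_file=False,
                     rows_per_file=2, compress=True)
print([os.path.basename(p) for p in paths])
```

With these arguments the sketch writes two gzip-compressed part files whose fields are separated by `|`.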

Tip: You can define SQL scripts that are executed before or after generation of your output objects. For more information, see Create Output SQL Scripts.


Create a Table-Based Output

You can create output objects for publishing to tables by performing the following steps:

For more information on creating an output from Flow View or the Run Job page, see the sections above.

Steps:

  1. In the left panel of the Publishing action page, select the connection to the database where you want to store the table. In the following example, the postgres connection is selected:

    Figure: Publishing action for a table output

  2. Search for the table. You can select an existing table from the list or click Create a new table in the right panel.

    1. Enter a table name in the Create a new table field. 
  3. To create output parameters, click the Parameterize destination link. See "Create an Output with Parameters" below.

  4. Select the required publishing actions. For more information, see Run Job Page.

  5. Click Add.

Tip: You can define SQL scripts that are executed before or after generation of your output objects. For more information, see Create Output SQL Scripts.

Create an Output With Parameters

For any output, you can parameterize elements of the output path by using the following options:

Tip: You can define multiple parameters per output.

  • Timestamps: Inserts a formatted timestamp as part of the output path or filename.
  • Variables: Inserts the value of a variable that you define.

    • This variable has a default value that you assign.

    • Whenever you execute a job through the Run Job page, you can pass in the default value or an override value for the variable.

For more information on parameters, see Overview of Parameterization.

Parameterize path or bucket name with a variable

For file- or table-based publishing actions, you can replace the bucket name (if applicable) or elements of the output path with variable values. When you define the output, you replace an element of the output path with the variable name. At runtime, the variable name is replaced by the appropriate value.
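The runtime substitution described above can be sketched as a simple template replacement. This is a minimal illustration under assumptions: the `{name}` placeholder syntax and the `resolve_path` function are hypothetical, chosen only to show a variable name in the path being swapped for its value when the job runs.

```python
import re
from typing import Dict

# Assumption: "{name}" marks a parameterized segment of the path. This syntax is
# chosen for illustration; it is not how Dataprep stores parameters.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def resolve_path(template: str, values: Dict[str, str]) -> str:
    """Replace each {name} placeholder with its runtime value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for variable {name!r}")
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


template = "{bucket}/exports/{region}/orders.csv"
resolved = resolve_path(template, {"bucket": "acme-prod", "region": "emea"})
print(resolved)
```

Here both the bucket name and a path segment are parameterized, so the same output definition can write to `acme-prod/exports/emea/orders.csv` in one run and to a different bucket or region in another.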

Tip: You can use environment parameters to parameterize bucket names across your environment. For more information, see Environment Parameters Page.


  1. In the Publishing action page, click the Parameterize destination link. The Define Parameterized destination dialog is displayed.
  2. On the listed output path, highlight the part that you wish to parameterize. You can select part of the path or bucket name.

  3. Then, select Add Variable.


    Figure: Define parameterized destination

    1. Name: Enter a display name for the variable.

      Tip: Type env. to see the environment parameters that can be applied. These parameters are available for use by each user in the environment. For more information, see Overview of Parameterization.


      NOTE: If multiple variables within a flow (or its dependent flows) have the same name, they are treated as the same variable.

    2. Default value: Enter a default value for the variable.
  4. Click Save.
  5. To save the parameters for the output path, click Submit.

The new parameter is displayed in the context panel on the right side of the Publishing action page.

Tip: If you created a variable parameter, you can apply override values to the variable when you are running a job. For example, you can modify a variable called baseFileName to generate an output with a different base filename for your job run. For more information, see Overview of Parameterization.
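The interplay between default values, run-time overrides, and same-named variables can be sketched as a lookup keyed by variable name. This is a hypothetical model, not Dataprep's implementation: `effective_values` and the `{name}` placeholder syntax are assumptions. Because values are keyed by name, two outputs that both use `baseFileName` receive the same override.

```python
from typing import Dict, List


def effective_values(defaults: Dict[str, str],
                     overrides: Dict[str, str]) -> Dict[str, str]:
    """Override wins; otherwise the default applies (hypothetical sketch)."""
    # Keyed by name, so same-named variables share one value. Overrides for
    # names without a default are ignored in this sketch.
    return {name: overrides.get(name, default) for name, default in defaults.items()}


defaults = {"baseFileName": "orders", "region": "emea"}
# Two outputs in the same flow that both use the baseFileName variable.
templates: List[str] = ["/exports/{region}/{baseFileName}.csv",
                        "/archive/{baseFileName}.json"]

values = effective_values(defaults, {"baseFileName": "orders_rerun"})
resolved = [t.format(**values) for t in templates]
print(resolved)
```

A run without overrides falls back to the defaults and writes `orders.csv` and `orders.json`; the run above writes `orders_rerun` files to both locations.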

Parameterize path with a timestamp

Timestamp parameters are useful when you want outputs to reflect a date and time format, a time zone, or the exact or relative job start time. For file- or table-based publishing actions, you can create outputs based on the specific region or time zone for which the data is generated. When you define the output, you can replace an element of the output path with a timestamp parameter.
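The time zone matters because one job start instant can fall on different calendar dates in different regions, which changes the date segment written into the path. The sketch below shows this with Python's `datetime`. To keep it self-contained it uses fixed UTC offsets, which is an assumption: they match America/Los_Angeles in January (PST, UTC-8) and Asia/Calcutta (IST, UTC+5:30), but real IANA zones also apply daylight saving rules, which `zoneinfo.ZoneInfo` handles.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep this sketch self-contained. They match America/Los_Angeles
# in January (PST) and Asia/Calcutta (IST); real IANA zones also apply daylight
# saving rules, which zoneinfo.ZoneInfo handles.
ZONES = {
    "UTC": timezone.utc,
    "America/Los_Angeles": timezone(timedelta(hours=-8)),
    "Asia/Calcutta": timezone(timedelta(hours=5, minutes=30)),
}

# One job start instant (example value), expressed in UTC.
job_start = datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)


def date_segment(start: datetime, tz_name: str) -> str:
    # Convert to the target zone first, then format the path segment.
    return start.astimezone(ZONES[tz_name]).strftime("%Y-%m-%d")


segments = {tz: date_segment(job_start, tz) for tz in ZONES}
print(segments)
```

The same run is labeled 2024-01-15 in UTC and Asia/Calcutta but 2024-01-14 in America/Los_Angeles.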

Steps:

  1. In the Publishing action page, click the Parameterize destination link. The Define Parameterized destination dialog is displayed. See the example above.
  2. On the listed output path, highlight the part that you wish to parameterize. Then, select Add Timestamp Parameter.

  3. In the Timestamp Parameter dialog, enter the following details:

    1. Timestamp format: Specify the format for timestamp values.
      1. Example: YYYY-MM-DD_hh_mm.
      2. Values can express both date and time elements. For more information on the available tokens for formatting date and time values, see Datetime Data Type.
    2. Timestamp value: Select the value to record in the path:
      1. Exact job start date: the timestamp recorded in the path is the start time of the job.
      2. Relative to the job start date: the timestamp recorded in the path is relative to the start time of the job, according to the settings that you specify here.
    3. Time zone: Click Change to change the time zone recorded in the timestamp.
      1. Example: America/Los_Angeles or Asia/Calcutta.
      2. For more information on the available time zones, see Supported Time Zone Values.

  4. Click Save.

  5. To save the specified parameter for the output path, click Submit.

The new parameter is displayed in the context panel on the right side of the Publishing action page.
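To show how a timestamp format such as `YYYY-MM-DD_hh_mm` turns into a path segment, and how an exact job start differs from a relative one, here is a minimal sketch. The token mapping covers only the example format and is an assumption (in particular, `hh` is treated as a two-digit 24-hour hour); see Datetime Data Type for the tokens that are actually supported. The `offset` parameter is a hypothetical stand-in for the relative-time settings in the dialog.

```python
from datetime import datetime, timedelta, timezone

# Assumed token mapping for the example format only; not the full token set.
# "hh" is treated here as a two-digit 24-hour hour, which is an assumption.
TOKENS = [("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"), ("hh", "%H"), ("mm", "%M")]


def to_strftime(fmt: str) -> str:
    """Translate the example tokens into Python strftime directives."""
    for token, directive in TOKENS:
        fmt = fmt.replace(token, directive)
    return fmt


def timestamp_segment(job_start: datetime, fmt: str,
                      offset: timedelta = timedelta(0)) -> str:
    # offset == 0 models "Exact job start date"; a non-zero offset sketches
    # "Relative to the job start date" (hypothetical parameter).
    return (job_start + offset).strftime(to_strftime(fmt))


start = datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)
exact = timestamp_segment(start, "YYYY-MM-DD_hh_mm")
relative = timestamp_segment(start, "YYYY-MM-DD_hh_mm", offset=timedelta(days=-1))
print(exact, relative)
```

A one-day negative offset is a common way to label a daily run with the date of the data it processes, which is one reason a relative timestamp can be preferable to the exact start time.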


Edit an Output

From Flow View:

  1. Right-click an output object. The object details are displayed in the context panel.
  2. In the context panel, select the Manual Settings tab. Then, click Edit. The Publishing Actions page is displayed.
  3. Make changes as needed in the Publishing Actions page. To save your changes, click Update.

From the Run Job page:

In the Run Job page, hover over the publishing action that you want to modify, and click Edit.

Delete an Output

You can delete an output object from Flow View or from the Run Job page:

From Flow View:

  1. In Flow View, select the output.
  2. In the right panel, select Delete Output from the context menu.

From the Run Job page:

Select Delete from the context menu of the publishing action.

For more information, see Run Job Page.
