Skip to main content

Create Output SQL Scripts

Note

This feature may not be available in all product editions. For more information on available features, see Compare Editions.

As part of job execution for an output, you can define SQL scripts to run before the job, after it, or both. These SQL scripts are stored as part of the output object definition and can be executed through any database connection to which the user has access. SQL scripts can be applied to file-based and table-based job executions.

  • When flows are shared, the shared user can modify SQL Scripts if the user has Editor permissions on the flow. See Overview of Sharing.

Example uses:

  • Insert or update log entries in a database log table before or after a job that publishes to file or database destinations.

  • Perform custom inserts, updates, and delete logic to other database tables based on job output data that is published to a database.

  • Create and refresh tables or materialized views that join the job’s output data with data from other tables using CREATE AS SELECT.

  • Operational tasks such as disabling/enabling indexes and managing partitions on supported databases.

Script Types

Note

If one of these scripted steps fails, then all downstream phases of the job also fail.

  • Pre-job: After a job has been initiated and before data is ingested into the platform, a SQL script can be executed by the Trifacta Application.

  • Post-job: After job results have been generated and published, a SQL script can be executed.

    Note

    If publishing job fails, then all downstream tasks also fail, including the SQL script, which is not executed and is recorded as a failed phase of the job execution.

Script execution

  • SQL lines in an individual script are executed in the order listed in the script.

  • If you have defined multiple scripts of the same type (pre-job, for example), those scripts may be executed in parallel.

    Note

    The order of listing of scripts in the Trifacta Application does not affect the order of execution of those scripts.

  • A warning message is displayed if the pre/post SQL scripts do not have any valid connection.

Limitations

Warning

These SQL scripts are executed without validation through the selected database connection. There are no explicit limitations on the types of SQL statements that can be executed. It is possible to do harm through this feature.

  • After each SQL statement in a script, a semi-colon is required.

  • SQL validation is not supported for some connection types.

  • When flows containing output SQL scripts are imported, the connection to the database where the script is to be executed must exist in the new environment. Otherwise, the SQL script is dropped from the import.

  • Output SQL script actions may not be supported for all connection types. If the connection is not available in the dropdown for selecting a connection, then the feature may not be available for the connection type.

Enable

This feature may need to be enabled in your environment.

A workspace administrator can enable the use of SQL scripts. For more information, see Workspace Settings Page.

Create Output SQL Script

Through the Trifacta Application, you add SQL scripts as part of the output object definition.

Tip

Depending on the nature of your SQL script, you may choose to test it first in a demo environment on a demo database.

Where to add:

You can create SQL scripts for the following types of outputs:

  • Manual Settings destinations: In Flow View, you can select an output object and then modify one of its Manual Settings destinations.

  • Scheduled Settings destinations: In Flow View, select an output object and then modify one of its Scheduled Settings destinations.

Steps:

  1. In Flow View, select the output object for which you wish to create the SQL script.

  2. In the Outputs panel on the right, click the Manual Settings tab.

  3. For the type of destination, click Edit.

  4. In the SQL Scripts panel at the bottom of the screen, click Add Script.

  5. In the Add SQL Script window:

    1. Select the database connection to use for executing the SQL script.

    2. Enter the SQL script in the panel.

    3. Choose when you would like to execute the script:

      1. Run before data ingest - before the job is executed

      2. Run after data publish - after the job results have been written

    4. Before you save your changes, click Validate SQL.

      Note

      Some connection types do not support SQL validation.

      Note

      Validating the SQL does not execute the SQL script on the database. It performs a check of SQL syntax against the selected database.

  6. To save your SQL script, click Add.

For more information, see SQL Scripts Panel.

Parameterize values

You can add variable or Datetime parameters to your SQL scripts.

  • Parameters with the same name that are also defined on input datasets, flow parameters, and output objects can be referenced during job execution to pass the same value for consistency.

Tip

You can parameterize values in your SQL script. Parameters can be variables, Datetime parameters, or environment parameters. For more information, see Overview of Parameterization.

Monitoring execution

You can monitor the execution of any SQL scripts that are part of a job execution. For more information, see Overview of Job Monitoring.

Example Scripts

In the following sections, you can review some common examples for how to use SQL scripts in your data pipelines.

Example - log entries

In this example, you insert log entries into a log table in your database before and after the execution of your job.

Pre-job:

Your SQL script might look like the following:

CREATE TABLE IF NOT EXISTS "transactions"."log-tri" (
   timestamp date, 
   jobType varchar(255), 
   jobStatus varchar(255)
);
INSERT INTO "transactions"."log-tri"(timestamp, jobType, jobStatus)
   VALUES ('2021-06-22','transformation','started');

The above script is composed of two statements:

  • CREATE TABLE IF NOT EXISTS - This statement creates the log-tri table in the transactions database.

    1. This table is defined with three fields: timestamp, jobType, and jobStatus, each of which is assigned a data type.

    2. The IF NOT EXISTS keyword ensures:

      1. The table is created if it does not exist.

      2. If it exists, then no error is returned, which could stop the job run.

INSERT INTO - This statement inserts a record into the log-tri table, populating each column with an appropriate VALUE:

Column name

Value

timestamp

'2021-06-22'

jobType

'transformation'

jobStatus

'started'

Tip

In the above example, the value for the timestamp is a literal value. If needed, you can parameterize that value, so that a Datetime parameter can be inserted into the record as needed. See "Parameterize values" above.

Post-job:

After the job results have been published, a post-job SQL script might look like the following:

CREATE TABLE IF NOT EXISTS "transactions"."log-tri" (
   timestamp date, 
   jobType varchar(255), 
   jobStatus varchar(255)
);
INSERT INTO "transactions"."log-tri"(timestamp, jobType, jobStatus)
   VALUES ('2021-06-22','transformation','complete');

This script is very similar to the previous:

  1. Create the table if it doesn't exist. This statement also provides schema information if you need to make modifications in the future.

  2. Inserts a new row in the table, indicating the transformation job type is now complete.

Example - updates based on job results

If you write your job results through the same connection where you are executing your SQL script, you can leverage the data directly from your job results into your SQL script.

In the following scenario, a customer account dimension table in the datawarehouse ( dw.DimCustAccount custdim) is updated with data enriched through Designer Cloud in the job results. In this case the num_emp, industry_cd, and duns columns are mapped to the corresponding columns in the custenr enriched data table with values where the customer identifier in the customer dimension table (custdim.custId) matches the customer identifier in the enriched data table (custenr.custId).

UPDATE TABLE dw.DimCustAccount custdim
  SET num_emp = custenr.empcnt, industry_cd = custenr.ind_cd, duns = custenr.duns_num
  FROM tri.cust_enriched custenr
  WHERE custdim.custId = custenr.custId;

Edit Output SQL Script

Steps:

  1. In Flow View, select the output object.

  2. In the context panel on the right, select the Manual Settings tab.

  3. Click Edit next to the type of destination to modify.

  4. In the dialog, locate the one to modify in the SQL Scripts panel. Click Edit.

  5. Make changes as need. Click Save.

Delete Output SQL Script

Warning

After you deleting a SQL script and save the output object, the SQL script is removed permanently. Before deleting, you may wish to copy the script and paste it into a text editor.

Steps:

  1. In Flow View, select the output object.

  2. In the context panel on the right, select the Manual Settings tab.

  3. Click Edit next to the type of destination to modify.

  4. In the dialog, hover over the one to modify in the SQL Scripts panel. From the More menu, select Delete.

Create Output SQL Script via API

You can create SQL scripts via API. These scripts can then be associated with specific output objects.

Create SQL script

Key information:

Attribute

Description

sqlScript

Text of the SQL script. You should validate this script before inserting it into the API.

type

Set type to be:

  • pre - execute before data ingest

  • post - execute after data publish

vendor

The vendor type of the database to which you are connecting. See Connection Types.

outputObjectId

Internal identifier of the output object to which you are associating the SQL script. When the object is selected in Flow View, the identifier is part of the URL.

connectionId

Internal identifier of the connection that you are using to execute the SQL script.

Endpoint

/v4/sqlScripts

Method

POST

For more information, see Designer Cloud Powered by Trifacta: API Reference docs

List SQL scripts

List all SQL scripts.

Endpoint

/v4/sqlScripts

Method

GET

For more information, see Designer Cloud Powered by Trifacta: API Reference docs

Edit SQL script

Endpoint

/v4/sqlScripts/{id}

Method

PATCH

where:

  • {id} is the internal identifier of the SQL script

For more information, see Designer Cloud Powered by Trifacta: API Reference docs

Delete SQL script

Endpoint

/v4/sqlScripts/{id}

Method

DELETE

where:

  • {id} is the internal identifier of the SQL script

For more information, see Designer Cloud Powered by Trifacta: API Reference docs