
Trifacta Dataprep



On April 28, 2021, Google is changing the required permissions for attaching IAM roles to service accounts. If you are using IAM roles for your Google service accounts, please see Changes to User Management.

   


Feature Availability: This feature is not available in Cloud Dataprep Legacy by TRIFACTA INC.



In the Flow Optimization Settings dialog, you can configure the following settings, which provide finer-grained control and performance tuning for your flow and its job executions. To open the dialog, select Optimization settings from the Flow View menu.

This feature must be enabled at the workspace level. When enabled, the settings in this dialog are applied to the current flow.

These optimizations are designed to improve performance by pre-filtering data in the database: the columns and rows queried from the datastore are reduced to the ones that are actually used in your recipes.

When these filters are enabled, the number of filters successfully applied to a job execution is listed in the Optimization summary in the Job Details page. See Job Details Page.

Enable optimization for jobs from this flow

When enabled, the Trifacta application attempts to apply each of the optimizations listed below that is enabled to jobs executed for this flow.

When disabled, none of the listed optimizations is performed.

NOTE: If two consecutive job executions of a flow fail, optimizations are skipped for the next job execution of the flow. If that job execution succeeds, optimizations are automatically disabled for the flow. You can re-enable them if needed.
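The safeguard in the note above behaves like a small state machine. The following Python sketch models it under stated assumptions: the class `FlowOptimizationState` and its fields are hypothetical names chosen for illustration, not part of the Trifacta application, and because this page does not say what happens after a third consecutive failure, the sketch simply keeps skipping optimizations in that case.

```python
from dataclasses import dataclass


@dataclass
class FlowOptimizationState:
    """Hypothetical per-flow state; names are illustrative, not a product API."""

    enabled: bool = True  # models the "Enable optimization" toggle
    consecutive_failures: int = 0

    def should_optimize(self) -> bool:
        # After two consecutive failures, optimizations are skipped.
        return self.enabled and self.consecutive_failures < 2

    def record_result(self, succeeded: bool) -> None:
        if not succeeded:
            self.consecutive_failures += 1
            return
        if self.consecutive_failures >= 2:
            # The run without optimizations succeeded, so optimizations are
            # switched off for the flow until a user re-enables them.
            self.enabled = False
        self.consecutive_failures = 0


state = FlowOptimizationState()
state.record_result(False)
state.record_result(False)
print(state.should_optimize())  # False: the next run skips optimizations
state.record_result(True)
print(state.enabled)  # False: optimizations are now disabled for the flow
```

A single failure followed by a success leaves optimizations enabled in this model; only the two-failures-then-success sequence turns them off.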

Optimizations

The following optimizations are available.

Column pruning

When enabled, job execution performance is improved by removing any unused or redundant columns from the query.
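Conceptually, column pruning narrows the query to the columns that the recipe actually reads. The sketch below is illustrative only: `build_pruned_query` is a hypothetical helper, identifiers are quoted in ANSI SQL style as an assumption, and only unused columns (not redundant ones) are dropped. It also mirrors the custom SQL limitation for this optimization by passing such queries through unchanged.

```python
from typing import Optional, Sequence, Set


def build_pruned_query(table: str,
                       source_columns: Sequence[str],
                       used_columns: Set[str],
                       custom_sql: Optional[str] = None) -> str:
    """Hypothetical helper: build the query sent to the datastore."""
    if custom_sql is not None:
        # Datasets imported with custom SQL cannot be pruned.
        return custom_sql
    # Keep only the columns the recipe references, in source order.
    kept = [c for c in source_columns if c in used_columns]
    if not kept:
        # Nothing to prune to; fall back to the full column list.
        kept = list(source_columns)
    column_list = ", ".join(f'"{c}"' for c in kept)
    return f'SELECT {column_list} FROM "{table}"'


print(build_pruned_query("orders",
                         ["id", "customer", "amount", "notes"],
                         {"amount", "id"}))
# SELECT "id", "amount" FROM "orders"
```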

Limitations on column pruning:

NOTE: SQL-based filtering is performed on a best-effort basis. When these optimizations are enabled for your flow, there is no guarantee that they will be applied during job execution.

  • Column pruning optimizations cannot be applied to imported datasets generated with custom SQL.

Filter pushdown

When this setting is enabled, the Trifacta application optimizes job performance on this flow by pushing data filters to the relational datasource, which limits the volume of data that must be transferred from the source.
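To see why pushdown reduces transfer volume, compare filtering after the fetch with filtering inside the query. This sketch uses Python's built-in `sqlite3` module as a stand-in datasource (SQLite is not one of the supported connection types) and a hypothetical `load_rows` helper; it is not how the Trifacta application generates its queries.

```python
import sqlite3


def load_rows(conn, table, predicate_sql=None, params=()):
    """Hypothetical helper: fetch rows, optionally pushing a filter down."""
    query = f"SELECT id, region, amount FROM {table}"
    if predicate_sql:
        # Pushdown: the database evaluates the filter, so only matching
        # rows are transferred over the connection.
        query += f" WHERE {predicate_sql}"
    query += " ORDER BY id"
    return conn.execute(query, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "EU", 10.0), (2, "US", 25.0),
                  (3, "EU", 40.0), (4, "APAC", 5.0)])

# Without pushdown: transfer every row, then filter after the transfer.
all_rows = load_rows(conn, "sales")
filtered_late = [r for r in all_rows if r[1] == "EU"]

# With pushdown: the same filter is applied in the database.
pushed = load_rows(conn, "sales", "region = ?", ("EU",))

print(len(all_rows), len(pushed))  # 4 2
```

Both approaches return the same rows; the difference is how many rows cross the connection before filtering.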

Supported relational connections:

Optimizations for SQL filtering apply to the following types of relational connections:

  • PostgreSQL

  • Oracle

  • SQL Server 

  • Snowflake

  • Redshift

  • Azure SQL Data Warehouse

  • BigQuery

These connection types may or may not be available in your product edition. For more information, see Connection Types.

Limitations on filter pushdown:

NOTE: SQL-based filtering is performed on a best-effort basis. When these optimizations are enabled for your flow, there is no guarantee that they will be applied during job execution.

  • Filter pushdown optimizations cannot be applied to imported datasets generated with custom SQL.
  • Pushdown filters cannot be applied to dates in your relational sources.
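The supported connection types and the limitations above can be read together as an eligibility check. The following is a minimal sketch, assuming hypothetical connection-type identifiers (`"postgresql"`, `"sqlserver"`, and so on), a hypothetical `Predicate` type, and illustrative column-type labels; the actual decision logic is not documented on this page and is applied on a best-effort basis.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical identifiers for the connection types listed on this page.
SUPPORTED_CONNECTIONS = {
    "postgresql", "oracle", "sqlserver", "snowflake",
    "redshift", "azure_sql_dw", "bigquery",
}


@dataclass
class Predicate:
    column: str
    column_type: str  # illustrative labels such as "string" or "date"
    sql: str


def pushable_predicates(connection_type: str,
                        uses_custom_sql: bool,
                        predicates: List[Predicate]) -> List[Predicate]:
    """Return the filters that are candidates for pushdown.

    Filters left out still run after the data is transferred, so results
    are the same; only the transfer volume differs.
    """
    if connection_type not in SUPPORTED_CONNECTIONS or uses_custom_sql:
        return []
    # Filters on date columns are not pushed to the source.
    return [p for p in predicates if p.column_type != "date"]


preds = [
    Predicate("region", "string", "region = 'EU'"),
    Predicate("order_date", "date", "order_date > '2021-01-01'"),
]
print([p.column for p in pushable_predicates("postgresql", False, preds)])
# ['region']
```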
