Page tree

Trifacta Dataprep



Contents:

   

Contents:


Feature Availability: This feature is not available in
Dataprep Legacy by Trifacta only.


In the Flow Optimization Settings dialog, you can configure the following settings, which provide finer-grained control and performance tuning over your flow and its job executions. From the Flow View menu, select Optimization settings.

This feature must be enabled at the workspace level. When enabled, the settings in this dialog are applied to the current flow.


These optimizations are designed to improve performance by pre-filtering the volume of data by reducing the columns and rows to the ones that are actually used.

When these filters are enabled, the number of filters successfully applied to a job execution is listed in the Optimization summary in the Job Details page. See Job Details Page.

Enable optimization for jobs from this flow

When enabled, the Trifacta application attempts to apply any of the listed optimizations that are enabled to jobs that are executed for this flow.

NOTEWhen this option is disabled, then no optimization settings are available. 


NOTE: If two consecutive job executions of a flow fail, then optimizations are skipped for the flow. If the job execution then succeeds, optimizations are automatically disabled for the flow. They can be re-enabled if needed.

General Optimizations

The following optimizations can be enabled or disabled in general. For individual data sources, you may be able to enable or disable these settings based on your environment and its requirements .

Tip: These optimizations are applied at the recipe level. They can be applied on any flow and may improve performance within the Transformer page.

Column pruning optimization

When enabled, job execution performance is improved by removing any unused or redundant columns based on the recipe that is selected.

Filter optimization

When this setting is enabled, the Trifacta application optimizes job performance on this flow by pushing data filters to recipes.

Databases that Support Pushdown

Individual types of databases may support one or more of the following pushdowns. Additional restrictions may apply for your specific database.

Tip: These optimizations are applied to queries of your relational datasources that support pushdown. These optimizations are applied within the source, which limits the volume of data that is transferred during job execution.


Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.

Limitations:

  • Column pruning optimizations cannot be applied to imported datasets generated with custom SQL.

Filter pushdown

When this setting is enabled, the Trifacta application optimizes job performance on this flow by pushing data filters directly on the source database.

Limitations:

  • Filter pushdown optimizations cannot be applied to imported datasets generated with custom SQL.
  • Pushdown filters cannot be applied to dates in your relational sources.

NOTE: SQL-based filtering is performed on a best-effort basis. When these optimizations are enabled for your flow, there is no guarantee that they will be applied during job execution.


NOTE: The connection types may or may not be available in your product edition. For more information, see Connection Types.


BigQuery Optimization

Feature Availability: This feature is available in the following editions:

  • Dataprep Enterprise Edition by Trifacta
  • Dataprep Professional Edition by Trifacta
  • Dataprep Premium by Trifacta
  • Dataprep Standard by Trifacta

The following optimization settings can be applied during to datasets sourced from BigQuery.

Tip: If all datasources and outputs for your flow are located in BigQuery, you can enable all of these optimizations to instruct the Trifacta application to execute the job in BigQuery, which can significantly improve performance. For more information on BigQuery as a running environment, see Overview of Job Execution.

Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.

Filter pushdown

When this setting is enabled, the Trifacta application optimizes job performance on this flow by pushing data filters directly on the source database.

Join pushdown

When this setting is enabled, the Trifacta application applies join transformations within BigQuery.

Limitations on join pushdown for BigQuery:

  • Cross joins are not pushed down.

  • Fuzzy match joins are not pushed down.

  • Joins that use the ignore special characters setting are not pushed down.

Expression pushdown

When enabled, expression pushdown leverages SQL to compute Wrangle functions and other expressions within BigQuery.

Aggregate pushdown

When this setting is enabled, the Trifacta application applies aggregations to data within BigQuery.

Limitations on aggregate pushdown for BigQuery:

  • Transformations that generate flat aggregates (Aggregates that create a new column, instead of a new table) must be accompanied by a Group By parameter.

Union pushdown

When this setting is enabled, the Trifacta application applies union operations between multiple datasets and matches columns in the datasets within BigQuery.

Profile pushdown

When enabled, visual profiles for this flow can be executed in BigQuery when a job successfully completes in BigQuery.

NOTE: Calculated values in visual profiles generated on BigQuery may differ from the same calculations in Dataflow profiles. For more information, see Overview of Visual Profiling.


NOTE: Visual profiling in BigQuery is not supported if data quality rules have been created for the recipe. In this case, full execution of the job is not possible in BigQuery.

Other Databases

Databases that do not support pushdown may support the following optimization settings.

Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.

This page has no comments.