

In the Flow Optimization Settings dialog, you can configure the following settings, which provide finer-grained control over your flow and performance tuning for its job executions. To open the dialog, select Optimization settings from the Flow View menu.

This feature must be enabled at the workspace level. When enabled, the settings in this dialog are applied to the current flow.

These optimizations are designed to improve performance by reducing the volume of data up front, limiting the columns and rows to only those that are actually used.

Tip: In general, all of these optimizations should be enabled for each flow. If you are troubleshooting execution issues, you can selectively disable individual optimizations.


When these optimizations are enabled, the number of filters successfully applied to a job execution is listed in the Optimization summary on the Job Details page. See Job Details Page.

Enable optimization for jobs from this flow

When enabled, the application attempts to apply each of the enabled optimizations listed below to jobs that are executed for this flow.

NOTE: When this option is disabled, no optimization settings are available.
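The relationship between the master toggle and the individual optimizations can be modeled as a simple settings object. This Python sketch is hypothetical: the class name, field names, and optimization keys are illustrative assumptions, not the product's actual configuration format.

```python
from dataclasses import dataclass, field


# Hypothetical model of the Flow Optimization Settings dialog.
# Names and keys are illustrative only.
@dataclass
class FlowOptimizationSettings:
    # Corresponds to "Enable optimization for jobs from this flow".
    enabled: bool = True
    # Individual optimizations, keyed by a hypothetical name.
    optimizations: dict[str, bool] = field(default_factory=dict)

    def effective(self) -> set[str]:
        """Return the optimizations that would be attempted for a job."""
        if not self.enabled:
            # Master toggle off: no individual setting takes effect.
            return set()
        return {name for name, on in self.optimizations.items() if on}
```

For example, with `column_pruning` on and `filter_optimization` off, `effective()` returns only `{"column_pruning"}`; after setting `enabled = False`, it returns an empty set regardless of the individual settings.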

General Optimizations

The following optimizations can be enabled or disabled in general. For individual data sources, you may be able to enable or disable these settings based on your environment and its requirements.

Tip: These optimizations are applied at the recipe level. They can be applied to any flow and may improve performance within the Transformer page.

Column pruning optimization

When enabled, job execution performance is improved by removing any unused or redundant columns based on the recipe that is selected.
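Conceptually, column pruning collects the columns that the recipe reads or outputs and drops the rest before execution. The following Python sketch assumes a hypothetical recipe model in which each step declares the columns it reads; it illustrates the idea only and is not how the application tracks column usage.

```python
def prune_columns(source_columns, recipe_steps, output_columns):
    """Keep only source columns that a recipe step reads or the output keeps.

    `recipe_steps` is a hypothetical list of dicts such as {"reads": ["name"]}.
    """
    used = set(output_columns)
    for step in recipe_steps:
        used.update(step["reads"])
    # Preserve the original column order of the source.
    return [c for c in source_columns if c in used]
```

In this model, a source with columns `id`, `name`, `email`, and `notes`, whose recipe reads only `name` and `email` and outputs `id`, would be loaded with `notes` removed.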

Filter optimization

When this setting is enabled, the application optimizes job performance on this flow by pushing data filters to recipes.
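The general idea of pushing a filter earlier is that rows removed by the filter are never processed by later steps, while the final result stays the same. This Python sketch models recipe steps as hypothetical functions over lists of row dictionaries; it is an illustration of predicate pushdown, not the application's implementation.

```python
def add_total(rows):
    # Derived column: reads the source columns qty and price.
    return [{**r, "total": r["qty"] * r["price"]} for r in rows]


def keep_region(region):
    # Filter step that reads only the source column "region".
    def step(rows):
        return [r for r in rows if r["region"] == region]
    return step


def run(rows, steps):
    # Apply each recipe step in order.
    for step in steps:
        rows = step(rows)
    return rows


def push_filter_first(steps, filter_step):
    """Move a filter to the front of the recipe.

    Only safe when the filter reads source columns that earlier steps
    do not create or modify.
    """
    return [filter_step] + [s for s in steps if s is not filter_step]
```

With the filter moved first, `add_total` runs only on the rows that pass the filter, yet `run` returns the same rows as the original step order.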

Full execution for Cloud Storage files

When enabled, jobs for this flow that are sourced from files stored in Cloud Storage can be executed in BigQuery.

NOTE: Additional limitations and requirements may apply for file-based job execution.

For more information, see BigQuery Running Environment.

Other optimizations

Additional optimizations can be enabled or disabled for specific types of transformations or jobs. 

File-Based Optimizations

Databases that Support Pushdown

Individual types of databases may support one or more of the following pushdowns. Additional restrictions may apply for your specific database.

Tip: These optimizations are applied to queries of your relational data sources that support pushdown. These optimizations are applied within the source, which limits the volume of data that is transferred during job execution.

NOTE: For each relational connection, you can enable the optimization capabilities to improve the performance of the flow and its job executions. The optimization settings may vary based on the type of relational connection.

Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.

Limitations:

  • Column pruning optimizations cannot be applied to imported datasets generated with custom SQL.
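At the source, column pruning amounts to requesting only the needed columns instead of every column. The following Python sketch builds such a query for a hypothetical table and column list, and passes custom SQL through untouched to mirror the limitation above. The function and parameter names are assumptions, and identifiers are not quoted or escaped, for brevity.

```python
def build_source_query(table, used_columns, custom_sql=None):
    """Build the query that reads a relational source.

    All names here are hypothetical; identifiers are not quoted or escaped.
    """
    if custom_sql is not None:
        # Limitation: datasets imported with custom SQL are not pruned.
        return custom_sql
    if not used_columns:
        raise ValueError("at least one used column is required")
    # Request only the columns the recipe uses, rather than SELECT *.
    return f"SELECT {', '.join(used_columns)} FROM {table}"
```

For example, `build_source_query("sales", ["id", "amount"])` returns `SELECT id, amount FROM sales`.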

Filter pushdown

When this setting is enabled, the application optimizes job performance on this flow by pushing data filters directly to the source database.

Limitations:

  • Filter pushdown optimizations cannot be applied to imported datasets generated with custom SQL.
  • Pushdown filters cannot be applied to dates in your relational sources.

NOTE: SQL-based filtering is performed on a best-effort basis. When these optimizations are enabled for your flow, there is no guarantee that they will be applied during job execution.
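Because pushdown is best-effort, it helps to think of the filters as split into two groups: those translated into the source query and those applied after the data is loaded. This Python sketch models that split under the limitations listed above. The `Filter` class, type names, and helper functions are hypothetical, and literals are rendered with `repr()` only for brevity; production code should use bound query parameters.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    # Hypothetical filter representation.
    column: str
    op: str           # e.g. "=", ">", "<"
    value: object
    column_type: str  # e.g. "string", "integer", "date"


def split_filters(filters, custom_sql=False):
    """Return (filters pushed to the source, filters applied after load)."""
    if custom_sql:
        # Limitation: no pushdown for datasets imported with custom SQL.
        return [], list(filters)
    pushed, remaining = [], []
    for f in filters:
        if f.column_type == "date":
            # Limitation: date filters are not pushed down.
            remaining.append(f)
        else:
            pushed.append(f)
    return pushed, remaining


def where_clause(pushed):
    # Illustration only: use bound parameters instead of repr() in real code.
    return " AND ".join(f"{f.column} {f.op} {f.value!r}" for f in pushed)
```

A filter on a string column such as `region = 'EU'` would be pushed into the source query, while a filter on a date column would remain for the job to apply after loading.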



NOTE: The connection types may or may not be available in your product edition. For more information, see Connection Types.

Sample pushdown

When this setting is enabled, the application optimizes job performance by executing sampling jobs directly on the source database.


NOTE: All pushdowns must be enabled to ensure sample jobs run in the database.

Limitations for sampling on BigQuery:

  • The following sampling types are not supported for execution in BigQuery:
    • Initial rows
    • Stratified sampling
    • Cluster-based sampling
  • Quick scan sampling jobs are executed on Photon, which cannot be pushed down to BigQuery.

    NOTE: The Photon running environment may not be enabled in your project. A workspace administrator can enable it, if needed. For more information, see Dataprep Project Settings Page.

  • Full execution of sampling jobs is supported only for BigQuery data sources.
  • If the calculated number of rows to include in the sample is not possible, the sample is limited to a maximum of 10,000 rows.
  • Data sources hosted in Cloud Storage are not pushed down to BigQuery for sampling.
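The BigQuery sampling limitations above combine into a single eligibility check. This Python sketch encodes them as a hypothetical helper; the sampling-type and source-type strings are illustrative labels, and the function is not part of any product API.

```python
# Hypothetical labels for the sampling types that cannot run in BigQuery.
UNSUPPORTED_IN_BIGQUERY = {"initial_rows", "stratified", "cluster_based"}


def sample_runs_in_bigquery(sampling_type, source_type, all_pushdowns_enabled):
    """Return True if a sampling job could be pushed down to BigQuery."""
    if not all_pushdowns_enabled:
        # All pushdowns must be enabled for sampling in the database.
        return False
    if source_type != "bigquery":
        # Only BigQuery sources qualify; Cloud Storage files do not.
        return False
    if sampling_type == "quick_scan":
        # Quick scan samples run on Photon instead.
        return False
    return sampling_type not in UNSUPPORTED_IN_BIGQUERY
```

Under these rules, a random sample of a BigQuery table qualifies, while a stratified sample, a quick scan, or any sample of a Cloud Storage file does not.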

Limitations for sampling on Snowflake:

  • The following sampling types are not supported for execution in Snowflake:
    • Initial rows
    • Stratified sampling
    • Cluster-based sampling
  • Full execution of sampling jobs is supported only for Snowflake data sources.
  • If the calculated number of rows to include in the sample is not possible, the sample is limited to a maximum of 10,000 rows.
  • Data sources hosted in Cloud Storage are not pushed down to Snowflake for sampling.

Other Databases

Databases that do not support pushdown may support the following optimization settings.

Column pruning from source

When enabled, job execution performance is improved by removing any unused or redundant columns from the source database.
