Trifacta Wrangler Pro is no longer available. This space will be removed soon. Please visit this page instead: Spark Execution Properties Settings

   

When you specify a job in the Run Job page, you can pass a set of Spark property values to the Spark running environment. These values are applied to the execution of the job and override the global Spark settings for your deployment.



NOTE: A workspace administrator must enable the Custom Spark Options feature in the Workspace Settings page. For more information, see Workspace Settings Page.

Spark overrides are applied to individual output objects. 

  • You can specify overrides for ad-hoc jobs through the Run Job page. 
  • You can specify overrides when you configure a scheduled job execution.
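The override behavior described above can be pictured as a per-output merge on top of the global settings. The following is a minimal sketch, not the platform's implementation: the dictionary keys reuse the parameter labels from this page, the function name and the two sample outputs are hypothetical, and the enabled-by-default value for whole-stage code generation is an assumption.

```python
# Global settings for the deployment. The numeric defaults are the ones
# listed on this page; the True default is an assumption for illustration.
GLOBAL_SPARK_SETTINGS = {
    "Transformer Dataframe Checkpoint Threshold": 200,
    "Enable whole-stage code generation for Spark": True,
    "Maximum number of fields that whole-stage code generation supports": 100,
}


def effective_settings(global_settings, output_overrides):
    """Return the settings used for one output object (hypothetical helper).

    Overrides win over global values; the global dictionary is not modified.
    """
    merged = dict(global_settings)
    merged.update(output_overrides)
    return merged


# Two hypothetical output objects: one with an override, one without.
csv_output = effective_settings(
    GLOBAL_SPARK_SETTINGS,
    {"Transformer Dataframe Checkpoint Threshold": 400},
)
parquet_output = effective_settings(GLOBAL_SPARK_SETTINGS, {})
```

In this sketch, only the output that carries an override sees the changed threshold. The other output, and the global settings themselves, are left as they were.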

In the Run Job page, click the Advanced Execution Settings caret.


Figure: Spark Execution Properties

Spark parameter

Description

Transformer Dataframe Checkpoint Threshold

When checkpointing is enabled, the Spark DAG is checkpointed each time approximately this number of expressions has been added to the DAG. Checkpointing helps manage the volume of work that is processed through Spark at one time. By checkpointing after a set of steps, the Trifacta platform can reduce the chances of execution errors for your jobs.

By raising this number:

  • You increase the upper limit of steps between checkpoints.
  • You may reduce processing time.
  • You may increase the number of job failures.

Default value: 200
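To illustrate the trade-off, the sketch below simulates where checkpoints would fall for a given threshold. It is a simplified model under stated assumptions, not the platform's actual logic: the function name, the per-step expression counts, and the rule of checkpointing as soon as the running count reaches the threshold are all hypothetical.

```python
def checkpoint_positions(expressions_per_step, threshold=200):
    """Return the 1-based step numbers after which a checkpoint would occur.

    Simplified model: expressions accumulate step by step, and a checkpoint
    is taken once the running count reaches the threshold.
    """
    positions = []
    pending = 0  # expressions added to the DAG since the last checkpoint
    for step_number, count in enumerate(expressions_per_step, start=1):
        pending += count
        if pending >= threshold:
            positions.append(step_number)
            pending = 0
    return positions


# Hypothetical recipe: 12 steps that each add 50 expressions to the DAG.
steps = [50] * 12

default_checkpoints = checkpoint_positions(steps, threshold=200)  # [4, 8, 12]
raised_checkpoints = checkpoint_positions(steps, threshold=400)   # [8]
```

With the higher threshold, the same recipe is checkpointed less often. That can save time, but it also means more work accumulates in the DAG between checkpoints.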

Enable whole-stage code generation for Spark

When enabled, whole-stage code generation optimizes Spark SQL queries for execution performance on the cluster.

Maximum number of fields that whole-stage code generation supports

This defines the number of fields (columns) that are permitted in a whole-stage code generation query. If the number of fields in the query exceeds this value, then the Trifacta platform disables whole-stage code generation to prevent performance issues and memory exceptions.

NOTE: Avoid modifying this value unless you have a clear understanding of the implications.

Default value: 100
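The rule described for this parameter can be sketched as a simple check. This is an illustrative model only: the function name and arguments are hypothetical, and the comparison follows the wording above (code generation is disabled when the field count exceeds the limit). In open-source Apache Spark, comparable settings exist as `spark.sql.codegen.wholeStage` and `spark.sql.codegen.maxFields`. Whether the options on this page map directly to those properties is an assumption.

```python
def uses_whole_stage_codegen(num_fields, enabled=True, max_fields=100):
    """Return True if whole-stage code generation would apply to a query.

    Hypothetical helper: code generation applies only when the feature is
    enabled and the query's field count does not exceed the maximum.
    """
    return enabled and num_fields <= max_fields


at_limit = uses_whole_stage_codegen(100)          # field count equals the limit
over_limit = uses_whole_stage_codegen(101)        # field count exceeds the limit
switched_off = uses_whole_stage_codegen(20, enabled=False)
```

Under this model, a query with exactly 100 fields still uses code generation at the default setting, while one with 101 fields does not.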
