When you specify a job in the Run Job page, you can pass a set of Spark property values to the Spark running environment to apply to the execution of that job. These property values override the global Spark settings for your deployment.
Spark overrides are applied to individual output objects.
- You can specify overrides for ad-hoc jobs through the Run Job page.
- You can specify overrides when you configure a scheduled job execution.
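As a rough illustration of the override behavior described above, the following sketch layers per-output overrides on top of global settings. The dictionaries and the `effective_settings` helper are hypothetical and not part of the product. The two property names are standard Spark SQL properties used here only as sample keys, and the values are illustrative.

```python
# Hypothetical global Spark settings for a deployment (illustrative values).
GLOBAL_SPARK_SETTINGS = {
    "spark.sql.codegen.wholeStage": "true",
    "spark.sql.codegen.maxFields": "100",
}


def effective_settings(global_settings, overrides):
    """Return the settings for one output object: overrides win over globals."""
    merged = dict(global_settings)  # copy so the global settings stay untouched
    merged.update(overrides)        # per-job values replace matching global keys
    return merged


# Hypothetical overrides specified for a single output object of one job.
output_overrides = {"spark.sql.codegen.maxFields": "200"}
settings = effective_settings(GLOBAL_SPARK_SETTINGS, output_overrides)
print(settings)
```

In this sketch the global settings are never modified, so an override affects only the job and output for which it is specified.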
In the Run Job page, click the Advanced Execution Settings caret.
|Transformer Dataframe Checkpoint Threshold|
When checkpointing is enabled, the Spark DAG is checkpointed when approximately the number of expressions specified in this parameter has been added to the DAG. Checkpointing helps manage the volume of work that is processed through Spark at one time; by checkpointing after a set of steps, Trifacta Wrangler can reduce the chances of execution errors for your jobs.
By raising this number:
|Enable whole-stage code generation for Spark||When enabled, whole-stage code generation optimizes Spark SQL queries for execution performance on the cluster.|
|Maximum number of fields that whole-stage code generation supports|
This defines the maximum number of fields (columns) that are permitted in a whole-stage code generation query. If the number of fields in the query exceeds this value, Trifacta Wrangler disables whole-stage code generation to prevent performance issues and memory exceptions.
NOTE: Avoid modifying this value unless you have a clear understanding of the implications.
Default value: 100
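The two threshold settings above can be pictured with a small simulation. This is a minimal sketch and not the product's implementation: the function names are hypothetical, the per-step expression counts are made up, and the exact comparison and reset behavior are assumptions noted in the comments.

```python
def checkpoint_steps(expressions_per_step, threshold):
    """Return the 1-based step numbers after which a checkpoint would occur."""
    checkpoints = []
    pending = 0  # expressions added to the DAG since the last checkpoint
    for step, count in enumerate(expressions_per_step, start=1):
        pending += count
        if pending >= threshold:  # assumption: checkpoint once the threshold is reached
            checkpoints.append(step)
            pending = 0           # assumption: counting restarts after a checkpoint
    return checkpoints


def uses_whole_stage_codegen(enabled, num_fields, max_fields=100):
    """Code generation applies only when enabled and the field count does not exceed the limit."""
    return enabled and num_fields <= max_fields


# Made-up expression counts for five recipe steps.
steps = [120, 200, 90, 150, 300]
low = checkpoint_steps(steps, threshold=300)
high = checkpoint_steps(steps, threshold=600)
print(low, high)
print(uses_whole_stage_codegen(True, 100), uses_whole_stage_codegen(True, 101))
```

Under these assumptions, a higher checkpoint threshold produces fewer checkpoints for the same sequence of steps, and a query with more fields than the limit falls back to execution without whole-stage code generation.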