By default, Spark configuration is applied at the global level. All jobs submitted to the connected instance of Spark use the same set of Spark properties and settings. As needed, administrators can modify this set of properties through the Admin console.
Optionally, flow owners can configure overrides to the default Spark properties at the output object level.
Overrides are defined as part of the output object. When you configure these values, they are applied each time that the output object is used to generate a set of results.
User-specific Spark overrides: If you have enabled user-specific overrides for Spark jobs, those settings take precedence over the settings that are applied through this feature. For more information, see Configure User-Specific Props for Cluster Jobs.
This feature allows administrators to enable the passthrough of properties to Spark; users can then submit any value for an enabled property. Be careful when choosing the properties that you enable for users to override.
Property validation: The property values submitted for these overrides are not validated against possible values or the connected running environment. For example, a user could submit a value of 1000 for the number of spark.executor.cores, which causes job failures.

When this feature is enabled, the following properties are available for users to override at job execution time with their preferred values.
NOTE: These properties are always available for override when the feature is enabled.
Spark parameter | Description |
---|---|
spark.driver.memory | Amount of RAM in GB on each Spark node that is made available for the Spark drivers. |
spark.executor.memory | Amount of RAM in GB on each Spark node that is made available for the Spark executors. |
spark.executor.cores | Number of cores on each Spark executor that is made available to Spark. |
transformer.dataframe.checkpoint.threshold | When checkpointing is enabled, the Spark DAG is checkpointed when the approximate number of expressions in this parameter has been added to the DAG. Checkpointing assists in managing the volume of work that is processed through Spark at one time; by checkpointing after a set of steps, the amount of work processed in a single pass is kept manageable. Raising this number causes checkpointing to occur less frequently, which increases the volume of work processed between checkpoints. |
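The sketch below is one way to assemble override values for the properties in the table before submitting a job. The property names come from the table above; the values, the unit format, and the local sanity check are assumptions and should be adjusted for your own cluster.

```python
# Illustrative only: example override values for the properties listed above.
# The values are placeholders; size them for your own cluster. Depending on the
# deployment, memory values may be plain gigabyte counts (as described in the
# table) or Spark-style size strings such as "8g".
spark_overrides = {
    "spark.driver.memory": "4",                           # RAM in GB for the Spark driver
    "spark.executor.memory": "8",                         # RAM in GB for each Spark executor
    "spark.executor.cores": "4",                          # cores made available per executor
    "transformer.dataframe.checkpoint.threshold": "400",  # expressions added to the DAG before checkpointing
}

# Because the platform does not validate these values against the running
# environment, a simple local sanity check can catch obvious mistakes, such as
# the 1000-core example mentioned above.
assert int(spark_overrides["spark.executor.cores"]) <= 64, "executor core count looks too high"
```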
Spark jobs on Azure Databricks:
For Spark jobs executed on Azure Databricks, only the following default override parameters are supported:
Spark parameter | Description |
---|---|
transformer.dataframe.checkpoint.threshold | See above. |
During Spark job execution on Azure Databricks:
Whenever overrides are applied to an Azure Databricks cluster, the overrides must be applied at the time of cluster creation. As a result, a new Azure Databricks cluster is spun up for the job execution, which may increase the time required to begin the job.
For more details on setting these parameters, see Tune Cluster Performance.
Workspace administrators can enable Spark job overrides.
Steps:
Locate the following parameter:
Enable Custom Spark Options Feature
Set this parameter to Enabled.
After enabling the feature, workspace administrators can define the Spark properties that are available for override.
Steps:
Locate the following parameter:
Spark Whitelist Properties
Enter a comma-separated list of Spark properties. For example, an entry that adds the following two properties looks like this:
spark.driver.extraJavaOptions,spark.executor.extraJavaOptions
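If you maintain the whitelist outside the Admin console, a small helper like the following (not part of the product) can produce the expected comma-separated value; the property names shown are the same two from the example above.

```python
# Build the comma-separated value expected by the Spark Whitelist Properties
# setting from a Python list. Not part of the product; just a convenience.
whitelisted_properties = [
    "spark.driver.extraJavaOptions",
    "spark.executor.extraJavaOptions",
]

# Spark property names contain no commas or spaces, so a plain join is enough.
whitelist_value = ",".join(whitelisted_properties)
print(whitelist_value)
# spark.driver.extraJavaOptions,spark.executor.extraJavaOptions
```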
Overrides are applied to output objects associated with a flow.
Run Job page
When you configure an on-demand job to run:
The Spark Execution Properties are available for override.
NOTE: No validation of the property values is performed against possible values or the connected running environment.
For more information, see Spark Execution Properties Settings.
When you are scheduling a job:
The Spark Execution Properties are available for override.
NOTE: No validation of the property values is performed against possible values or the connected running environment.
For more information, see Spark Execution Properties Settings.
You can submit Spark property overrides as part of the request body for an output object. See API Workflow - Manage Outputs.
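As a rough sketch of that workflow, the request below attaches Spark property overrides to an output object using Python's requests library. The base URL, endpoint path, authentication scheme, and the sparkOptions payload key are assumptions for illustration; consult API Workflow - Manage Outputs for the exact endpoint and request body in your deployment.

```python
# Hedged sketch: attach Spark property overrides to an output object via the
# REST API. The URL, endpoint path, and payload field names are hypothetical;
# see API Workflow - Manage Outputs for the documented contract.
import requests

BASE_URL = "https://example.com"       # hypothetical platform URL
API_TOKEN = "<your-access-token>"      # hypothetical access token
OUTPUT_OBJECT_ID = 42                  # hypothetical output object identifier

payload = {
    # Hypothetical key for per-output Spark overrides; values are illustrative.
    "sparkOptions": [
        {"key": "spark.executor.memory", "value": "8"},
        {"key": "transformer.dataframe.checkpoint.threshold", "value": "400"},
    ]
}

response = requests.patch(
    f"{BASE_URL}/v4/outputObjects/{OUTPUT_OBJECT_ID}",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
response.raise_for_status()
print(response.json())
```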