Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 
ParameterDescriptionValue
feature.parameterization.maxNumberOfFilesForExecution.databricksSparkMaximum number of parameterized source files that are permitted to be executed as part of an Azure Databricks job. 
feature.parameterization.matchLimitOnSampling.databricksSparkMaximum number of parameterized source files that are permitted for matching in a single dataset with parameters.
databricks.workerNodeTypeType of node to use for the Azure Databricks Workers/Executors. There are 1 or more Worker nodes per cluster.

Default: Standard_D3_v2

For more information, see the sizing guide for Azure Databricks.

databricks.sparkVersionAzure Databricks cluster version which also includes the Spark Version.

Please do not change unless you are using Spark 2.4.

Info

NOTE: If spark.version is set to 2.4.0, then this value must be set to 5.2.x-scala2.11.

For more information, see Configure for Spark.

databricks.serviceUrlURL to the Azure Databricks Service where Spark jobs will be run (Example: https://westus2.azuredatabricks.net) 
databricks.minWorkersInitial number of Worker nodes in the cluster, and also the minimum number of Worker nodes that the cluster can scale down to during auto-scale-down

Minimum value: 1

Increasing this value can increase compute costs.

databricks.maxWorkersMaximum number of Worker nodes the cluster can create during auto scaling

Minimum value: Not less than databricks.minWorkers.

Increasing this value can increase compute costs.

databricks.logsDestinationDBFS location that cluster logs will be sent to every 5 minutesLeave this value as /trifacta/logs.
databricks.enableAutoterminationSet to true to enable auto-termination of a user cluster after N minutes of idle time, where N is the value of the autoterminationMinutes property.Unless otherwise required, leave this value as true.
databricks.driverNodeTypeType of node to use for the Azure Databricks Driver. There is only 1 Driver node per cluster.

Default: Standard_D3_v2

For more information, see the sizing guide for Databricks.

databricks.clusterStatePollerDelayInSecondsNumber of seconds to wait between polls for Azure Databricks cluster status when a cluster is starting up 
databricks.clusterStartupWaitTimeInMinutesMaximum time in minutes to wait for a Cluster to get to Running state before aborting and failing an Azure Databricks job 
databricks.clusterLogSyncWaitTimeInMinutes

Maximum time in minutes to wait for a Cluster to complete syncing its logs to DBFS before giving up on pulling the cluster logs to the

D s node
.

Set this to 0 to disable cluster log pulls.
databricks.clusterLogSyncPollerDelayInSecondsNumber of seconds to wait between polls for a Databricks cluster to sync its logs to DBFS after job completion 
databricks.autoterminationMinutesIdle time in minutes before a user cluster will auto-terminate.Do not set this value to less than the cluster startup wait time value.
spark.useVendorSparkLibraries

When true, the platform bypasses shipping its installed Spark libraries to the cluster with each job's execution.

Default is false.

Do not modify unless you are experiencing failures in Azure Databricks job execution. For more information, see Troubleshooting below

Info

NOTE: Set this value to true.


Configure Personal Access Token

...