...

Supported versions:

  • Azure Databricks 8.3
  • Azure Databricks 7.3 (Recommended)
  • Azure Databricks 6.x
  • Azure Databricks 5.5 LTS

...

Each parameter below is listed with its description and, where applicable, its value or default.

databricks.clusterMode

Determines the cluster mode for running a Databricks job.

Default: USER

feature.parameterization.matchLimitOnSampling.databricksSpark

Maximum number of parameterized source files that are permitted for matching in a single dataset with parameters.

databricks.workerNodeType

Type of node to use for the Azure Databricks Workers/Executors. There are 1 or more Worker nodes per cluster.

Default: Standard_D3_v2

NOTE: This property is unused when instance pooling is enabled. For more information, see Configure instance pooling below.

For more information, see the sizing guide for Azure Databricks.

databricks.sparkVersion

Azure Databricks cluster version, which also includes the version of Spark.

Depending on your version of Azure Databricks, set this property as follows:

  • Azure Databricks 8.3: 8.3.x-scala2.12

    NOTE: No other Azure Databricks 8.x version is supported.

  • Azure Databricks 7.3: 7.3.x-scala2.12

    NOTE: No other Azure Databricks 7.x version is supported.

  • Azure Databricks 6.x: Please use the default value for your Azure Databricks distribution.
  • Azure Databricks 5.5 LTS: 5.5.x-scala2.11

Please do not use other values.
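
For reference, the mapping above can be summarized in a short sketch. This is illustrative only (Python is used here just for readability); the property itself is set as a plain string through your normal configuration method.

    # Illustrative sketch: databricks.sparkVersion value per supported Azure Databricks version,
    # restating the list above.
    SPARK_VERSION_BY_DATABRICKS_RELEASE = {
        "8.3": "8.3.x-scala2.12",
        "7.3": "7.3.x-scala2.12",     # recommended
        "5.5 LTS": "5.5.x-scala2.11",
        # For Azure Databricks 6.x, use the default value for your distribution.
    }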

databricks.serviceUrl

URL to the Azure Databricks service where Spark jobs will be run.

Example: https://westus2.azuredatabricks.net

databricks.minWorkers

Initial number of Worker nodes in the cluster. This is also the minimum number of Worker nodes to which the cluster can scale down during auto-scaling.

Minimum value: 1

Increasing this value can increase compute costs.

databricks.maxWorkers

Maximum number of Worker nodes the cluster can create during auto-scaling.

Minimum value: Not less than databricks.minWorkers.

Increasing this value can increase compute costs.
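
For example, the worker sizing properties above might be combined as follows. This is an illustrative sketch; the node type and counts are assumptions, not recommendations, and should be chosen using the sizing guide for Azure Databricks.

    # Illustrative sketch: example worker sizing values for the properties above.
    worker_settings = {
        "databricks.workerNodeType": "Standard_D3_v2",  # default; unused when instance pooling is enabled
        "databricks.minWorkers": 2,                     # initial size and auto-scaling floor (minimum: 1)
        "databricks.maxWorkers": 8,                     # auto-scaling ceiling; not less than minWorkers
    }
    # Constraints stated above:
    assert 1 <= worker_settings["databricks.minWorkers"] <= worker_settings["databricks.maxWorkers"]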

databricks.poolId

If you have enabled instance pooling in Azure Databricks, you can specify the worker node pool identifier here. For more information, see Configure instance pooling below.

NOTE: If both poolId and poolName are specified, poolId is used first. If that fails to find a matching identifier, then the poolName value is checked.

databricks.poolName

If you have enabled instance pooling in Azure Databricks, you can specify the worker node pool name here. For more information, see Configure instance pooling below.

See previous.

Tip: If you specify only a poolName value, then instance pools with that name can be used across multiple Databricks workspaces when you create a new cluster.

databricks.driverNodeType

Type of node to use for the Azure Databricks Driver. There is only 1 Driver node per cluster.

Default: Standard_D3_v2

For more information, see the sizing guide for Databricks.

NOTE: This property is unused when instance pooling is enabled. For more information, see Configure instance pooling below.

databricks.driverPoolId

If you have enabled instance pooling in Azure Databricks, you can specify the driver node pool identifier here. For more information, see Configure instance pooling below.

NOTE: If both driverPoolId and driverPoolName are specified, driverPoolId is used first. If that fails to find a matching identifier, then the driverPoolName value is checked.

databricks.driverPoolName

If you have enabled instance pooling in Azure Databricks, you can specify the driver node pool name here. For more information, see Configure instance pooling below.

See previous.

Tip: If you specify only a driverPoolName value, then instance pools with that name can be used across multiple Databricks workspaces when you create a new cluster.
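
Taken together, the pooling properties can be sketched as follows. The pool identifiers and names are hypothetical placeholders; the comments restate the precedence notes above.

    # Illustrative sketch: instance pooling settings with hypothetical placeholder values.
    # When both an id and a name are specified, the id is checked first; the name is used
    # only if no matching identifier is found.
    pool_settings = {
        "databricks.poolId": "pool-0123456789abcdef",        # hypothetical worker pool id
        "databricks.poolName": "shared-worker-pool",         # hypothetical; usable across workspaces if set alone
        "databricks.driverPoolId": "pool-fedcba9876543210",  # hypothetical driver pool id
        "databricks.driverPoolName": "shared-driver-pool",   # hypothetical
    }
    # Reminder: databricks.workerNodeType and databricks.driverNodeType are unused
    # when instance pooling is enabled.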

databricks.logsDestination

DBFS location to which cluster logs are sent every 5 minutes.

Leave this value as /trifacta/logs.

databricks.enableAutotermination

Set to true to enable auto-termination of a user cluster after N minutes of idle time, where N is the value of the autoterminationMinutes property.

Unless otherwise required, leave this value as true.

databricks.clusterStatePollerDelayInSeconds

Number of seconds to wait between polls for Azure Databricks cluster status when a cluster is starting up.

databricks.clusterStartupWaitTimeInMinutes

Maximum time in minutes to wait for a cluster to reach the Running state before aborting and failing an Azure Databricks job.

databricks.clusterLogSyncWaitTimeInMinutes

Maximum time in minutes to wait for a cluster to complete syncing its logs to DBFS before giving up on pulling the cluster logs to the Trifacta node.

Set this to 0 to disable cluster log pulls.

databricks.clusterLogSyncPollerDelayInSeconds

Number of seconds to wait between polls for a Databricks cluster to sync its logs to DBFS after job completion.

databricks.autoterminationMinutes

Idle time in minutes before a user cluster will auto-terminate.

Do not set this value to less than the cluster startup wait time value.
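
The auto-termination properties and the constraint above can be sketched as follows. The minute values are illustrative assumptions only.

    # Illustrative sketch: auto-termination settings with assumed example values.
    termination_settings = {
        "databricks.enableAutotermination": True,
        "databricks.clusterStartupWaitTimeInMinutes": 60,  # hypothetical example value
        "databricks.autoterminationMinutes": 120,          # hypothetical; not less than the startup wait time
    }
    assert (termination_settings["databricks.autoterminationMinutes"]
            >= termination_settings["databricks.clusterStartupWaitTimeInMinutes"])
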
databricks.enableLocalDiskEncryption

Enables encryption of data, such as shuffle data, that is temporarily stored on the cluster's local disk.

databricks.patCacheTTLInMinutes

Lifespan in minutes for the Databricks personal access token in-memory cache.

Default: 10

databricks.maxAPICallRetries

Maximum number of retries to perform in case of a 429 error code response.

Default: 5. For more information, see the Configure Maximum Retries for REST API section below.
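
For completeness, an illustrative sketch of the documented defaults for the token cache and retry properties above.

    # Illustrative sketch: documented defaults for the properties above.
    api_settings = {
        "databricks.patCacheTTLInMinutes": 10,  # default lifespan of the cached personal access token
        "databricks.maxAPICallRetries": 5,      # default; retries apply to HTTP 429 (rate limit) responses
    }
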
spark.useVendorSparkLibraries

When true, the platform bypasses shipping its installed Spark libraries to the cluster with each job's execution.

NOTE: This setting is ignored. The vendor Spark libraries are always used for Azure Databricks.

...