The following parameters can be reviewed or modified based on your requirements:

Optional Parameters

Parameter | Description | Value

databricks.awsAttributes.firstOnDemandInstances

Number of initial cluster nodes to be placed on on-demand instances. The remaining nodes are placed on instances of the availability type set in databricks.awsAttributes.availability.

Default: 1

databricks.awsAttributes.availability

Availability type used for all subsequent nodes past the firstOnDemandInstances.

Default: SPOT_WITH_FALLBACK
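
As an illustration of how firstOnDemandInstances and availability interact, the following minimal Python sketch splits a cluster's nodes between on-demand instances and instances of the configured availability type. The function name split_instances and the rule that the on-demand count is capped at the cluster size are assumptions made for this example, not documented platform behavior.

```python
from typing import Tuple


def split_instances(total_nodes: int, first_on_demand: int = 1) -> Tuple[int, int]:
    """Return (on_demand_count, availability_type_count) for a cluster.

    Assumption: the on-demand count can never exceed the cluster size.
    """
    if total_nodes < 0 or first_on_demand < 0:
        raise ValueError("node counts must be non-negative")
    on_demand = min(first_on_demand, total_nodes)
    # Every node past the first on-demand ones uses the availability type.
    return on_demand, total_nodes - on_demand


print(split_instances(5))  # (1, 4)
```

With the default of 1, a five-node cluster in this sketch places one node on an on-demand instance and the other four on instances of the availability type.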

databricks.awsAttributes.availabilityZone

Identifier for the availability zone/datacenter in which the cluster resides. The provided availability zone must be in the same region as the Databricks deployment.


databricks.awsAttributes.spotBidPricePercent

The max price for AWS spot instances, as a percentage of the corresponding instance type's on-demand price. When spot instances are requested for this cluster, only spot instances whose max price percentage matches this field will be considered.

Default: 100
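
The percentage can be read as a simple multiplication against the on-demand price. The sketch below shows the arithmetic only. The function name max_spot_price and the hourly price used in the example are hypothetical and are not taken from AWS pricing.

```python
def max_spot_price(on_demand_price: float, spot_bid_price_percent: int = 100) -> float:
    """Maximum spot price implied by a bid percentage of the on-demand price."""
    if spot_bid_price_percent <= 0:
        raise ValueError("spot_bid_price_percent must be positive")
    return on_demand_price * spot_bid_price_percent / 100


# Hypothetical on-demand price of 0.50 per hour with an 80 percent bid.
print(max_spot_price(0.50, 80))
```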

databricks.awsAttributes.ebsVolume

The type of EBS volumes that will be launched with this cluster.

Default: None

databricks.awsAttributes.instanceProfileArn

EC2 instance profile ARN for the cluster nodes. This is only used when AWS credential provider is set to temporary/instance. The instance profile must have previously been added to the Databricks environment by an account administrator.

For more information, see Configure for AWS Authentication.

databricks.clusterMode

Determines the cluster mode for running a Databricks job.

Default: JOB

feature.parameterization.matchLimitOnSampling.databricksSpark

Maximum number of parameterized source files that are permitted for matching in a single dataset with parameters.

Default: 0

databricks.workerNodeType

Type of node to use for the AWS Databricks Workers/Executors. There are 1 or more Worker nodes per cluster.

Default: i3.xlarge

databricks.sparkVersion

AWS Databricks runtime version, which also references the appropriate version of Spark.

Depending on your version of AWS Databricks, please set this property according to the following:

  • AWS Databricks 10.x: 10.0.x-scala2.12
  • AWS Databricks 9.1 LTS: 9.1.x-scala2.12
  • AWS Databricks 7.3 LTS: 7.3.x-scala2.12

Please do not use other values.
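
One way to guard against unsupported values is a small lookup that contains only the values listed above. The following is a minimal sketch. The names SPARK_VERSION_BY_RUNTIME and spark_version_for are hypothetical and are not part of the platform.

```python
# Permitted databricks.sparkVersion values, keyed by AWS Databricks version.
SPARK_VERSION_BY_RUNTIME = {
    "10.x": "10.0.x-scala2.12",
    "9.1 LTS": "9.1.x-scala2.12",
    "7.3 LTS": "7.3.x-scala2.12",
}


def spark_version_for(runtime: str) -> str:
    """Return the sparkVersion value for a runtime, rejecting anything else."""
    try:
        return SPARK_VERSION_BY_RUNTIME[runtime]
    except KeyError:
        raise ValueError(f"unsupported AWS Databricks version: {runtime!r}") from None


print(spark_version_for("9.1 LTS"))  # 9.1.x-scala2.12
```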

databricks.minWorkers

Initial number of Worker nodes in the cluster, and also the minimum number of Worker nodes that the cluster can scale down to during auto-scale-down.

Minimum value: 1

Increasing this value can increase compute costs.

databricks.maxWorkers

Maximum number of Worker nodes the cluster can create during auto scaling.

Minimum value: Not less than databricks.minWorkers.

Increasing this value can increase compute costs.
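
The two constraints on the worker range can be checked before the values are saved. The following is a minimal sketch of such a check. The function name validate_worker_range is hypothetical, and the platform may perform its own validation differently.

```python
def validate_worker_range(min_workers: int, max_workers: int) -> None:
    """Raise ValueError if the autoscaling range breaks the documented limits."""
    if min_workers < 1:
        raise ValueError("databricks.minWorkers must be at least 1")
    if max_workers < min_workers:
        raise ValueError("databricks.maxWorkers must not be less than databricks.minWorkers")


validate_worker_range(min_workers=1, max_workers=4)  # passes silently
```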

databricks.poolId

If you have enabled instance pooling in AWS Databricks, you can specify the pool identifier here.


NOTE: If both poolId and poolName are specified, poolId is used first. If that fails to find a matching identifier, then the poolName value is checked.


databricks.poolName

If you have enabled instance pooling in AWS Databricks, you can specify the pool name here.

See the preceding note for databricks.poolId.

Tip: If you specify a poolName value only, then you can use the instance pools with the same poolName available across multiple Databricks workspaces when you create a new cluster.
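
The lookup order described for poolId and poolName can be sketched as follows. The pool inventory (a dictionary of pool identifiers to pool names) and the function name resolve_pool are hypothetical. The platform's actual lookup is not shown in this document.

```python
from typing import Dict, Optional


def resolve_pool(
    pools_by_id: Dict[str, str],
    pool_id: Optional[str] = None,
    pool_name: Optional[str] = None,
) -> Optional[str]:
    """Return the identifier of the matching pool, or None if nothing matches.

    pools_by_id maps pool identifier -> pool name (illustrative data only).
    """
    # poolId is used first.
    if pool_id and pool_id in pools_by_id:
        return pool_id
    # If the identifier does not match, the poolName value is checked.
    if pool_name:
        for candidate_id, name in pools_by_id.items():
            if name == pool_name:
                return candidate_id
    return None


pools = {"pool-1": "etl", "pool-2": "adhoc"}
print(resolve_pool(pools, pool_id="missing", pool_name="etl"))  # pool-1
```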


databricks.driverNodeType

Type of node to use for the AWS Databricks Driver. There is only one Driver node per cluster.

Default: i3.xlarge

For more information, see the sizing guide for Databricks.

NOTE: This property is unused when instance pooling is enabled. For more information, see Configure instance pooling below.


databricks.driverPoolId

If you have enabled instance pooling in AWS Databricks, you can specify the driver node pool identifier here. For more information, see Configure instance pooling below.

NOTE: If both driverPoolId and driverPoolName are specified, driverPoolId is used first. If that fails to find a matching identifier, then the driverPoolName value is checked.


databricks.driverPoolName

If you have enabled instance pooling in AWS Databricks, you can specify the driver node pool name here. For more information, see Configure instance pooling below.

See the preceding note for databricks.driverPoolId.

Tip: If you specify a driverPoolName value only, then you can use the instance pools with the same driverPoolName available across multiple Databricks workspaces when you create a new cluster.


databricks.logsDestination

DBFS location that cluster logs will be sent to every 5 minutes.

Leave this value as /trifacta/logs.

databricks.enableAutotermination

Set to true to enable auto-termination of a user cluster after N minutes of idle time, where N is the value of the autoterminationMinutes property.

Unless otherwise required, leave this value as true.

databricks.clusterStatePollerDelayInSeconds

Number of seconds to wait between polls for AWS Databricks cluster status when a cluster is starting up.

databricks.clusterStartupWaitTimeInMinutes

Maximum time in minutes to wait for a Cluster to get to Running state before aborting and failing an AWS Databricks job.

Default: 60

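
The relationship between the poller delay and the startup wait time can be pictured as a polling loop with a deadline. The following is a minimal sketch under stated assumptions: the function name wait_for_running, the "RUNNING" state string, and the injectable get_state, sleep, and clock callables are all hypothetical and do not describe the platform's internal implementation.

```python
import time
from typing import Callable


def wait_for_running(
    get_state: Callable[[], str],
    poll_delay_seconds: float,
    startup_wait_minutes: float = 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the cluster state until it is running or the wait time elapses.

    Returns True if the cluster reached the running state, or False if the
    wait time ran out (the point at which the job would be failed).
    """
    deadline = clock() + startup_wait_minutes * 60
    while True:
        if get_state() == "RUNNING":  # assumed state name
            return True
        if clock() >= deadline:
            return False
        # Corresponds to databricks.clusterStatePollerDelayInSeconds.
        sleep(poll_delay_seconds)
```

A shorter poller delay detects the running state sooner at the cost of more status requests. The startup wait time bounds the total time spent in the loop.
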
databricks.clusterLogSyncWaitTimeInMinutes

Maximum time in minutes to wait for a Cluster to complete syncing its logs to DBFS before giving up on pulling the cluster logs to the platform node.

Set this to 0 to disable cluster log pulls.

databricks.clusterLogSyncPollerDelayInSeconds

Number of seconds to wait between polls for a Databricks cluster to sync its logs to DBFS after job completion.

Default: 20

databricks.autoterminationMinutes

Idle time in minutes before a user cluster will auto-terminate.

Do not set this value to less than the cluster startup wait time value.

databricks.maxAPICallRetries

Maximum number of retries to perform in case of a 429 error code response.

Default: 5. For more information, see the Configure Maximum Retries for REST API section below.

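
The retry limit can be sketched as a loop that repeats a request only while the response code is 429. In this minimal example, the function name call_with_retries, the send callable, and the linearly increasing delay between attempts are assumptions for illustration. This document does not describe the delay the platform actually applies between retries.

```python
import time
from typing import Callable, Tuple


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_api_call_retries: int = 5,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry on HTTP 429 up to max_api_call_retries times.

    send() returns a (status_code, body) pair. Any status other than 429 is
    returned immediately, as is the final 429 once retries are used up.
    """
    retries = 0
    while True:
        status, body = send()
        if status != 429 or retries >= max_api_call_retries:
            return status, body
        retries += 1
        # Assumed backoff: wait a little longer after each 429 response.
        sleep(backoff_seconds * retries)
```

With the default of 5, the request in this sketch is sent at most six times: the initial attempt plus five retries.
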
databricks.enableLocalDiskEncryption

Enables encryption of data, such as shuffle data, that is temporarily stored on the cluster's local disk.

-
databricks.patCacheTTLInMinutes

Lifespan in minutes for the Databricks personal access token in-memory cache.

Default: 10

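
The effect of a cache lifespan can be illustrated with a small in-memory cache whose entries expire after a fixed number of minutes. This is a sketch only. The class name TokenCache, its methods, and the per-user keying are hypothetical and do not describe how the platform stores tokens.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenCache:
    """In-memory cache whose entries expire after ttl_minutes."""

    def __init__(self, ttl_minutes: float = 10, clock: Callable[[], float] = time.monotonic):
        self._ttl_seconds = ttl_minutes * 60
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, user: str, token: str) -> None:
        # Store the token together with its expiry time.
        self._entries[user] = (token, self._clock() + self._ttl_seconds)

    def get(self, user: str) -> Optional[str]:
        entry = self._entries.get(user)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller fetches a fresh token.
            del self._entries[user]
            return None
        return token
```

A longer lifespan reduces how often a token must be looked up again, while a shorter one means that a changed or revoked token is noticed sooner.
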
spark.useVendorSparkLibraries

When true, the platform bypasses shipping its installed Spark libraries to the cluster with each job's execution.


NOTE: This setting is ignored. The vendor Spark libraries are always used for AWS Databricks.


Configure Databricks Job Management

...