...

  1. D s config
  2. Acquire the value for databricks.sparkVersion.
  3. In Azure Databricks, compare your value to the list of supported Azure Databricks versions. If your version is unsupported, identify a new version to use.

    Info

    NOTE: Make note of the Spark version that is supported for the Azure Databricks version that you have chosen.


  4. In the 

    D s platform
     configuration, set databricks.sparkVersion to the new version, as shown in the example after these steps.

    Info

    NOTE: The value for spark.version does not apply to Databricks.


  5. Restart the 
    D s platform
    .
  6. After the platform restarts, a new Azure Databricks cluster using the specified values is created for each user when the user runs a job.
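
The following is a minimal sketch of how this setting might appear in the platform configuration. The nesting of the keys and the runtime version string are examples only; use the Azure Databricks version that you identified above.

Code Block
"databricks": {
    ...
    "sparkVersion": "7.3.x-scala2.12",
    ...
},
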

Spark job fails with "spark scheduler cannot be cast" error

When you run a job on Databricks, the job may fail with the following error:

Code Block
java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:616)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

The job.log file may contain something similar to the following:

Code Block
2022-07-19T15:41:24.832Z - [sid=0cf0cff5-2729-4742-a7b9-4607ca287a98] - [rid=83eb9826-fc3b-4359-8e8f-7fbf77300878] - [Async-Task-9] INFO com.trifacta.databricks.spark.JobHelper - Got error org.apache.spark.SparkException: Stage 0 failed. Error: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-10-243-149-238.eu-west-1.compute.internal, executor driver): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:616)
...

This error is due to a class mismatch between the 

D s platform
and Databricks.

Solution:

The solution is to prevent the Spark JARs provided by the 

D s platform
from taking precedence over the Databricks Spark JARs. Please perform the following steps:

  1. D s config
    methodt
  2. Locate the spark.props section and add the following configuration elements:

    Code Block
    "spark": {
        ...
        "props": {
          "spark.driver.userClassPathFirst": false,
          "spark.executor.userClassPathFirst": false,
          ...
        }
    },


  3. Save your changes and restart the platform.

