Release 9.2

When Designer Cloud powered by Trifacta Enterprise Edition is installed on a supported version of Cloudera, the Designer Cloud application can be configured to execute larger jobs on the cluster instance of Spark. Spark leverages in-memory capabilities on individual nodes for faster processing of distributed analytics tasks, with spillover to disk as needed.

Tip: After the Designer Cloud application has been integrated with Spark, select Spark on the Run Job page to run your job in this running environment.


Spark requires a backend distributed storage layer:

  • On AWS-based deployments, this storage layer is S3.
  • On Hadoop-based deployments, this storage layer is HDFS.

Additional configuration is required.

NOTE: When you execute a job on the Spark running environment using a relational source, the job fails if one or more columns have been dropped from the underlying source table. As a workaround, check the recipe panel for steps that reference the missing columns, and use them to fix either the recipe or the source data.
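
Before running such a job on Spark, it can help to compare the columns your recipe references against the columns currently present in the source table. The following is a minimal Python sketch, assuming you have already collected both column lists yourself (for example, from the recipe panel and from the database catalog). `find_missing_columns` is a hypothetical helper, not part of the Designer Cloud application or any Trifacta API, and it assumes column names match exactly (case-sensitive).

```python
from typing import Iterable, List


def find_missing_columns(recipe_columns: Iterable[str],
                         source_columns: Iterable[str]) -> List[str]:
    """Return recipe-referenced columns that no longer exist in the source table.

    Hypothetical helper: both inputs are plain lists of column names that you
    gather yourself; names are compared exactly (case-sensitive).
    """
    available = set(source_columns)
    missing: List[str] = []
    seen = set()
    for col in recipe_columns:
        # Keep the order in which the recipe references columns and skip repeats.
        if col not in available and col not in seen:
            missing.append(col)
            seen.add(col)
    return missing


if __name__ == "__main__":
    # The recipe still references "region", but it was dropped from the table.
    print(find_missing_columns(["id", "name", "region"], ["id", "name"]))
```

Any names this returns point to the recipe steps worth reviewing before you rerun the job.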

NOTE: The Spark running environment does not support multi-character delimiters for CSV outputs. As a workaround, switch your job to a different running environment or use a single-character delimiter. This issue is fixed in Spark 3.0 and later. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.
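
The delimiter rule above can be expressed as a small pre-flight check. This is a minimal Python sketch, assuming you know the Spark version of your cluster as a dotted string such as "2.4.8". `spark_supports_csv_delimiter` is a hypothetical helper, not part of the Designer Cloud application or of Spark itself; it encodes only the rule stated in this note.

```python
def spark_supports_csv_delimiter(delimiter: str, spark_version: str) -> bool:
    """Return True if a Spark job can write CSV output with this delimiter.

    Hypothetical helper encoding the documented rule: multi-character
    delimiters are not supported before Spark 3.0 (see SPARK-24540).
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    if len(delimiter) == 1:
        # Single-character delimiters (for example "," or "\t") work on any version.
        return True
    # Compare only the major version, e.g. "2.4.8" -> 2, "3.1.2" -> 3.
    major = int(spark_version.split(".")[0])
    return major >= 3


if __name__ == "__main__":
    print(spark_supports_csv_delimiter("||", "2.4.8"))
```

If the check returns False, either pick a different running environment for the job or change the output to a single-character delimiter.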

For more information, see Configure for Spark.
