This section describes how to enable and configure the Spark running environment, which leverages Spark's in-memory processing to deliver faster job execution.
NOTE: Spark is the default running environment for Hadoop job execution in Release 4.0 and later. Unless you are upgrading from a pre-Release 4.0 environment, no additional configuration is required.
NOTE: When a recipe containing a user-defined function is applied to text data, any non-printing (control) characters cause records to be truncated by the running environment during Hadoop job execution. In these cases, please execute the job in a different running environment.
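If switching running environments is not an option, one workaround is to strip non-printing characters from the affected text data before executing the job. The following standalone Python sketch illustrates this kind of pre-processing step; the file names and the choice of characters to preserve are assumptions for illustration, not part of the platform.

    import re

    # Matches ASCII control (non-printing) characters, excluding common
    # whitespace such as tab, newline, and carriage return.
    CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

    def strip_control_chars(text):
        """Remove non-printing control characters that can truncate records."""
        return CONTROL_CHARS.sub("", text)

    # Clean a delimited text file before uploading it for job execution.
    # The input and output paths below are hypothetical.
    with open("input.csv", "r", encoding="utf-8") as src, \
         open("cleaned.csv", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_control_chars(line))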
The Spark execution environment is enabled by default.
NOTE: If you have not done so already, please enable and configure the Spark Job Service. See Configure for Spark.
NOTE: If you have upgraded from a pre-Release 4.0 system, your running environment may default to the one defined in your previous release. For more information on enabling, see Running Environment Options.
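The actual configuration keys and file locations for the Spark Job Service are platform-specific and documented in Configure for Spark. For orientation only, the sketch below shows the general shape of such a service configuration as a Python dictionary; every key and value here is hypothetical.

    # Hypothetical sketch only -- the real keys and settings are
    # defined in Configure for Spark.
    spark_job_service = {
        "enabled": True,       # hypothetical: turn the Spark Job Service on
        "host": "localhost",   # hypothetical: host the service binds to
        "port": 4007,          # hypothetical: port the service listens on
    }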
Once enabled, Spark execution is available like any other running environment in the application. When executing a job, select the Run on Hadoop option from the drop-down on the Run Job page. See Run Job Page.
For more information on changing limits and other tuning parameters, see Configure for Spark.
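As general context, the platform's tuning parameters typically map onto standard Spark properties. The following PySpark sketch shows representative properties that commonly need adjustment; the values shown are illustrative assumptions, not recommendations, and the platform's own configuration mechanism (see Configure for Spark) is the authoritative place to set them.

    from pyspark import SparkConf

    # Representative Spark tuning properties; values are illustrative only.
    conf = (
        SparkConf()
        .set("spark.executor.memory", "4g")           # memory per executor
        .set("spark.executor.cores", "2")             # cores per executor
        .set("spark.driver.memory", "2g")             # memory for the driver
        .set("spark.sql.shuffle.partitions", "200")   # shuffle parallelism
    )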