The platform can be configured to integrate with a variety of environments for processing transformation jobs. When you run a job through the application, you can select the running environment on which to run it.

Tip: In general, you should accept the default environment that is presented for job execution. The application attempts to match the scope of your job to the most appropriate running environment.

This section applies to execution of transform jobs. For more information on options for profiling jobs, see Profiling Options.

Available Environments

Photon Running Environment

This running environment is integrated with the application. When enabled, select Run on .

NOTE: This running environment is enabled by default.

Suitable for small to medium jobs.

Required Installation: None.

Required Configuration: See Configure Photon Running Environment.

Supported Output Formats: CSV, JSON, Avro, Parquet

Notes and Limitations:

NOTE: If the Photon running environment is disabled, please set feature.enableFirstRowsSample to false. This sampling technique requires the Photon running environment.


NOTE: When a recipe containing a user-defined function is applied to text data, any null characters cause records to be truncated by the running environment during job execution. In these cases, please execute the job on Hadoop.
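The dependency between Photon and first-rows sampling described in the note above can be checked mechanically. The following is a minimal sketch, assuming the settings have already been loaded into a flat Python dictionary keyed by parameter name. The function name check_first_rows_sample is hypothetical, the default value of feature.enableFirstRowsSample is an assumption, and treating photon.enabled as the switch that disables Photon is inferred from the parameters listed in the Configure section.

```python
def check_first_rows_sample(settings):
    """Return warnings for settings that conflict with the Photon sampling note.

    `settings` is a hypothetical flat dict of parameter names to values.
    """
    warnings = []
    # Photon is documented as enabled by default.
    photon_enabled = settings.get("photon.enabled", True)
    # Assumed default; the source does not state it.
    first_rows = settings.get("feature.enableFirstRowsSample", True)
    if not photon_enabled and first_rows:
        warnings.append(
            "feature.enableFirstRowsSample should be false when Photon is disabled"
        )
    return warnings
```

An empty list means no conflict was found between the two settings.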

Spark Running Environment

This running environment is the new default running environment. The Spark running environment deploys Spark libraries from the platform to the nodes of the Hadoop cluster. Spark uses in-memory processing for jobs, which limits read/write operations on each node's disk storage and thereby shortens job execution time.

Suitable for jobs of all sizes.

Required Installation: None.

Required Configuration: See Configure Spark Running Environment.

Supported Output Formats: CSV, JSON, Avro, Parquet

Notes and Limitations:

NOTE: When executing a job on the Spark running environment using a relational source, the job fails if one or more columns have been dropped from the underlying source table. As a workaround, the recipe panel may show steps referencing the missing columns, which can be used to fix either the script or the source data.

NOTE: The Spark running environment does not support use of multi-character delimiters for CSV outputs. You can switch your job to a different running environment or use single-character delimiters. For more information on this issue, see https://issues.apache.org/jira/browse/SPARK-24540.

JavaScript Running Environment

NOTE: Although you can enable it, this environment is no longer supported. You should enable the Photon running environment in your deployment.

This legacy running environment is no longer enabled by default. When enabled, select Run on .

Suitable for small jobs.

Required Installation: None.

Required Configuration: For more information on re-enabling this running environment, see Configure Photon Running Environment.

Supported Output Formats: CSV, JSON, Avro

Notes and Limitations:

NOTE: Parquet format cannot be generated in this running environment.

EMR Running Environment

If you have deployed the platform to integrate with an Amazon EMR cluster, you can run Spark-based jobs on the cluster. This environment is similar to the Hadoop cluster.

Required Installation: None.

Required Configuration: See Configure for EMR.

Supported Output Formats: CSV, JSON, Avro, Parquet

Notes and Limitations:

NOTE: Job cancellation is not supported on an EMR cluster.
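The supported output formats and the Spark delimiter limitation listed above can be collected into a small pre-flight check. This is a sketch only: the lookup table restates the formats listed in this section, while the function name output_problems and the lowercase environment keys are hypothetical and not part of the product.

```python
# Supported output formats per running environment, as listed in the sections above.
SUPPORTED_FORMATS = {
    "photon": {"csv", "json", "avro", "parquet"},
    "spark": {"csv", "json", "avro", "parquet"},
    "javascript": {"csv", "json", "avro"},
    "emr": {"csv", "json", "avro", "parquet"},
}


def output_problems(environment, output_format, delimiter=","):
    """Return reasons the output cannot be generated; an empty list means none are known."""
    env = environment.lower()
    fmt = output_format.lower()
    if env not in SUPPORTED_FORMATS:
        return [f"unknown running environment: {environment}"]
    problems = []
    if fmt not in SUPPORTED_FORMATS[env]:
        problems.append(f"{output_format} is not supported on {environment}")
    # Spark does not support multi-character delimiters for CSV outputs.
    if env == "spark" and fmt == "csv" and len(delimiter) > 1:
        problems.append("Spark requires a single-character CSV delimiter")
    return problems
```

For example, output_problems("JavaScript", "Parquet") reports the missing Parquet support, and output_problems("Spark", "CSV", "||") reports the delimiter limitation.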

Configure

Available Running Environments

The following parameters define the available running environments:

"webapp.runInTrifactaServer": true,
"webapp.runInHadoop": true,
"webapp.runinEMR": false,
"webapp.runInDataflow": false,
"photon.enabled": true,

For more information on configuring the running environment for EMR, see Configure for EMR.

Below, you can see the configuration settings required to enable each running environment.

Running Environment: Spark

Type: Hadoop Backend

Configuration Parameters: webapp.runInHadoop = true

Notes: The Spark running environment is the default configuration.

Running Environment: Photon

Type: Client Front-end and non-Hadoop Backend

Configuration Parameters: webapp.runInTrifactaServer = true, photon.enabled = true

Notes: Photon is the default running environment for the front-end of the application. It is enabled by default.

NOTE: To disable use of the legacy JavaScript running environment, you must enable Photon.

Running Environment: JavaScript

Type: Client Front-end and non-Hadoop Backend

Configuration Parameters: webapp.runInTrifactaServer = true, photon.enabled = false

Notes: The JavaScript running environment should not be enabled unless necessary.

NOTE: Although you can enable it, this environment is no longer supported. You should enable the Photon running environment in your deployment.

NOTE: Do not modify the runInDataflow setting.
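The relationship between these flags and the environments they enable can be sketched as a small function. This is an illustration under stated assumptions, not the application's own logic: the function name enabled_environments is hypothetical, the settings are assumed to be available as a flat dictionary, and treating webapp.runinEMR alone as sufficient to expose EMR is an assumption (see Configure for EMR for the full setup).

```python
def enabled_environments(settings):
    """Map the flags described above to the running environments they enable."""
    envs = []
    if settings.get("webapp.runInHadoop", False):
        envs.append("Spark")
    if settings.get("webapp.runInTrifactaServer", False):
        # photon.enabled selects between Photon and the legacy JavaScript environment.
        envs.append("Photon" if settings.get("photon.enabled", False) else "JavaScript")
    if settings.get("webapp.runinEMR", False):
        # Assumption: this flag alone exposes EMR.
        envs.append("EMR")
    return envs


# The sample values shown earlier in this section.
settings = {
    "webapp.runInTrifactaServer": True,
    "webapp.runInHadoop": True,
    "webapp.runinEMR": False,
    "webapp.runInDataflow": False,  # do not modify
    "photon.enabled": True,
}
print(enabled_environments(settings))
```

With the sample values, the result contains Spark and Photon. Setting photon.enabled to false swaps Photon for the legacy JavaScript environment.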

Configure Default Running Environment

When you specify a job, the default running environment is pre-configured for you, based on the following parameter:

NOTE: If your deployment has no running environment such as Hadoop for running large-scale jobs, this parameter is not used. All jobs are run on the server-based running environment.

"webapp.client.maxExecutionBytes.photon": 1000000000,

The default environment presented to you is based on the size of the primary datasource. For the above setting of 1 GB:

Running Environment: Server-based running environment

Default Condition: Size of primary datasource is less than or equal to the above value in bytes.

Running Environment: Cluster-based running environment

Default Condition: Size of primary datasource is greater than the above value in bytes.

NOTE: This setting defines only the environment that is recommended to you as a predefined selection. If a second running environment is available, you can select it, although choosing an environment other than the default is not recommended. See Run Job Page.

Setting this value too high forces more jobs onto the server-based running environment, which may cause slow performance and can potentially overwhelm the server.

Tip: To force the default setting to always be a Hadoop or bulk running environment, set this value to 0. The bulk option is then recommended to all users instead of the server-based running environment. However, smaller jobs may take longer than expected to execute.
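The selection rule in this section can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name default_environment, the cluster_available flag, and the "server" and "cluster" labels are hypothetical, and treating 0 as a switch that always prefers the cluster is an assumption drawn from the tip above.

```python
def default_environment(datasource_bytes,
                        max_photon_bytes=1_000_000_000,
                        cluster_available=True):
    """Pick the environment pre-selected for a job, following the rules above.

    max_photon_bytes stands in for webapp.client.maxExecutionBytes.photon.
    """
    if not cluster_available:
        # Without a cluster-based environment, the threshold is not used.
        return "server"
    if max_photon_bytes == 0:
        # Assumption: 0 always recommends the cluster-based environment.
        return "cluster"
    # Less than or equal to the threshold stays on the server-based environment.
    return "server" if datasource_bytes <= max_photon_bytes else "cluster"
```

With the 1 GB setting, a 500 MB primary datasource defaults to the server-based environment and a 5 GB datasource defaults to the cluster. The result is only the recommended pre-selection; a user can still choose another available environment.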