The Designer Cloud Powered by Trifacta® platform can be configured to integrate with a variety of environments for processing transformation jobs. When you run a job through the application, you have the option of selecting the running environment on which you wish to run the job.
Tip: In general, you should accept the default environment that is presented for job execution. The application attempts to match the scope of your job to the most appropriate running environment.
This section applies to execution of transform jobs. For more information on options for profiling jobs, see Profiling Options.
Available Environments
Photon Running Environment
This running environment is integrated with the application. When enabled, select Run on Alteryx Server.
NOTE: This running environment is enabled by default.
Suitable for small to medium jobs.
Required Installation: None.
Required Configuration: See Configure Photon Running Environment.
Supported Output Formats: CSV, JSON, Avro, Parquet
Notes and Limitations:
NOTE: When a recipe containing a user-defined function is applied to text data, any null characters cause records to be truncated by the running environment during Alteryx Server job execution. In these cases, please execute the job on Hadoop.
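Null characters in text data can cause truncated records on Alteryx Server when the recipe uses a user-defined function. A simple byte scan of the source file before you choose where to run can help. The following is a minimal Python sketch, not part of the product. The helper names `contains_null_chars` and `suggest_run_option` are hypothetical. The scan assumes a single-byte or UTF-8 encoded file, because UTF-16 text legitimately contains NUL bytes.

```python
def contains_null_chars(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` contains any NUL (0x00) byte.

    Reads the file in binary chunks so large files need not fit in memory.
    Assumes a single-byte or UTF-8 encoding (UTF-16 text contains NULs by design).
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            if b"\x00" in chunk:
                return True


def suggest_run_option(path: str, recipe_uses_udf: bool) -> str:
    # Per the note above: a UDF recipe plus null characters -> run on Hadoop.
    if recipe_uses_udf and contains_null_chars(path):
        return "Run on Hadoop"
    return "Run on Alteryx Server"
```

The check only matters when the recipe contains a user-defined function, so the second helper short-circuits otherwise.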
Spark Running Environment
This running environment is the new default running environment. The Spark running environment deploys Spark libraries from the Alteryx node to the nodes of the Hadoop cluster. Spark processes jobs in memory, which reduces read/write operations against each node's disk storage and thereby shortens job execution time.
Suitable for jobs of all sizes.
Required Installation: None.
Required Configuration: See Configure Spark Running Environment.
Supported Output Formats: CSV, JSON, Avro, Parquet
Notes and Limitations:
NOTE: When executing a job on the Spark running environment using a relational source, the job fails if one or more columns have been dropped from the underlying source table. As a workaround, the recipe panel may show steps that reference the missing columns; you can use this information to fix either the script or the source data.
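If you can obtain the list of columns that your recipe references and the table's current column list (for example, from your database catalog), you can find the dropped columns with a plain set comparison. This is a hedged sketch: the function name `missing_columns` and both input lists are hypothetical and do not correspond to any product API.

```python
def missing_columns(recipe_columns, source_columns):
    """Return columns referenced by the recipe that no longer exist in the
    source table, in first-reference order and without duplicates."""
    present = set(source_columns)
    seen = set()
    result = []
    for col in recipe_columns:
        if col not in present and col not in seen:
            seen.add(col)
            result.append(col)
    return result


# Example: "email" was dropped from the table after the recipe was written.
print(missing_columns(["id", "name", "email", "email"], ["id", "name"]))
```

An empty result suggests the failure has a different cause than a dropped column.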
JavaScript Running Environment
NOTE: Although you can enable it, this environment is no longer supported. You should enable the Photon running environment in your deployment.
This legacy running environment is no longer enabled by default. When enabled, select Run on Alteryx Server.
Suitable for small jobs.
Required Installation: None.
Required Configuration: For more information on re-enabling this running environment, see Configure Photon Running Environment.
Supported Output Formats: CSV, JSON, Avro
Notes and Limitations:
NOTE: Parquet format cannot be generated in an Alteryx Server environment.
EMR Running Environment
If you have deployed the Designer Cloud Powered by Trifacta platform to integrate with an Amazon EMR cluster, you can run Spark-based jobs on the cluster. This environment is similar to the Spark running environment on a Hadoop cluster.
Required Installation: None.
Required Configuration: See Configure for EMR.
Supported Output Formats: CSV, JSON, Avro, Parquet
Notes and Limitations:
None.
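The supported output formats listed above can be captured in a small lookup table, which makes it easy to check which environments can write a given format. The following Python sketch encodes only the formats stated on this page. The names `SUPPORTED_OUTPUT_FORMATS` and `environments_supporting` are illustrative, not part of the platform.

```python
# Supported output formats per running environment, as listed on this page.
SUPPORTED_OUTPUT_FORMATS = {
    "Photon": {"CSV", "JSON", "Avro", "Parquet"},
    "Spark": {"CSV", "JSON", "Avro", "Parquet"},
    "JavaScript": {"CSV", "JSON", "Avro"},
    "EMR": {"CSV", "JSON", "Avro", "Parquet"},
}


def environments_supporting(fmt: str) -> list[str]:
    """Return, in sorted order, the environments that can write `fmt`.

    The comparison is case-insensitive, so "parquet" matches "Parquet".
    """
    wanted = fmt.lower()
    return sorted(
        env
        for env, formats in SUPPORTED_OUTPUT_FORMATS.items()
        if wanted in {f.lower() for f in formats}
    )


print(environments_supporting("parquet"))  # JavaScript is excluded
```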
Configuration
To apply this configuration change, log in as an administrator to the Alteryx node. Then, edit trifacta-conf.json.
For more information, see Platform Configuration Methods.
The following parameters define the available running environments:
"webapp.runInTrifactaServer": true,
"webapp.runInHadoop": true,
"webapp.runinEMR": false,
"webapp.runInDataflow": false,
"photon.enabled": true,
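If you script this edit instead of changing the file by hand, you need to map the dotted parameter names onto the JSON file. The sketch below assumes that each dotted name refers to nested objects, so `webapp.runInHadoop` means `{"webapp": {"runInHadoop": ...}}`; verify that against your own trifacta-conf.json first. The helper names and the backup step are illustrative additions, not a documented procedure. Keep key names exactly as they appear above, including their capitalization.

```python
import json
import shutil


def set_setting(conf: dict, dotted_key: str, value) -> None:
    """Set a value in nested dicts using a dotted path such as 'webapp.runInHadoop'."""
    *parents, leaf = dotted_key.split(".")
    node = conf
    for key in parents:
        node = node.setdefault(key, {})
        if not isinstance(node, dict):
            raise TypeError(f"'{key}' in '{dotted_key}' is not an object")
    node[leaf] = value


def get_setting(conf: dict, dotted_key: str, default=None):
    """Read a value by dotted path, returning `default` if any part is missing."""
    node = conf
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def update_conf_file(path: str, changes: dict) -> None:
    """Back up the file to `<path>.bak`, apply dotted-key changes, and rewrite it."""
    shutil.copy2(path, path + ".bak")
    with open(path, encoding="utf-8") as f:
        conf = json.load(f)
    for key, value in changes.items():
        set_setting(conf, key, value)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(conf, f, indent=2)
```

For example, `update_conf_file("/path/to/trifacta-conf.json", {"webapp.runInHadoop": True})`, where the path is a placeholder for your deployment's actual file location.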
For more information on configuring the running environment for EMR, see Configure for EMR.
Below, you can see the configuration settings required to enable each running environment.
- The Spark running environment requires a Hadoop cluster as the backend processing engine. In the Run Job page, select Run on Hadoop.
- The Photon and JavaScript running environments execute on the Alteryx node and provide processing both for the front-end client and at job execution time. In the Run Job page, select Run on Alteryx Server.
NOTE: To disable execution on the Alteryx Server, set runInTrifactaServer to false. If you do, your Alteryx deployment must be connected to a Hadoop running environment.
Type | Running Environment | Configuration Parameters | Notes |
---|---|---|---|
Hadoop Backend | Spark | | The Spark running environment is the default configuration. |
Client Front-end and non-Hadoop Backend | Photon | | Photon is the default running environment for the front-end of the application. It is enabled by default. NOTE: To disable use of the legacy JavaScript running environment, you must enable Photon. |
Client Front-end and non-Hadoop Backend | JavaScript | | The JavaScript running environment should not be enabled unless necessary. NOTE: Although you can enable it, this environment is no longer supported. You should enable the Photon running environment in your deployment. |
NOTE: Do not modify the runInDataflow setting.
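The settings and notes above decide which Run Job options are available. The Python sketch below models that logic under stated assumptions. It reads the flat dotted keys exactly as listed above. It treats a disabled `photon.enabled` on Alteryx Server as falling back to the legacy JavaScript environment, which is inferred from the table notes. It ignores `webapp.runinEMR` and `webapp.runInDataflow`, because this page does not say how they appear in the Run Job page. The function name `run_options` is hypothetical.

```python
def run_options(conf: dict) -> list[str]:
    """Hypothetical mapping from the flags above to Run Job page options."""
    options = []
    if conf.get("webapp.runInTrifactaServer", False):
        # Assumption: with Photon disabled, server jobs use the legacy JavaScript environment.
        engine = "Photon" if conf.get("photon.enabled", False) else "JavaScript"
        options.append(f"Run on Alteryx Server ({engine})")
    if conf.get("webapp.runInHadoop", False):
        options.append("Run on Hadoop (Spark)")
    if not options:
        # Mirrors the note: without Alteryx Server, a Hadoop environment is required.
        raise ValueError("No running environment enabled; connect a Hadoop environment.")
    return options


defaults = {
    "webapp.runInTrifactaServer": True,
    "webapp.runInHadoop": True,
    "webapp.runinEMR": False,
    "webapp.runInDataflow": False,
    "photon.enabled": True,
}
print(run_options(defaults))
```

With the default values listed above, both Run on Alteryx Server (Photon) and Run on Hadoop are available.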