Configure Photon Running Environment
The Trifacta Application can connect to the Trifacta Photon running environment, a high-performance environment embedded in the Trifacta node for executing jobs against small- to medium-sized datasets.
The Trifacta Photon running environment can be selected in the Run Job page.
By default, the Trifacta Photon running environment is enabled for new installations.
Features:
Faster execution times for transform and profiling jobs
Better consistency with typecasting done in Spark jobs
This section provides information on how to enable and configure the Trifacta Photon running environment.
Note
Some configuration is shared with the Trifacta Photon client. For more information, see Configure Photon Client.
Limitations
Note
For profiles executed in the Trifacta Photon running environment, percentages for valid, missing, or mismatched column values may not add up to 100% due to rounding. See Overview of Visual Profiling.
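The rounding effect described in the note above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming percentages are rounded to whole numbers; the function name `rounded_percentages` is hypothetical, and the precision that the product actually displays may differ.

```python
def rounded_percentages(valid, missing, mismatched):
    """Return each category's share of the column as a whole-number percentage.

    Assumes the column has at least one value. Rounding to whole numbers is an
    assumption made for illustration only.
    """
    total = valid + missing + mismatched
    return [round(100 * count / total) for count in (valid, missing, mismatched)]


# Three equally sized categories each round down to 33, so the sum is 99.
shares = rounded_percentages(valid=1, missing=1, mismatched=1)
print(shares, sum(shares))  # prints [33, 33, 33] 99
```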
Disable Trifacta Photon Running Environment
The Trifacta Photon running environment is enabled by default. Please complete the following configuration to disable the running environment.
Note
A cluster-based running environment, such as Spark, must be available for processing jobs when this one is disabled.
Note
When Trifacta Photon is disabled, quick scan sampling is not available.
Steps:
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
To disable the Trifacta Photon running environment, locate the following setting, and set it to Disabled:
Photon execution
Apply the following configuration settings:
"feature.enableSamplingScanOptions": false,
"feature.enableFirstRowsSample": false,
Save your changes and restart the platform.
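If you edit trifacta-conf.json with a script instead of the Admin Settings Page, the dotted setting names need to be mapped onto the file's nested structure. The following is a minimal sketch, assuming that a name such as feature.enableFirstRowsSample corresponds to nested JSON objects; the helper `set_dotted` is hypothetical, and reading and writing the file itself is omitted because its location depends on your installation.

```python
import json


def set_dotted(config, dotted_key, value):
    """Set a value in a nested dict using a dot-separated key path."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate objects when they do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Stand-in for the parsed contents of trifacta-conf.json (hypothetical excerpt).
config = {"feature": {"enableSamplingScanOptions": True, "enableFirstRowsSample": True}}

set_dotted(config, "feature.enableSamplingScanOptions", False)
set_dotted(config, "feature.enableFirstRowsSample", False)
print(json.dumps(config, indent=2))
```

The Admin Settings Page remains the recommended method. A scripted edit still requires a platform restart to take effect.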
Example Configuration
The following configuration includes the default values.
"photon": {
  "cacheEnabled": true,
  "numThreads": 4,
  "distroPath": "/photon/dist/centos6/photon",
  "traceExecution": false,
  "websocket": {
    "host": "localhost",
    "port": 8082
  },
  "mode": "wasm"
},
Some of these values apply to the Trifacta Photon client. For more information, see Configure Photon Client.
Parameter | Description |
---|---|
cacheEnabled | Debugging setting. Leave the default value. |
numThreads | Maximum number of threads permitted to the Trifacta Photon process. For recommended values, see Configure Photon Client. |
distroPath | Please verify that this property is set to the following value, which works for all operating system distributions: "distroPath": "/photon/dist/centos6/photon", |
traceExecution | Debugging setting. Leave the default value. |
websocket.host | Internal parameter. Do not modify. |
websocket.port | Internal parameter. Do not modify. |
mode | Set this value is |
Modify Limits
Runtime job timeout
By default, the Designer Cloud Powered by Trifacta platform imposes no time limit on the execution of a Trifacta Photon job. As needed, you can enable and configure a timeout.
Steps:
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Locate the following settings:
"batchserver.workers.photon.timeoutEnabled": false,
"batchserver.workers.photon.timeoutMinutes": 180,
Setting | Description |
---|---|
timeoutEnabled | Set to false to disable job limiting. Set to true to enable the timeout specified below. |
timeoutMinutes | Defines the number of minutes that a Trifacta Photon job is permitted to run. Default value is 180 (three hours). |
Save your changes and restart the platform.
When a job has failed due to exceeding a timeout, additional information is available in the job logs. The following is a good search term for this type of error:
java.lang.Exception: Photon job '<jobId>' timeout
where <jobId> is the internal job identifier.
Job logs can be downloaded from the Job page. See Job History Page.
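After downloading a job log, you can search it for the timeout message programmatically. The following is a minimal sketch that matches the search term shown above and extracts the job identifier; the function name `find_timed_out_jobs` and the sample log text are hypothetical.

```python
import re

# Matches the search term from the job logs and captures <jobId>.
TIMEOUT_PATTERN = re.compile(r"java\.lang\.Exception: Photon job '([^']+)' timeout")


def find_timed_out_jobs(log_text):
    """Return the job identifiers of all Photon timeout errors in the log text."""
    return TIMEOUT_PATTERN.findall(log_text)


# Hypothetical log excerpt for illustration.
sample_log = (
    "INFO starting job\n"
    "java.lang.Exception: Photon job '740' timeout\n"
)
print(find_timed_out_jobs(sample_log))  # prints ['740']
```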
Trifacta Photon running environment memory timeout
To prevent crashes, the Trifacta Photon running environment imposes a memory consumption limit for each job. If this memory threshold is exceeded, the job is automatically killed. As needed, you can disable this memory protection (not recommended) or change the memory threshold at which jobs are killed.
Steps:
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Locate the following setting:
"batchserver.workers.photon.memoryMonitorEnabled": false,
Setting | Description |
---|---|
memoryMonitorEnabled | Set to false to disable memory monitoring. Set to true to enable the threshold specified below. |
Save your changes and restart the platform.
When a job has been killed for exceeding the memory threshold, additional information is available in the job logs. The following is a good search term for this type of error:
java.lang.Exception: Photon job '<jobId>' failed with memory consumption over threshold
where <jobId> is the internal job identifier.
Below this line item, you may see the following entries, which can provide additional information to adjust the memory settings:
2017-05-04T02:26:40.549Z [job-id 740] com.trifacta.joblaunch.util.ProcessMonitorUtil [Thread-20] INFO com.trifacta.joblaunch.util.ProcessMonitorUtil - Global memory size: 8373186560 bytes
2017-05-04T02:26:40.555Z [job-id 740] com.trifacta.joblaunch.util.ProcessMonitorUtil [Thread-20] INFO com.trifacta.joblaunch.util.ProcessMonitorUtil - Available global memory size at process start: 2672959488 bytes
...
2017-05-04T02:29:15.690Z [job-id 740] com.trifacta.joblaunch.util.ProcessMonitorUtil [Thread-20] INFO com.trifacta.joblaunch.util.ProcessMonitorUtil - Current memory consumption: 5.614080429077148%
2017-05-04T02:29:15.691Z [job-id 740] com.trifacta.joblaunch.util.ProcessMonitorUtil [Thread-20] ERROR com.trifacta.joblaunch.util.ProcessMonitorUtil - Average memory consumption for the past 15 seconds over 5% threshold: 5.174326801300049 %. Current available global memory: 2244628480 bytes
Item | Description |
---|---|
Global memory size | Total available global memory in bytes |
Available global memory size at process start | Total available memory in bytes when the job is launched |
Current memory consumption | Current memory usage for the job process as a percentage of the total. This metric is posted to the log every 30 seconds and can be used to debug memory leaks. |
Average memory consumption for the past 15 seconds over x% threshold | When the job fails due to the memory threshold, this metric identifies the average memory consumption percentage over the past 15 seconds. The defined threshold percentage is included. |
Current available global memory | When the job fails, this metric identifies the total available memory at the time of failure. |
Job logs can be downloaded from the Job page. See Job History Page.
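To compare these metrics across several failed jobs, you can extract them from the downloaded log. The following is a minimal sketch that parses the message formats shown in the log excerpt above. The function name `summarize_memory_failure` is hypothetical, and the byte estimate assumes that the average percentage is measured against the global memory size, which the log does not state explicitly.

```python
import re

GLOBAL_RE = re.compile(r"Global memory size: (\d+) bytes")
FAILURE_RE = re.compile(
    r"Average memory consumption for the past (\d+) seconds over "
    r"([\d.]+)% threshold: ([\d.]+) ?%\. "
    r"Current available global memory: (\d+) bytes"
)


def summarize_memory_failure(log_text):
    """Extract the memory metrics of a failed job, or None if they are absent."""
    total = GLOBAL_RE.search(log_text)
    failure = FAILURE_RE.search(log_text)
    if not total or not failure:
        return None
    total_bytes = int(total.group(1))
    average_pct = float(failure.group(3))
    return {
        "global_bytes": total_bytes,
        "window_seconds": int(failure.group(1)),
        "threshold_pct": float(failure.group(2)),
        "average_pct": average_pct,
        # Assumption: the percentage is relative to the global memory size.
        "approx_job_bytes": round(total_bytes * average_pct / 100),
        "available_bytes": int(failure.group(4)),
    }


# Messages taken from the excerpt above; timestamps and class prefixes are trimmed.
sample_log = "\n".join([
    "INFO - Global memory size: 8373186560 bytes",
    "INFO - Available global memory size at process start: 2672959488 bytes",
    "ERROR - Average memory consumption for the past 15 seconds over 5% threshold: "
    "5.174326801300049 %. Current available global memory: 2244628480 bytes",
])
print(summarize_memory_failure(sample_log))
```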
Batch FileSystem Access Timeout Settings
The default timeout settings for reading and writing data from the client browser through the Trifacta Photon running environment to the Trifacta node should work in most cases.
However, particularly when reading from large tables, you might encounter timeout errors similar to the following:
06:21:21.365 [Job 23] INFO com.trifacta.hadoopdata.photon.BatchPhotonRunner - terminating with uncaught exception of type Poco::TimeoutException: Timeout
06:21:21.375 [Job 23] INFO com.trifacta.hadoopdata.photon.BatchPhotonRunner - /vagrant/photon/dist/centos6/photon/bin/photon-cli: line 22: 15639 Aborted ${command[@]}
Steps:
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Locate the photon.extraCliArgs node.
Add the following values to the extraCliArgs entry:
"photon.extraCliArgs" : "-batch_vfs_read_timeout <300> -batch_vfs_write_timeout <300>"
Argument | Description |
---|---|
-batch_vfs_read_timeout | Timeout limit in seconds of read operations from the datastore. Default value is 300 seconds (5 minutes). Tip: Raising the value to 3600 seconds should be fine in most environments. Avoid setting this value above 7200 seconds (2 hours). |
-batch_vfs_write_timeout | Timeout limit in seconds of write operations to the datastore. Default value is 300 seconds (5 minutes). Note: Do not modify unless specifically instructed by Alteryx Support. |
To reduce timeouts, raise the above settings.
Save your changes and restart the platform.
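The extraCliArgs value can be assembled and checked before you paste it into the configuration. The following is a minimal sketch, assuming that the angle brackets in the example above are placeholder notation and that the bare number of seconds is what gets passed; the function name `build_extra_cli_args` is hypothetical.

```python
MAX_RECOMMENDED_READ_TIMEOUT = 7200  # seconds; upper bound suggested in the tip above


def build_extra_cli_args(read_timeout=300, write_timeout=300):
    """Build the photon.extraCliArgs string from timeouts given in seconds."""
    for name, value in (("read", read_timeout), ("write", write_timeout)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} timeout must be a positive integer number of seconds")
    if read_timeout > MAX_RECOMMENDED_READ_TIMEOUT:
        raise ValueError("read timeout above 7200 seconds is not recommended")
    return (
        f"-batch_vfs_read_timeout {read_timeout} "
        f"-batch_vfs_write_timeout {write_timeout}"
    )


# Raise only the read timeout; the write timeout keeps its default.
print(build_extra_cli_args(read_timeout=3600))
# prints -batch_vfs_read_timeout 3600 -batch_vfs_write_timeout 300
```

The write timeout is left at its default here because the guidance above is to change it only when instructed by Alteryx Support.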
Dynamic chunk size for Parquet reads
In some environments, quick scan sampling and data preview of Parquet files may have performance issues. To improve performance in these areas, you can enable reading of chunks of dynamic size from Parquet. When enabled, the Trifacta Application adjusts the size of each read chunk based on the Row Group Size of the Parquet file. These chunk sizes are calculated for each file and can vary between 1 MB and 20 MB.
Steps:
You can apply this change through the Admin Settings Page (recommended) or trifacta-conf.json. For more information, see Platform Configuration Methods.
Locate the following parameter and set it to true:
"photon.useDynamicChunkSizeForParquet": true,
Save your changes and restart the platform.
Tuning Photon
For more information on tuning the performance of Photon, see Tune Application Performance.
Configure VFS Service
The Trifacta Photon running environment interacts with backend datastores through the VFS service.
Note
The VFS service rarely requires non-default configuration.
For more information, see Configure VFS Service.
Use Trifacta Photon Running Environment
When executing a job, select the Photon option.
Note
Before you test, please be sure to complete all steps of Required Platform Configuration.