A visual profile can optionally be generated as part of job execution. While profiling can extend the time needed to complete the job, a visual profile provides key statistical information on the columns of your data and on the dataset as a whole. This section provides information on each of the available profilers and how to configure them.
NOTE: To enable use of a profiler, the running environment on which it is hosted must also be enabled. See Running Environment Options.
The following options are supported for profiling your data.
Type of Profiler | Requires Hadoop cluster? | Supported Running Environment | Description
---|---|---|---
Scala Spark Profiler | Yes | all | Generates a profile on the ... See Configure for Spark.
Photon Profiler | No | Photon | Default profiler when the Photon running environment is enabled and the Scala Spark Profiler has not been enabled.
Making changes to your profiling type:
By default, profiling jobs execute on the running environment where the transformation job was executed. You can configure the (Photon) running environment to execute profiling jobs as a separate job in the Spark running environment. This separation allows users to begin working with the transformed data while waiting for the profiling job to complete.
Steps:
Locate the following property, shown here with a value of false, and set its value to true:
"photon.runProfileWithSpark": false,
When profiling is selected for jobs executing on the Photon running environment, the profiling jobs are executed as separate jobs on the Spark running environment.
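The routing described above can be modeled with a small sketch. This is not the product's implementation, only an illustration of the documented behavior: the `ProfilingSettings` class, the `profiling_environment` function, and the environment labels `"photon"` and `"spark"` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilingSettings:
    # Hypothetical mirror of the "photon.runProfileWithSpark" property.
    # False corresponds to the default behavior described above.
    run_profile_with_spark: bool = False


def profiling_environment(job_environment: str, settings: ProfilingSettings) -> str:
    """Return the running environment where the profiling job would execute."""
    if job_environment == "photon" and settings.run_profile_with_spark:
        # Profiling is split off into a separate job on Spark.
        return "spark"
    # Default: profile on the environment where the transformation job ran.
    return job_environment


if __name__ == "__main__":
    default = ProfilingSettings()
    separated = ProfilingSettings(run_profile_with_spark=True)
    print(profiling_environment("photon", default))
    print(profiling_environment("photon", separated))
    print(profiling_environment("spark", separated))
```

In this model the flag only changes the outcome for jobs that ran on Photon; jobs that already ran on Spark are profiled on Spark either way.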
Profiling is invoked at job execution time by the user. See Run Job Page.
To disable user choice through the UI, set the following parameter:
"profiler.userOption": false,
When the above setting is disabled:
NOTE: This setting does not affect profiling through the APIs. Profiling can always be enabled or disabled for jobs that are executed via API.
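If the platform settings are kept in a JSON document, a change such as the one above could be scripted along the following lines. This is a minimal sketch under stated assumptions: the `apply_overrides` helper is hypothetical, the settings are assumed to be a flat JSON object keyed by the dotted property names shown in this section, and the starting values in the example are illustrative only.

```python
import json


def apply_overrides(config_text: str, overrides: dict) -> str:
    """Return config_text (a JSON object) with the given keys overridden."""
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    # Assumption: properties are stored as flat, dotted keys.
    config.update(overrides)
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative starting values, not taken from a real deployment.
    original = '{"profiler.userOption": true, "photon.runProfileWithSpark": false}'
    print(apply_overrides(original, {"profiler.userOption": False}))
```

Keys that are not named in the overrides are passed through unchanged, so the same helper could be reused for other properties in this section.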