Configure for Spark Profiling
Disable intermediate caching for Hive profiling jobs
On Hortonworks 3.0 and later, the intermediate dataset files generated during Spark profiling can cause a job to hang when its source is a Hive table. As a precaution, if you profile jobs from Hive sources on these versions, disable the following property.
- Locate the
Insert the following setting:
- Save your changes and restart the platform.
Additional configuration for Spark profiling on S3
If you are using S3 as your datastore and have enabled Spark profiling, you must apply the following configuration, which adds the hadoop-aws JAR and the aws-java-sdk JAR to the extra class path for Spark.
- In Ambari, navigate to Spark2 > Configs.
- Add a new parameter to Custom Spark2-defaults.
Set the parameter as follows. The value shown is specific to HDP 188.8.131.52, build 37:
- Restart Spark from Ambari.
- Restart the platform node.
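As a rough aid for the steps above, the following Python sketch joins JAR locations into a class path value and prints it in key=value form. It rests on several assumptions: the property names `spark.driver.extraClassPath` and `spark.executor.extraClassPath` are standard Spark properties but may not be the parameter this procedure intends, the helper names are hypothetical, and the JAR paths are placeholders because the real locations depend on your HDP version and build.

```python
import os


def build_extra_classpath(jar_paths, must_exist=True):
    """Join JAR paths into one class path string, optionally checking each file."""
    if must_exist:
        missing = [p for p in jar_paths if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError("JAR not found: " + ", ".join(missing))
    # Class path entries on Linux cluster nodes are separated by ':'
    return ":".join(jar_paths)


def spark_defaults_lines(jar_paths, must_exist=True):
    """Return key=value lines for the driver and executor class paths."""
    classpath = build_extra_classpath(jar_paths, must_exist)
    return [
        "spark.driver.extraClassPath=" + classpath,
        "spark.executor.extraClassPath=" + classpath,
    ]


if __name__ == "__main__":
    # Placeholder paths (assumption): substitute the JARs shipped with your build
    jars = ["/path/to/hadoop-aws.jar", "/path/to/aws-java-sdk.jar"]
    for line in spark_defaults_lines(jars, must_exist=False):
        print(line)
```

Running the helper with `must_exist=True` on a cluster node catches a mistyped JAR path before Spark is restarted.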
Additional configuration for Spark profiling
If you are using Spark for profiling, you must add environment properties to your cluster configuration. See Configure for Spark.
Set up directory permissions
On all Hortonworks cluster nodes, verify that the YARN user has access to the YARN working directories:
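One way to inspect ownership and permissions on a directory is sketched below in Python. The function name `describe_dir` is hypothetical, and the path is a placeholder argument: substitute the YARN working directories configured for your cluster. The sketch only reports the owner and mode so that you can compare them against the YARN user. It does not decide whether access is sufficient.

```python
import os
import pwd
import stat
import sys


def describe_dir(path):
    """Report the owner and permission bits of a directory."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id
        owner = str(st.st_uid)
    return {
        "path": path,
        "is_dir": stat.S_ISDIR(st.st_mode),
        "uid": st.st_uid,
        "owner": owner,
        # Permission bits as an octal string, for example "750"
        "mode": format(stat.S_IMODE(st.st_mode), "03o"),
    }


if __name__ == "__main__":
    # Pass one or more directory paths on the command line
    for directory in sys.argv[1:]:
        print(describe_dir(directory))
```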
If you are upgrading from a previous version of Hortonworks, you may need to clear the YARN user cache for the
The following changes need to be applied to the
Except as noted, these changes are applied to the following file in the
Configure WebHDFS port
Configure Resource Manager port
Hortonworks uses a custom port number for Resource Manager. You must update the Resource Manager port setting to match.
Save your changes.
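After updating a port setting, a quick TCP check can confirm that the port you entered is reachable from the node. The following is a minimal Python sketch, not part of the product: the function name `port_is_open` is hypothetical, and the host and port are supplied by you on the command line. A successful connection shows only that something is listening on that port, not that it is Resource Manager.

```python
import socket
import sys


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    # Usage: python check_port.py <host> <port>
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(host + ":" + str(port) + " is " + state)
```

The same check works for the WebHDFS port.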
Configure location of Hadoop bundle JAR
Configure Hive Locations
If you are enabling an integration with Hive on the Hadoop cluster, you must set some distribution-specific parameters. For more information, see Configure for Hive.