This section provides additional configuration requirements for integrating the platform with the Hortonworks Data Platform.
NOTE: Except as noted, the following configuration items apply to the latest supported version of Hortonworks Data Platform.
Pre-requisites
Before you begin, it is assumed that you have completed the following tasks:
You have access to platform configuration, either directly or through the Admin Settings page.
The following changes need to be applied to Hortonworks cluster configuration files or to configuration areas inside Ambari.
Tip: Ambari is the recommended method for configuring your Hortonworks cluster.
If you have deployed Ranger in a Kerberized environment, you must verify and complete the following changes in Ambari.
Steps:
hive-site.xml
For Hortonworks 3.0 and later, the intermediate dataset files generated during Spark profiling of your job can cause the job to hang when the source is a Hive table. As a precaution, if you are profiling jobs from Hive sources on Hortonworks 3.0 and later, disable the following property.
Steps:
Locate the spark.props setting.
Insert the following setting:
"transformer.dataframe.cache.reused": "false"
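The following is a minimal sketch of applying the setting above programmatically. It assumes, purely for illustration, that the configuration is available as a JSON object with a top-level "spark.props" block. That layout, the function name set_spark_prop, and the sample spark.executor.memory value are hypothetical and are not taken from this documentation. Note that the value is the string "false", as shown above, not a boolean.

```python
import json


def set_spark_prop(config: dict, key: str, value: str) -> dict:
    """Return a copy of config with key set inside the "spark.props" block.

    The top-level "spark.props" layout is an assumption made for this sketch.
    """
    # Deep-copy through a JSON round trip so the caller's dict is not modified.
    updated = json.loads(json.dumps(config))
    # Create the block if it is missing, then add or overwrite the property.
    props = updated.setdefault("spark.props", {})
    props[key] = value
    return updated


# Hypothetical existing configuration with one unrelated Spark property.
config = {"spark.props": {"spark.executor.memory": "6GB"}}

updated = set_spark_prop(config, "transformer.dataframe.cache.reused", "false")
print(json.dumps(updated, indent=2))
```

Existing properties in the block are preserved, and the original configuration object is left unchanged so that the result can be compared with it before saving.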
If you are using S3 as your datastore and have enabled Spark profiling, you must apply the following configuration, which adds the hadoop-aws JAR and the aws-java-sdk JAR to the extra class path for Spark.
Steps:
Set the parameter as follows. The value shown is specified for HDP 2.5.3.0, build 37; adjust the paths and JAR versions to match your cluster:
spark.driver.extraClassPath=/usr/hdp/2.5.3.0-37/hadoop/hadoop-aws-2.7.3.2.5.3.0-37.jar:/usr/hdp/2.5.3.0-37/hadoop/lib/aws-java-sdk-s3-1.10.6.jar
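Because the class path embeds the HDP build in several places, it can help to generate it rather than edit it by hand. The sketch below assumes that the JAR naming pattern for other builds matches the single HDP 2.5.3.0-37 example above. That pattern is inferred, not documented here, and the function name build_extra_class_path is hypothetical. Verify that both JAR files exist on your cluster nodes before using the generated value.

```python
def build_extra_class_path(hdp_build: str, hadoop_version: str, aws_sdk_version: str) -> str:
    """Build the Spark extra class path for the hadoop-aws and aws-java-sdk JARs.

    The path layout is inferred from the HDP 2.5.3.0-37 example and may
    differ on other builds.
    """
    hadoop_home = f"/usr/hdp/{hdp_build}/hadoop"
    # The hadoop-aws JAR name combines the Hadoop version and the HDP build.
    hadoop_aws_jar = f"{hadoop_home}/hadoop-aws-{hadoop_version}.{hdp_build}.jar"
    # The AWS SDK JAR is located in the lib subdirectory.
    aws_sdk_jar = f"{hadoop_home}/lib/aws-java-sdk-s3-{aws_sdk_version}.jar"
    # Class path entries are separated by a colon.
    return f"{hadoop_aws_jar}:{aws_sdk_jar}"


class_path = build_extra_class_path("2.5.3.0-37", "2.7.3", "1.10.6")
print(f"spark.driver.extraClassPath={class_path}")
```

With the values from the example above, the generated string reproduces the documented setting for HDP 2.5.3.0, build 37.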
If you are using Spark for profiling, you must add environment properties to your cluster configuration. See Configure for Spark.
Set up directory permissions
On all Hortonworks cluster nodes, verify that the YARN user has access to the YARN working directories:
If you are upgrading from a previous version of Hortonworks, you may need to clear the YARN user cache for the
Configure the platform
The following changes need to be applied to the platform. Except as noted, these changes are applied in platform configuration.
Configure WebHDFS port
Configure Resource Manager port
Hortonworks uses a custom port number for Resource Manager. You must update the setting for the port number used by Resource Manager.
Save your changes.
Configure location of Hadoop bundle JAR
Configure Hive Locations
If you are enabling an integration with Hive on the Hadoop cluster, there are some distribution-specific parameters that must be set. For more information, see Configure for Hive.
To apply your changes, restart the platform. See Start and Stop the Platform.
After restart, you should verify operations. For more information, see Verify Operations.