Enable HA Service
To begin, you must enable the High Availability service for the supported components. In platform configuration, each component has its own feature flag.
In the following example configuration, high availability has been disabled for resourcemanagers and enabled for namenodes:
Configure HA for Individual Components
High availability in Hadoop works by specifying a nameservice for a highly available component and then enumerating the hosts and ports as children of that nameservice node. These values must be explicitly specified in the platform configuration.
Service names and child names should be specified in the file as they appear in the cluster's configuration files.
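As a rough illustration of how the service and child names can be read from the cluster's own files, the sketch below parses an hdfs-site.xml fragment and lists each nameservice with its child namenodes. It relies on the standard Hadoop properties dfs.nameservices, dfs.ha.namenodes.<nameservice>, and dfs.namenode.rpc-address.<nameservice>.<id>. The sample file content, the child name nn1, the hostnames, the port, and the function names are assumptions made for illustration only; they are not taken from the platform or from any real cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content. Hostnames, the port, and nn1 are placeholders.
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>namenodeha</value></property>
  <property><name>dfs.ha.namenodes.namenodeha</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.namenodeha.nn1</name><value>nn1.example.com:8020</value></property>
  <property><name>dfs.namenode.rpc-address.namenodeha.nn2</name><value>nn2.example.com:8020</value></property>
</configuration>"""


def read_properties(xml_text):
    """Return the <property> entries of a Hadoop *-site.xml file as a dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
        if prop.findtext("name")
    }


def split_csv(value):
    """Split a comma-separated property value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def namenode_ha_layout(xml_text):
    """Map each nameservice to its child namenodes and their host and port."""
    props = read_properties(xml_text)
    layout = {}
    for service in split_csv(props.get("dfs.nameservices", "")):
        children = {}
        for child in split_csv(props.get(f"dfs.ha.namenodes.{service}", "")):
            address = props.get(f"dfs.namenode.rpc-address.{service}.{child}", "")
            host, _, port = address.partition(":")
            # Leave the port as None when the address is missing or malformed.
            children[child] = {"host": host, "port": int(port) if port.isdigit() else None}
        layout[service] = children
    return layout


if __name__ == "__main__":
    for service, children in namenode_ha_layout(HDFS_SITE).items():
        for child, target in children.items():
            print(service, child, target["host"], target["port"])
```

The names returned by a script like this are the ones to copy into the platform configuration, so that the nameservice and its children match the cluster's files exactly.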
Example - Configure namenode
In the following example, the nameservice namenodeha provides high availability through two namenodes, including nn2. In a high availability environment, these hosts are used for submitting jobs and writing data to HDFS.
Example - Configure resource manager
Set feature.highAvailability.resourcemanager=true only if the cluster's configuration file contains the property yarn.resourcemanager.hostname.highlyavailableyarn. This setting enables cluster high availability for the resourcemanager.
Otherwise, set feature.highAvailability.resourcemanager=false for all environments. For HA environments, the resourcemanager hosts specified in the configuration below set the HA servers that are used by the platform.
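The rule above can be checked mechanically. The following sketch inspects the text of a cluster configuration file and reports which value the flag should take. It assumes the property lives in the cluster's yarn-site.xml, which is where YARN properties are normally defined; the sample file contents, hostnames, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property named in the text above. Pass a different name if your cluster uses one.
DEFAULT_PROPERTY = "yarn.resourcemanager.hostname.highlyavailableyarn"

# Hypothetical yarn-site.xml fragments. Hostnames are placeholders.
HA_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname.highlyavailableyarn</name>
    <value>rm1.example.com</value>
  </property>
</configuration>"""

PLAIN_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
</configuration>"""


def property_names(xml_text):
    """Return the set of property names defined in a Hadoop *-site.xml file."""
    root = ET.fromstring(xml_text)
    return {(prop.findtext("name") or "").strip() for prop in root.iter("property")}


def resourcemanager_flag(yarn_site_xml, required=DEFAULT_PROPERTY):
    """Return the flag setting implied by the presence of the required property."""
    enabled = required in property_names(yarn_site_xml)
    return f"feature.highAvailability.resourcemanager={'true' if enabled else 'false'}"


if __name__ == "__main__":
    print(resourcemanager_flag(HA_YARN_SITE))
    print(resourcemanager_flag(PLAIN_YARN_SITE))
```

The check is an exact match on the property name, so a file that defines only the plain yarn.resourcemanager.hostname property yields the false setting.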
The following example specifies two failover nodes for the resource manager:
Configure HA in a Kerberized Environment
If you are enabling high availability in a Kerberized environment, additional configuration is required.
NOTE: WebHDFS does not support high availability/failover. You must enable HttpFS instead. For more information, see Enable HttpFS.
If you have not done so already, acquire httpfs-site.xml from your Hadoop cluster.
Add the following settings to the file, replacing the placeholder values with the values appropriate for your environment:
The above change must also be applied to the httpfs-site.xml configuration file for the cluster.