Excerpt

Enable HA Service

To begin, you must enable the High Availability service in the

D s platform
for the supported components. In platform configuration, each component has its own feature flag under feature.highAvailability.

D s config

In the following example configuration, high availability has been enabled for the namenode and disabled for the resource manager:

Info

NOTE: In almost all cases, feature.highAvailability.resourceManager should be set to false. For more information, see Example - Configure resource manager below.


Code Block
"feature.highAvailability.namenode": true,
"feature.highAvailability.resourceManager": false,

Configure HA for Individual Components

High availability in Hadoop works by specifying a nameservice for a highly available component and then enumerating the hosts and ports as children of that nameservice node. These values must be explicitly specified in the platform configuration.

Warning

Service names and child names must be specified in platform configuration exactly as they appear in the cluster's configuration files.

Example - Configure namenode

In the following example, the nameservice namenodeha provides high availability through two namenodes: nn1 and nn2. In a high availability environment, these hosts are used for submitting jobs and writing data to HDFS.

Code Block
  "hdfs": {
    ...
    "highAvailability": {
      "serviceName": "namenodeha",
      "namenodes": {
        "nn1": {
          "host": "nn1.hadoop.mycompany.org",
          "port": 8020
        },
        "nn2": {
          "host": "nn2.hadoop.mycompany.org",
          "port": 8020
        }
      }
    }
  },
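These values should mirror the high availability entries in the cluster's hdfs-site.xml file. For reference, a minimal sketch of the corresponding cluster-side properties, assuming the service name and hostnames above, might look like the following:

Code Block
<property>
  <name>dfs.nameservices</name>
  <value>namenodeha</value>
</property>
<property>
  <name>dfs.ha.namenodes.namenodeha</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.namenodeha.nn1</name>
  <value>nn1.hadoop.mycompany.org:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.namenodeha.nn2</name>
  <value>nn2.hadoop.mycompany.org:8020</value>
</property>

The serviceName value and the namenode child names in platform configuration must match these cluster-side values.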

Example - Configure resource manager

Info

NOTE: Set feature.highAvailability.resourceManager=true only if yarn.resourcemanager.hostname.highlyavailableyarn is enabled in the cluster's yarn-site.xml file. That property enables resource manager high availability on the cluster.

Otherwise, set feature.highAvailability.resourceManager=false for all environments. For HA environments, the resource manager hosts specified in the configuration below define the HA servers that are used by the

D s platform
.


The following example specifies two failover nodes for the resource manager: rm1 and rm2.

Code Block
"yarn": {
 "resourceManagers": {
   "rm1": {
     "host": "rm1.yarn.mycompany.org",
     "port": 8032,
     "schedulerPort": 8030,
     "adminPort": 8033,
     "webappPort": 8042
   },
  "rm2": {
     "host": "rm2.yarn.mycompany.org",
     "port": 8032,
     "schedulerPort": 8030,
     "adminPort": 8033,
     "webappPort": 8042
   } 
  }
}
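For comparison, a cluster with resource manager high availability typically declares these hosts in its yarn-site.xml file. A minimal sketch, assuming the rm1 and rm2 identifiers and hostnames above:

Code Block
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.yarn.mycompany.org</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.yarn.mycompany.org</value>
</property>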

Update Active Namenode

The active namenode used by the service must be configured explicitly, and this value must be updated whenever the active namenode changes. Otherwise, HDFS becomes unavailable to the platform.

In this example, the active namenode has been set to the nn1 value in the previous configuration:
Code Block
"webhdfs": {
 "proxy": { ... },
 "version": "/webhdfs/v1",
 "port": 14000,
 "httpfs": true
},
...
"namenode": {
 "host": "nn1.hadoop.mycompany.org",
 "port": 8020
 }, 
}, 
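If you are unsure which namenode is currently active, you can query the cluster directly before updating this value. A minimal sketch, assuming the nn1 and nn2 identifiers above and an HDFS client available on a cluster node:

Code Block
# Each command reports "active" or "standby" for the given namenode ID
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2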

Configure HA in a Kerberized Environment

If you are enabling high availability in a Kerberized environment, additional configuration is required. 

Info

NOTE: WebHDFS does not support high availability/failover. You must enable HttpFS instead. For more information, see Enable HttpFS.

Steps:

  1. If you have not done so already, acquire httpfs-site.xml from your Hadoop cluster.

  2. Add the following settings to the file, replacing

    D s defaultuser
    Type hadoop
    Full true
    with the value appropriate for your environment (a filled-in illustration appears after these steps):

    Code Block
    <property>
      <name>httpfs.authentication.type</name>
      <value>org.apache.hadoop.security.token.delegation.web.KerberosDelegationTokenAuthenticationHandler</value>
    </property>
    <property>
      <name>httpfs.authentication.delegation-token.token-kind</name>
      <value>WEBHDFS delegation</value>
    </property>
    <property>
      <name>httpfs.proxyuser.[hadoop.user].hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>httpfs.proxyuser.[hadoop.user].groups</name>
      <value>*</value>
    </property>


  3. The same settings must also be applied to the httpfs-site.xml configuration file on the cluster itself.
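As a filled-in illustration only: if the platform's Hadoop user were hadoop (a hypothetical value; substitute your own), the two proxyuser properties would read:

Code Block
<property>
  <name>httpfs.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>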

Save your changes and restart the platform.

Change Active Namenode

If you must change the active namenode manually, the new namenode must be configured explicitly.

Info

NOTE: This step is not required for a failover event.


Info

NOTE: If the HttpFS service has been tied to the primary namenode of the cluster and that node fails, this setting must be manually configured to the new node and the platform must be restarted. Avoid tying HttpFS to the primary namenode.

In this example, the active namenode has been set to the nn1 value in the previous configuration:

Code Block
"webhdfs": {
 "proxy": { ... },
 "version": "/webhdfs/v1",
 "port": 14000,
 "httpfs": true
},
...
"namenode": {
 "host": "nn1.hadoop.mycompany.org",
 "port": 8020
 }, 
}, 

Platform Restart

When high availability has been enabled, you must restart the platform from the command line. For more information, see Start and Stop the Platform.