2.1. Configure Manual or Automatic ResourceManager Failover

Prerequisites

Complete the following prerequisites:

Make sure that you have a working ZooKeeper service. If you have an Ambari-deployed HDP cluster with ZooKeeper, you can use that ZooKeeper service. If not, deploy ZooKeeper using the instructions provided in the "Installing ZooKeeper" chapter of the Installing HDP Manually guide.

Note

In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. It is acceptable to co-locate the ZooKeeper nodes on the same hardware as the HDFS NameNode and Standby Node. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. To improve performance and isolation, Hortonworks recommends configuring the ZooKeeper nodes so that the ZooKeeper data and HDFS metadata are stored on separate disk drives.

Shut down the cluster using the instructions provided in "Controlling HDP Services Manually," in the HDP Reference Guide.

Set Common ResourceManager HA Properties

The following properties are required for both manual and automatic ResourceManager HA. Add these properties to the /etc/hadoop/conf/yarn-site.xml file:

Property Name	Recommended Value	Description
`yarn.resourcemanager.ha.enabled`	true	Enable RM HA.
`yarn.resourcemanager.ha.rm-ids`	Cluster-specific, e.g., rm1,rm2	A comma-separated list of ResourceManager IDs in the cluster.
`yarn.resourcemanager.hostname.<rm-id>`	Cluster-specific	The host name of the ResourceManager. Must be set for all RMs.
`yarn.resourcemanager.recovery.enabled`	true	Enable job recovery on RM restart or failover.
`yarn.resourcemanager.store.class`	`org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore`	The RMStateStore implementation to use to store the ResourceManager's internal state. The ZooKeeper-based store supports fencing implicitly, i.e., it allows only a single ResourceManager to make changes at any point in time, and hence is recommended.
`yarn.resourcemanager.zk-address`	Cluster-specific	The ZooKeeper quorum to use to store the ResourceManager's internal state. Separate multiple ZooKeeper servers with commas.
`yarn.client.failover-proxy-provider`	`org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider`	When HA is enabled, the class to be used by clients, AMs, and NMs to fail over to the active RM. It should extend `org.apache.hadoop.yarn.client.RMFailoverProxyProvider`. This is an optional configuration. The default value is `org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider`.

Note

You can also set values for each of the following properties in yarn-site.xml:

yarn.resourcemanager.address.<rm-id>
yarn.resourcemanager.scheduler.address.<rm-id>
yarn.resourcemanager.admin.address.<rm-id>
yarn.resourcemanager.resource-tracker.address.<rm-id>
yarn.resourcemanager.webapp.address.<rm-id>

If these addresses are not explicitly set, each property defaults to yarn.resourcemanager.hostname.<rm-id>:default_port, where default_port is the corresponding built-in default, such as DEFAULT_RM_PORT or DEFAULT_RM_SCHEDULER_PORT.
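
As an illustration of the note above, the following fragment sets all five addresses explicitly for a single ResourceManager with the ID rm1. This is a sketch, not a required configuration: the host name rm1.example.com is a hypothetical placeholder, and the port numbers are the stock Apache Hadoop defaults, which you should confirm against your own distribution. A matching set of properties with the rm2 suffix would be needed for the second ResourceManager.

```xml
<!-- Explicit per-RM addresses for rm1. The host name is a hypothetical placeholder. -->
<property>
     <name>yarn.resourcemanager.address.rm1</name>
     <value>rm1.example.com:8032</value>
</property>

<property>
     <name>yarn.resourcemanager.scheduler.address.rm1</name>
     <value>rm1.example.com:8030</value>
</property>

<property>
     <name>yarn.resourcemanager.admin.address.rm1</name>
     <value>rm1.example.com:8033</value>
</property>

<property>
     <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
     <value>rm1.example.com:8031</value>
</property>

<property>
     <name>yarn.resourcemanager.webapp.address.rm1</name>
     <value>rm1.example.com:8088</value>
</property>
```

Setting only yarn.resourcemanager.hostname.rm1 and omitting these properties should produce the same effective addresses when the default ports are in use.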

The following is a sample yarn-site.xml file with these common ResourceManager HA properties configured:

<!-- RM HA Configurations-->

<property> 
     <name>yarn.resourcemanager.ha.enabled</name> 
     <value>true</value>
</property> 
 
<property> 
     <name>yarn.resourcemanager.ha.rm-ids</name> 
     <value>rm1,rm2</value> 
</property>
 
<property> 
     <name>yarn.resourcemanager.hostname.rm1</name> 
     <value>${rm1 address}</value> 
</property> 
 
<property> 
     <name>yarn.resourcemanager.hostname.rm2</name> 
     <value>${rm2 address}</value> 
</property> 
 
<property> 
     <name>yarn.resourcemanager.webapp.address.rm1</name> 
     <value>rm1_web_address:port_num</value> 
     <description>We can set rm1_web_address separately. If not, it will use 
      ${yarn.resourcemanager.hostname.rm1}:DEFAULT_RM_WEBAPP_PORT</description> 
</property> 
 
<property> 
     <name>yarn.resourcemanager.webapp.address.rm2</name> 
     <value>rm2_web_address:port_num</value> 
</property> 
 
<property> 
     <name>yarn.resourcemanager.recovery.enabled</name> 
     <value>true</value> 
</property> 
 
<property> 
     <name>yarn.resourcemanager.store.class</name> 
     <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> 
</property> 
 
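<property> 
     <name>yarn.resourcemanager.zk-address</name>
     <value>zk1_address:port_num,zk2_address:port_num</value> 
     <description>Comma-separated host:port pairs, one per ZooKeeper server.</description> 
</property>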
 
<property> 
     <name>yarn.client.failover-proxy-provider</name> 
     <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value> 
</property>
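
The yarn.resourcemanager.zk-address value in the sample above is a placeholder. As a sketch, for a three-node ZooKeeper ensemble the property might look like the following. The host names are hypothetical, and port 2181 is assumed because it is the usual ZooKeeper client port; substitute the values for your own ensemble.

```xml
<!-- Hypothetical three-node ZooKeeper ensemble. Replace hosts and ports with your own. -->
<property>
     <name>yarn.resourcemanager.zk-address</name>
     <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```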

Configure Manual ResourceManager Failover

Automatic ResourceManager failover is enabled by default, so it must be disabled for manual failover.

To configure manual failover for ResourceManager HA, add the yarn.resourcemanager.ha.automatic-failover.enabled configuration property to the /etc/hadoop/conf/yarn-site.xml file, and set its value to false:

<property>
     <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
     <value>false</value>
</property>

Configure Automatic ResourceManager Failover

The preceding section described how to configure manual failover. In that mode, the system will not automatically trigger a failover from the active to the standby ResourceManager, even if the active node fails. This section describes how to configure automatic failover.

Add the following configuration options to the yarn-site.xml file:

Property Name	Recommended Value	Description
`yarn.resourcemanager.ha.automatic-failover.zk-base-path`	/yarn-leader-election	The base znode path to use for storing leader information when using ZooKeeper-based leader election. This is an optional configuration. The default value is `/yarn-leader-election`.
`yarn.resourcemanager.cluster-id`	yarn-cluster	The name of the cluster. In an HA setting, this is used to ensure that the RM participates in leader election for this cluster and does not affect other clusters.

Example:

<property>
     <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
     <value>/yarn-leader-election</value>
     <description>Optional setting. The default value is /yarn-leader-election</description>
</property>
 
<property>
     <name>yarn.resourcemanager.cluster-id</name>
     <value>yarn-cluster</value>
</property>

Automatic ResourceManager failover is enabled by default.
If you previously configured manual ResourceManager failover by setting the value of yarn.resourcemanager.ha.automatic-failover.enabled to false, you must delete this property to return automatic failover to its default enabled state.
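
If you would rather keep the property in yarn-site.xml than delete it, setting it explicitly to true is expected to have the same effect, because true is the default. The fragment below is a sketch of that alternative, not a required step.

```xml
<!-- Alternative to deleting the property: state the default value explicitly. -->
<property>
     <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
     <value>true</value>
</property>
```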
