Configure Manual or Automatic ResourceManager Failover
Prerequisites
Complete the following prerequisites:
Make sure that you have a working ZooKeeper service. If you had an Ambari deployed HDP cluster with ZooKeeper, you can use that ZooKeeper service. If not, deploy ZooKeeper using the instructions provided in the Manual Install Guide.
Note In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. It is, however, acceptable to co-locate the ZooKeeper nodes on the same hardware as the HDFS NameNode and Standby Node. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. To achieve performance and improve isolation, Hortonworks recommends configuring the ZooKeeper nodes such that the ZooKeeper data and HDFS metadata is stored on separate disk drives.
Shut down the cluster using the instructions provided in "Controlling HDP Services Manually," in the HDP Reference Guide.
Set Common ResourceManager HA Properties
The following properties are required for both manual and automatic ResourceManager HA. Add these properties to the
etc/hadoop/conf/yarn-site.xml
file:
Property Name | Recommended Value | Description |
---|---|---|
| true | Enable RM HA |
| Cluster-specific, e.g., rm1,rm2 | A comma-separated list of ResourceManager IDs in the cluster. |
| Cluster-specific | The host name of the ResourceManager. Must be set for all RMs. |
| true | Enable job recovery on RM restart or failover. |
|
| The RMStateStore implementation to use to store the ResourceManager's internal state. The ZooKeeper- based store supports fencing implicitly, i.e., allows a single ResourceManager to make multiple changes at a time, and hence is recommended. |
| Cluster-specific | The ZooKeeper quorum to use to store the ResourceManager's internal state. For multiple ZK servers, use commas to separate multiple ZK servers. |
|
| When HA is enabled, the class to be used by Clients, AMs and NMs to failover to the Active RM. It should extend
This is an optional configuration. The default value is “org.apache.hadoop.yarn.client. ConfiguredRMFailoverProxyProvider” |
Note | |
---|---|
You can also set values for each of the following properties in yarn.resourcemanager.address.<rm‐id> yarn.resourcemanager.scheduler.address.<rm‐id> yarn.resourcemanager.admin.address.<rm‐id> yarn.resourcemanager.resource‐tracker.address.<rm‐id> yarn.resourcemanager.webapp.address.<rm‐id> If these addresses are not explicitly set, each of these properties will use yarn.resourcemanager.hostname.<rm-id>:default_port such
as |
The following is a sample yarn-site.xml file
with these common
ResourceManager HA properties configured:
<!-- RM HA Configurations--> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </property> <property> <name>yarn.resourcemanager.hostname.rm1</name> <value>${rm1 address}</value> </property> <property> <name>yarn.resourcemanager.hostname.rm2</name> <value>${rm2 address}</value> </property> <property> <name>yarn.resourcemanager.webapp.address.rm1</name> <value>rm1_web_address:port_num</value> <description>We can set rm1_web_address separately. If not, it will use ${yarn.resourcemanager.hostname.rm1}:DEFAULT_RM_WEBAPP_PORT </description> </property> <property> <name>yarn.resourcemanager.webapp.address.rm2</name> <value>rm2_web_address:port_num</value> </property> <property> <name>yarn.resourcemanager.recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.store.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery. ZKRMStateStore</value> </property> <property> <name>yarn.resourcemanager.zk-address</name> <value>${zk1.address,zk2.address}</value> </property> <property> <name>yarn.client.failover-proxy-provider</name> <value>org.apache.hadoop.yarn.client. ConfiguredRMFailoverProxyProvider</value> </property>
Configure Manual ResourceManager Failover
Automatic ResourceManager failover is enabled by default, so it must be disabled for manual failover.
To configure manual failover for ResourceManager HA, add the
yarn.resourcemanager.ha.automatic-failover.enabled configuration property to the
etc/hadoop/conf/yarn-site.xml
file, and set its value to
"false":
<property> <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> <value>false</value> </property>
Configure Automatic ResourceManager Failover
The preceding section described how to configure manual failover. In that mode, the system will not automatically trigger a failover from the active to the standby ResourceManager, even if the active node has failed. This section describes how to configure automatic failover.
Add the following configuration options to the
yarn-site.xml
file:Property Name Recommended Value Description yarn.resourcemanager.ha. automatic-failover.zk-base-path
/yarn-leader-election
The base znode path to use for storing leader information, when using ZooKeeper-based leader election. This is an optional configuration. The default value is
/yarn-leader-election
yarn.resourcemanager. cluster-id
yarn-cluster
The name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election for this cluster, and ensures that it does not affect other clusters.
Example:
<property> <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name> <value>/yarn-leader-election</value> <description>Optional setting. The default value is /yarn-leader-election</description> </property> <property> <name>yarn.resourcemanager.cluster-id</name> <value>yarn-cluster</value> </property>
Automatic ResourceManager failover is enabled by default.
If you previously configured manual ResourceManager failover by setting the value of
yarn.resourcemanager.ha.automatic-failover.enabled
to "false", you must delete this property to return automatic failover to its default enabled state.