YARN ResourceManager high availability
The YARN ResourceManager is responsible for tracking the resources in a
cluster and scheduling applications (for example, MapReduce jobs). On its own, the
ResourceManager is a single point of failure. The ResourceManager high availability (HA)
feature removes it by adding redundancy in the form of an active-standby ResourceManager pair.
Furthermore, upon failover from the active ResourceManager to the standby, applications can
resume from the last state saved to the state store; for example, map tasks in a MapReduce job
are not run again if a failover to a new active ResourceManager occurs after the completion of
the map phase. This allows events such as the following to be handled without any significant
performance effect on running applications:
- Unplanned events such as machine crashes
- Planned maintenance events such as software or hardware upgrades on the machine running the ResourceManager
ResourceManager HA requires ZooKeeper and HDFS services to be running.
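The resume-after-failover behavior described above can be illustrated with a toy model. This is only a sketch and not how YARN is implemented: `StateStore`, `ResourceManager`, and `run_with_failover` are hypothetical names, the state store is an in-memory dictionary rather than a ZooKeeper- or HDFS-backed store, and the ResourceManager is modeled as driving the phases itself, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical stand-in for the shared state store (app id -> completed phases)."""
    completed: dict = field(default_factory=dict)

    def save(self, app_id, phase):
        self.completed.setdefault(app_id, []).append(phase)

    def load(self, app_id):
        return list(self.completed.get(app_id, []))


class ResourceManager:
    """Toy ResourceManager that records which phases it actually ran."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.executed = []

    def run(self, app_id, phases, fail_after=None):
        # Recover the last saved state so completed phases are skipped.
        done = self.store.load(app_id)
        for phase in phases:
            if phase in done:
                continue
            self.executed.append(phase)
            self.store.save(app_id, phase)
            if phase == fail_after:
                return False  # simulated crash of this instance
        return True


def run_with_failover(app_id, phases, fail_after):
    """Run on the active instance; on a simulated crash, resume on the standby."""
    store = StateStore()
    active = ResourceManager("rm1", store)
    standby = ResourceManager("rm2", store)
    if not active.run(app_id, phases, fail_after=fail_after):
        standby.run(app_id, phases)
    return active.executed, standby.executed
```

For example, `run_with_failover("job_1", ["map", "reduce"], fail_after="map")` has the active instance run only the map phase and the standby run only the reduce phase, because the completed map phase was already recorded in the shared store.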