The YARN Resource Manager is the master that arbitrates the available cluster resources, and thus helps manage the distributed applications running on the YARN system. It works together with the per-node Node Managers and the per-application Application Masters.
In YARN, the Resource Manager is primarily limited to scheduling, i.e., only arbitrating available resources in the system among the competing applications, and not concerning itself with per-application state management. The scheduler only handles an overall resource profile for each application, ignoring local optimizations and internal application flow. In fact, YARN completely departs from the static assignment of map and reduce slots because it treats the cluster as a resource pool. Because of this clear separation of responsibilities, the Resource Manager is able to address the important design requirements of scalability and support for alternate programming models.
The Resource Manager has a pluggable Scheduler component, which is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues, etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application, offering no guarantees on restarting failed tasks either due to application or hardware failures. The scheduler performs its scheduling function based on the resource requirements of an application by using the abstract notion of a resource Container, which incorporates resource dimensions such as memory, CPU, disk, network, etc. For more information about the Scheduler, see the Capacity Scheduler guide.
In contrast to many other workflow schedulers, the Resource Manager also has the ability to symmetrically request resources back from a running application. This situation typically happens when cluster resources become scarce and the scheduler decides to revoke some (but not all) of the resources that were given to an application.
In YARN, Resource Requests can be strict or negotiable. This features provides Application Masters with a great deal of flexibility on how to fulfill the requests, e.g., by picking Containers to yield back that are less crucial for the computation, by check-pointing the state of a task, or by migrating the computation to other running Containers. Overall, this allows applications to preserve work, in contrast to platforms that kill Containers to satisfy resource constraints. If the application is non-collaborative, the Resource Manager can, after waiting a certain amount of time, obtain the needed resources by instructing the Node Managers to forcibly terminate Containers.
Resource Manager failures remain significant events affecting cluster availability. As of this writing, the Resource Manager will restart running Application Masters as it recovers its state. If the framework supports restart—and most will for routine fault tolerance—the platform will automatically restore users’ pipelines.
In contrast to the Hadoop 1.0 Job Tracker, it is important to mention the tasks for which the Resource Manager is not responsible. While the Resource Manager will track application execution flow and task fault-tolerance, it will not provide access to the application status servlet, which is now part of the Application Master, nor will it track previously executed jobs, as that is now delegated to the Job History Service (a daemon running on a separate node). This is consistent with the view that the Resource Manager should only handle live resource scheduling, and helps YARN core components scale better than the Hadoop 1.0 Job Tracker.