About YARN Resource Allocation
As described in the Capacity Scheduler guide, the fundamental unit of scheduling in YARN
is the queue. The capacity
of each queue specifies the percentage of cluster resources that are available for
applications submitted to the queue. You can set up queues in a hierarchy that reflects the
organizational structure, resource requirements, and access restrictions required by the various
organizations, groups, and users that utilize your cluster resources. When you use the default
resource calculator (DefaultResourceCalculator), resources are allocated based on memory alone.
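For example, the queue hierarchy and queue capacities are defined in capacity-scheduler.xml. The following is a minimal sketch; the queue names (engineering and marketing) and the 60/40 capacity split are hypothetical values chosen for illustration:

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>engineering,marketing</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.engineering.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.marketing.capacity</name>
    <value>40</value>
  </property>

The capacities of sibling queues under the same parent must add up to 100 percent.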
You can enable CPU scheduling by using the Dominant Resource Calculator (DominantResourceCalculator) rather than the default resource calculator.
The Dominant Resource Calculator is based on the Dominant Resource Fairness (DRF) model of
resource allocation.
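To switch to the Dominant Resource Calculator, set the yarn.scheduler.capacity.resource-calculator property in capacity-scheduler.xml, for example:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>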
DRF uses the concept of the dominant resource to compare multidimensional resources. The idea is that in a multi-resource environment, resource allocation should be determined by the dominant share of an entity (user or queue), which is the maximum share that the entity has been allocated of any resource (memory or CPU). Essentially, DRF seeks to maximize the minimum dominant share across all entities.
For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF attempts to equalize the CPU share of user A with the memory share of user B. In this case, DRF allocates more CPU and less memory to the tasks run by user A, and allocates less CPU and more memory to the tasks run by user B. In the single-resource case, where all jobs request the same resources, DRF reduces to max-min fairness for that resource.
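The dominant-share calculation itself is simple arithmetic. The following Java sketch illustrates it with made-up numbers; the cluster capacity (90 vCores, 360 GB of memory) and the per-user allocations are assumptions chosen so that the two dominant shares come out equal:

  public class DominantShareExample {
      public static void main(String[] args) {
          // Hypothetical cluster capacity: 90 vCores and 360 GB of memory.
          double clusterVcores = 90.0;
          double clusterMemGb = 360.0;

          // User A (CPU-heavy tasks) holds 27 vCores and 36 GB.
          double aCpuShare = 27 / clusterVcores;              // 0.30
          double aMemShare = 36 / clusterMemGb;               // 0.10
          double aDominant = Math.max(aCpuShare, aMemShare);  // CPU dominates: 0.30

          // User B (memory-heavy tasks) holds 9 vCores and 108 GB.
          double bCpuShare = 9 / clusterVcores;               // 0.10
          double bMemShare = 108 / clusterMemGb;              // 0.30
          double bDominant = Math.max(bCpuShare, bMemShare);  // Memory dominates: 0.30

          // DRF equalizes the dominant shares: user A's CPU share (0.30)
          // matches user B's memory share (0.30).
          System.out.printf("User A dominant share: %.2f%n", aDominant);
          System.out.printf("User B dominant share: %.2f%n", bDominant);
      }
  }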
When the Dominant Resource Calculator is used, queues are still used to allocate cluster resources, but both CPU and memory are taken into consideration. In the DRF model, resource allocation takes into account the dominant resource required by a process. The Dominant Resource Calculator schedules both CPU-heavy and memory-heavy processes on the same node. CPU-heavy processes (such as Storm-on-YARN) receive more CPU and less memory, while memory-heavy processes (such as MapReduce) receive more memory and less CPU. The DRF scheduler is designed to fairly distribute memory and CPU resources among different types of processes in a mixed-workload cluster.
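The container requests that the scheduler weighs carry both dimensions. The following Java sketch builds one CPU-heavy and one memory-heavy request through the AMRMClient API; the memory and vCore values are illustrative assumptions, and in a real application the requests would be submitted from a running ApplicationMaster rather than printed:

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

  public class MixedWorkloadRequests {
      public static void main(String[] args) {
          // CPU-heavy container: 4 vCores, 2 GB of memory (memory is in MB).
          ContainerRequest cpuHeavy = new ContainerRequest(
                  Resource.newInstance(2048, 4), null, null, Priority.newInstance(1));

          // Memory-heavy container: 1 vCore, 8 GB of memory.
          ContainerRequest memoryHeavy = new ContainerRequest(
                  Resource.newInstance(8192, 1), null, null, Priority.newInstance(1));

          // With the DominantResourceCalculator the scheduler weighs both the
          // vCore and memory dimensions of these requests; with the
          // DefaultResourceCalculator only the memory values influence scheduling.
          System.out.println(cpuHeavy);
          System.out.println(memoryHeavy);
      }
  }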
For more information about DRF, see Dominant Resource Fairness: Fair Allocation of Multiple Resource Types.