YARN resource allocation
You can manage your cluster capacity using the Capacity Scheduler in YARN. You can use
the Capacity Scheduler's DefaultResourceCalculator
or the
DominantResourceCalculator
to allocate available resources.
The fundamental unit of scheduling in YARN is the queue. The capacity of each queue specifies the percentage of cluster resources available for applications submitted to the queue. You can set up queues in a hierarchy that reflects the database structure, resource requirements, and access restrictions required by the organizations, groups, and individuals who use the cluster resources.
You can use the default resource calculator when you want the resource calculator to
consider only the available memory for resource calculation. When you use the default
resource calculator (DefaultResourceCalculator
), resources are allocated
based on the available memory.
If you have multiple resource types, use the dominant resource calculator
(DominantResourceCalculator
) to enable CPU and memory
scheduling. The dominant resource calculator is based on the Dominant Resource Fairness
(DRF) model of resource allocation. DRF is designed to fairly distribute CPU,
and memory resources among different types of processes in a mixed-workload
cluster.
For example, if User A runs CPU-heavy tasks and User B runs memory-heavy tasks, the DRF allocates more CPU and less memory to the tasks run by User A, and allocates less CPU and more memory to the tasks run by User B. In a single resource case, in which all jobs are requesting the same resources, the DRF reduces to max-min fairness for that resource.