YARN Resource Management

Using CPU Scheduling

MapReduce Jobs Only

If you primarily run MapReduce jobs on your cluster, you probably will not see much change in performance when you enable CPU scheduling. The dominant resource for MapReduce is memory, so the DRF (Dominant Resource Fairness) scheduler continues to balance MapReduce jobs in much the same way as the default resource calculator, which considers only memory. In the single-resource case, DRF reduces to max-min fairness for that resource.
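To make the idea of a dominant resource concrete, the following minimal sketch computes a container's share of each cluster resource and reports the largest one. The cluster size, the container request, and the helper name dominant_resource are illustrative assumptions, not values or APIs taken from YARN.

```python
from fractions import Fraction


def dominant_resource(demand, capacity):
    """Return (resource name, share) for the resource with the largest share.

    Hypothetical helper for illustration; not part of any YARN API.
    """
    shares = {r: Fraction(demand[r], capacity[r]) for r in capacity}
    name = max(shares, key=shares.get)
    return name, shares[name]


# Assumed cluster capacity and MapReduce container request (illustrative only).
cluster = {"memory_gb": 400, "vcores": 200}
mapreduce_container = {"memory_gb": 4, "vcores": 1}

name, share = dominant_resource(mapreduce_container, cluster)
print(name, share)  # prints "memory_gb 1/100"
```

With these assumed numbers the container uses 1/100 of cluster memory but only 1/200 of cluster vcores, so memory is its dominant resource. When every job is dominated by the same resource, comparing dominant shares is the same as comparing shares of that one resource.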

Mixed Workloads

One example of a mixed workload is a cluster that runs both MapReduce and Storm-on-YARN. MapReduce is not CPU-constrained: MapReduce containers do not ask for much CPU. Storm-on-YARN is CPU-constrained: its containers ask for more CPU than memory. As you start adding Storm jobs alongside MapReduce jobs, the DRF scheduler does its best to balance memory and CPU resources, but you might start to see some degradation in performance. If you then add more CPU-intensive Storm jobs, individual jobs start to take longer to run as the cluster CPU resources are consumed.
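The balancing behavior described above can be sketched with a simplified DRF loop: repeatedly grant one container to the job with the lowest dominant share until nothing more fits. This is an illustrative model only, not YARN's scheduler code; the function names, the cluster capacity, and the per-container requests are assumptions chosen to keep the arithmetic small.

```python
from fractions import Fraction


def dominant_share(allocated, capacity):
    """Largest fraction of any single cluster resource held by a job."""
    return max(Fraction(allocated[r], capacity[r]) for r in capacity)


def drf_allocate(capacity, demands):
    """Simplified DRF: always serve the job with the lowest dominant share.

    Hypothetical sketch; a job is dropped once its next container no longer fits.
    """
    used = {r: 0 for r in capacity}
    allocated = {job: {r: 0 for r in capacity} for job in demands}
    containers = {job: 0 for job in demands}
    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical by job name).
        job = min(sorted(active),
                  key=lambda j: dominant_share(allocated[j], capacity))
        demand = demands[job]
        if all(used[r] + demand[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += demand[r]
                allocated[job][r] += demand[r]
            containers[job] += 1
        else:
            active.remove(job)
    return containers


# Assumed capacity and per-container requests (illustrative only).
capacity = {"vcores": 9, "memory_gb": 18}
demands = {
    "mapreduce": {"vcores": 1, "memory_gb": 4},  # memory-dominant
    "storm": {"vcores": 3, "memory_gb": 1},      # CPU-dominant
}
print(drf_allocate(capacity, demands))  # prints {'mapreduce': 3, 'storm': 2}
```

In this assumed setup both jobs end with a dominant share of 2/3: MapReduce holds 12 of 18 GB and Storm holds 6 of 9 vcores. All 9 vcores are then in use, which illustrates how CPU becomes the bottleneck as more CPU-intensive work is added.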

CGroups can be used along with CPU scheduling to help manage mixed workloads. CGroups provide isolation for CPU-intensive processes such as Storm-on-YARN, which lets you plan for and constrain the CPU-intensive Storm containers predictably.

You could also use node labels in conjunction with CPU scheduling and CGroups to restrict Storm-on-YARN jobs to a subset of cluster nodes.