1. Enabling CPU Scheduling

Enable CPU Scheduling in capacity-scheduler.xml

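CPU scheduling is not enabled by default. The DefaultResourceCalculator takes only memory into account when allocating containers, whereas the DominantResourceCalculator considers both memory and CPU. To enable CPU scheduling, replace the DefaultResourceCalculator with the DominantResourceCalculator by setting the following property in the /etc/hadoop/conf/capacity-scheduler.xml file on the ResourceManager and NodeManager hosts: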

Property: yarn.scheduler.capacity.resource-calculator

Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator

<property>
 <name>yarn.scheduler.capacity.resource-calculator</name>
 <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
 <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

Set Vcores in yarn-site.xml

In YARN, vcores (virtual cores) are used to normalize CPU resources across the cluster. The yarn.nodemanager.resource.cpu-vcores value sets the number of CPU cores that can be allocated for containers.

The number of vcores should be set to match the number of physical CPU cores on the NodeManager hosts. Set the following property in the /etc/hadoop/conf/yarn-site.xml file on the ResourceManager and NodeManager hosts:

Property: yarn.nodemanager.resource.cpu-vcores

Value: <number_of_physical_cores>

Example:

<property>
 <name>yarn.nodemanager.resource.cpu-vcores</name>
 <value>16</value>
</property>

It is also recommended that you enable CGroups along with CPU scheduling. CGroups provide the CPU isolation mechanism for container processes. With strict CGroups enforcement turned on, each container process gets only the CPU resources it requests. Without CGroups, the DRF (Dominant Resource Fairness) scheduler still attempts to balance the load, but nothing prevents a process from using more CPU than it was allocated, so unpredictable behavior may occur.
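
As a rough sketch of what turning on CGroups with strict enforcement might look like, the following yarn-site.xml fragment uses the standard Hadoop 2.x LinuxContainerExecutor properties. It is not part of the procedure above: the values shown, in particular strict-resource-usage set to true, are assumptions for illustration, and a working CGroups setup typically needs further settings (such as the executor group and the CGroups hierarchy and mount options), so check the CGroups documentation for your distribution before applying it.

Example:

<!-- Illustrative only: run containers through the LinuxContainerExecutor, which CGroups support requires -->
<property>
 <name>yarn.nodemanager.container-executor.class</name>
 <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<!-- Illustrative only: use the CGroups resources handler for CPU isolation -->
<property>
 <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
 <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<!-- Assumed value: true limits each container to the CPU it requested; the Hadoop default is false -->
<property>
 <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
 <value>true</value>
</property>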

Currently there is no isolation mechanism (CGroups equivalent) for Windows, so do not enable CPU scheduling on Windows.

