Configure GPU Scheduling and Isolation
On an Ambari cluster, you can configure GPU scheduling and isolation from the Ambari dashboard. On a non-Ambari cluster, you must configure certain properties in the capacity-scheduler.xml, resource-types.xml, and yarn-site.xml files. Currently, YARN supports only Nvidia GPUs.
- The Nvidia drivers must be installed on the YARN NodeManager hosts.
Enable GPU scheduling and isolation on an Ambari cluster
- Select YARN > CONFIGS on the Ambari dashboard.
- Click GPU Scheduling and Isolation under GPU.
- In the Absolute path of nvidia-smi on NodeManagers field, enter the absolute path to the nvidia-smi GPU discovery executable. For example, /usr/local/bin/nvidia-smi
- Click Save, and then restart all the cluster components that require a restart.
If the NodeManager log shows the following GPU discovery failure:
INFO gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(240)) - Trying to discover GPU information ...
WARN gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(247)) - Failed to discover GPU information from system, exception message:ExitCodeException exitCode=12: continue...
Export LD_LIBRARY_PATH in the yarn-env.sh file using the following command:
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
Enable GPU scheduling and isolation on a non-Ambari cluster
- Configure DominantResourceCalculator before you enable GPU scheduling and isolation. Set the following property in the /etc/hadoop/conf/capacity-scheduler.xml file:
  Property: yarn.scheduler.capacity.resource-calculator
  Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
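As a minimal sketch, the property above could appear in capacity-scheduler.xml as follows. The fragment shows only this one property; an actual file would normally hold other scheduler settings, which are omitted here.

```xml
<configuration>
  <!-- Use DominantResourceCalculator so that resource types beyond memory, such as GPUs, are taken into account -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```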
- Enable GPU scheduling in the /etc/hadoop/conf/resource-types.xml file on the ResourceManager and NodeManager hosts:
  Property: yarn.resource-types
  Value: yarn.io/gpu
  Example:
  <configuration>
    <property>
      <name>yarn.resource-types</name>
      <value>yarn.io/gpu</value>
    </property>
  </configuration>
- Enable GPU isolation in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host:
  Property: yarn.nodemanager.resource-plugins
  Value: yarn.io/gpu
  Example:
  <configuration>
    <property>
      <name>yarn.nodemanager.resource-plugins</name>
      <value>yarn.io/gpu</value>
    </property>
  </configuration>
- Set the following advanced properties in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host:
  - To allow GPU devices:
    Property: yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
    Value: auto
    Note: The auto setting enables YARN to automatically detect and manage GPU devices. For other options, see YARN-7223.
  - To allow the YARN NodeManager to locate the discovery executable:
    Property: yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables
    Value: <absolute_path_to_nvidia-smi_binary>
    Note: Supports only nvidia-smi.
    Example: /usr/local/bin/nvidia-smi
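The two advanced properties above might be combined in yarn-site.xml roughly as shown below. This is a sketch only: the nvidia-smi path reuses the example value from this section and is an assumption about where the binary is installed on your NodeManager hosts.

```xml
<configuration>
  <!-- Let YARN detect and manage the GPU devices automatically -->
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
    <value>auto</value>
  </property>
  <!-- Assumed location of nvidia-smi; replace with the absolute path on your hosts -->
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
    <value>/usr/local/bin/nvidia-smi</value>
  </property>
</configuration>
```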
- Set the following property in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host to automatically mount cgroup sub-devices:
  Property: yarn.nodemanager.linux-container-executor.cgroups.mount
  Value: true
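Following the pattern of the earlier examples, the cgroup mount property could be written in yarn-site.xml like this. The fragment is illustrative and leaves out any other properties the file already contains.

```xml
<configuration>
  <!-- Ask the NodeManager to mount the cgroup sub-devices automatically -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
    <value>true</value>
  </property>
</configuration>
```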
- Set the following configuration in the /etc/hadoop/conf/container-executor.cfg file to run GPU applications in a non-Docker environment:
  - In the GPU section, set:
    Property: module.enabled=true
  - In the cgroups section, set:
    Property: root=/sys/fs/cgroup
    Note: This value should be the same as yarn.nodemanager.linux-container-executor.cgroups.mount-path in the yarn-site.xml file.
    Property: yarn-hierarchy=yarn
    Note: This value should be the same as yarn.nodemanager.linux-container-executor.cgroups.hierarchy in the yarn-site.xml file.
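Because the cgroups values in container-executor.cfg should match their counterparts in yarn-site.xml, the corresponding yarn-site.xml entries might look like the sketch below. The values simply mirror root=/sys/fs/cgroup and yarn-hierarchy=yarn from this section; whether your cluster uses these exact values is an assumption to verify against your own configuration.

```xml
<configuration>
  <!-- Should match root in the cgroups section of container-executor.cfg -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
    <value>/sys/fs/cgroup</value>
  </property>
  <!-- Should match yarn-hierarchy in the cgroups section of container-executor.cfg -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>yarn</value>
  </property>
</configuration>
```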