Configure GPU scheduling and isolation
You can configure GPU scheduling and isolation on your cluster. Currently only Nvidia GPUs are supported in YARN.
- The Nvidia drivers must be installed on the YARN NodeManager hosts.
- In Cloudera Manager, select the YARN service.
- Click the Configuration tab.
- Search for Enable GPU Usage.
- Select the NodeManager Default Group check-box.
- In the Configuration tab, search for Enable GPU Usage and define the GPU devices that are managed by YARN in one of the following ways:
  - Use the default value, auto, to detect all GPU devices automatically. In this case, all GPU devices are managed by YARN.
  - Manually define the GPU devices that are managed by YARN. For more information about how to define these GPU devices manually, see Using GPU on YARN.
- Search for NodeManager GPU Detection Executable and define the location of nvidia-smi. By default, this property has no value, which means that YARN checks the following paths to find nvidia-smi:
  - /usr/bin
  - /bin
  - /usr/local/nvidia/bin
- Click Save, and then restart all the cluster components that require a restart.
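If you leave NodeManager GPU Detection Executable empty, it can help to check which of the default paths actually contains nvidia-smi on a NodeManager host. The following is a minimal POSIX shell sketch, not a Cloudera tool: the function name find_nvidia_smi is hypothetical, and the sketch assumes the directories are checked in the order listed above.

```shell
#!/bin/sh
# Sketch: report which nvidia-smi YARN would most likely pick up when
# "NodeManager GPU Detection Executable" is left empty.
# Assumption: the default directories are searched in the order listed above.

# Print the first executable nvidia-smi found in the given directories.
# Returns non-zero if none of them contains one.
find_nvidia_smi() {
  for dir in "$@"; do
    if [ -f "$dir/nvidia-smi" ] && [ -x "$dir/nvidia-smi" ]; then
      printf '%s\n' "$dir/nvidia-smi"
      return 0
    fi
  done
  return 1
}

# Default locations checked when the property has no value.
if smi_path=$(find_nvidia_smi /usr/bin /bin /usr/local/nvidia/bin); then
  echo "nvidia-smi found at: $smi_path"
else
  echo "nvidia-smi not found in the default paths;" \
       "set NodeManager GPU Detection Executable to its full path" >&2
fi
```

If nothing is found, setting the property explicitly avoids relying on the default search.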
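For a manual definition, Apache Hadoop's GPU documentation describes the device list (the yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices property) as comma-separated index:minor_number pairs, for example 0:0,1:1,2:2,3:4, where the minor number is reported as Minor Number in the output of nvidia-smi -q. The sketch below builds such a list from that output. It assumes that nvidia-smi -q prints GPUs in index order, so the Nth Minor Number line is treated as GPU index N-1; gpu_allowed_devices is a hypothetical helper name. Check the result against Using GPU on YARN before you use it.

```shell
#!/bin/sh
# Sketch: turn `nvidia-smi -q` output (read from stdin) into a comma-separated
# list of index:minor_number pairs for a manual GPU device definition.
# Assumption: GPUs are printed in index order, so the Nth "Minor Number"
# line belongs to GPU index N-1.
gpu_allowed_devices() {
  awk -F':' '
    BEGIN { n = 0 }
    /Minor Number/ {
      minor = $2
      gsub(/[ \t]+/, "", minor)   # strip padding around the value
      out = out (n > 0 ? "," : "") n ":" minor
      n++
    }
    END { print out }
  '
}

# Only query the hardware when nvidia-smi is actually available on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -q | gpu_allowed_devices
fi
```

For example, on a host whose two GPUs report minor numbers 0 and 2, the sketch prints 0:0,1:2. To manage only a subset of GPUs, keep just the pairs you need.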
If the NodeManager fails to start and logs the following error:
INFO gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(240)) - Trying to discover GPU information ...
WARN gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(247)) - Failed to discover GPU information from system, exception message:ExitCodeException exitCode=12: continue...
Exit code 12 from nvidia-smi indicates that the NVML shared library could not be found or loaded. Fix the error by exporting LD_LIBRARY_PATH in yarn-env.sh with the following command:
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
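The export command above leaves a trailing colon when LD_LIBRARY_PATH is empty, and on glibc-based systems an empty entry is generally treated as the current working directory. If yarn-env.sh is sourced more than once, the entries can also be duplicated. The following is a minimal, more defensive sketch, assuming yarn-env.sh is sourced by a POSIX-compatible shell; prepend_ld_path is a hypothetical helper name, not part of YARN or Cloudera Manager.

```shell
#!/bin/sh
# Sketch for yarn-env.sh: add the Nvidia library directories to
# LD_LIBRARY_PATH without duplicating entries and without leaving an empty
# element when the variable was previously unset or empty.

# Prepend $1 to LD_LIBRARY_PATH unless it is already one of its entries.
prepend_ld_path() {
  case ":${LD_LIBRARY_PATH-}:" in
    *":$1:"*) ;;  # already present: leave the variable unchanged
    *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
  esac
}

# Prepend lib64 first so that lib ends up first, matching the order of the
# documented command.
prepend_ld_path /usr/local/nvidia/lib64
prepend_ld_path /usr/local/nvidia/lib
export LD_LIBRARY_PATH
```

Because each directory is added only when missing, sourcing the file repeatedly leaves LD_LIBRARY_PATH unchanged after the first time.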