Configure GPU scheduling and isolation

You can configure GPU scheduling and isolation on your cluster. Currently, YARN supports only NVIDIA GPUs.

  • The NVIDIA drivers must be installed on the hosts that run the YARN NodeManager.
  1. In Cloudera Manager, select the YARN service.
  2. Click the Configuration tab.
  3. Search for Enable GPU Usage.
  4. Select the NodeManager Default Group checkbox.
  5. Search for NodeManager GPU Devices Allowed and define the GPU devices that are managed by YARN in one of the following ways:
    • Use the default value, auto, for auto detection of all GPU devices. In this case all GPU devices are managed by YARN.
    • Manually define the GPU devices that are managed by YARN. For more information about how to define these GPU devices manually, see Using GPU on YARN.
  6. Search for NodeManager GPU Detection Executable and define the location of nvidia-smi. By default, this property has no value, which means that YARN checks the following paths to find nvidia-smi:
    • /usr/bin
    • /bin
    • /usr/local/nvidia/bin
  7. Click Save, and then restart all the cluster components that require a restart.
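
Before leaving NodeManager GPU Detection Executable empty, it can help to confirm that nvidia-smi actually exists in one of the default directories on a NodeManager host. The following Python sketch is only an illustration: the helper name find_nvidia_smi is hypothetical, and the code approximates the lookup described in step 6 rather than reproducing YARN's own implementation.

```python
import os
from typing import Iterable, Optional

# Default directories from step 6 (assumed here to be checked in this order).
DEFAULT_DIRS = ("/usr/bin", "/bin", "/usr/local/nvidia/bin")


def find_nvidia_smi(dirs: Iterable[str] = DEFAULT_DIRS,
                    name: str = "nvidia-smi") -> Optional[str]:
    """Return the first executable file called `name` found in `dirs`, or None."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


path = find_nvidia_smi()
print(path or "nvidia-smi not found in the default directories")
```

If the sketch reports that nvidia-smi is not found, set NodeManager GPU Detection Executable to the location of the binary on that host.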

If the NodeManager fails to start, the following error is displayed:

INFO gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(240)) - Trying to discover GPU information ...
WARN gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(247)) - Failed to discover GPU information from system, exception message:ExitCodeException exitCode=12: continue...

Fix the error by exporting LD_LIBRARY_PATH in yarn-env.sh using the following command:

export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
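
To see what value the export produces on a given host before editing yarn-env.sh, the same prepend logic can be sketched in Python. This is a minimal illustration, not part of YARN or Cloudera Manager: the helper name with_nvidia_libs is hypothetical. Unlike the shell command, the sketch omits the trailing colon when LD_LIBRARY_PATH is not already set.

```python
import os
from typing import Dict, Mapping

# Library directories taken from the export command above.
NVIDIA_LIB_DIRS = ("/usr/local/nvidia/lib", "/usr/local/nvidia/lib64")


def with_nvidia_libs(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `env` with the NVIDIA library directories prepended
    to LD_LIBRARY_PATH. The input mapping is not modified."""
    new_env = dict(env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    parts = list(NVIDIA_LIB_DIRS)
    if existing:
        # Keep whatever was already on the path, after the NVIDIA directories.
        parts.append(existing)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


print(with_nvidia_libs(os.environ)["LD_LIBRARY_PATH"])
```

The returned mapping can be passed as the env argument of subprocess.run when invoking nvidia-smi by hand, which gives a quick check of whether the binary starts with the adjusted library path.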