Configure GPU scheduling and isolation
You can configure GPU scheduling and isolation on your cluster. Currently, YARN supports only NVIDIA GPUs.
- The YARN NodeManager must be installed with the NVIDIA drivers.
If the NodeManager fails to start, the following error is displayed:
INFO gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(240)) - Trying to discover GPU information ...
WARN gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(247)) - Failed to discover GPU information from system, exception message:ExitCodeException exitCode=12: continue...
Fix the error by exporting LD_LIBRARY_PATH in yarn-env.sh using the following command:
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
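If yarn-env.sh is sourced more than once, a plain export can add the same directories to LD_LIBRARY_PATH repeatedly. The following is a minimal sketch of an idempotent variant, not the documented command: the helper name prepend_nvidia_libs is hypothetical, and the two library directories are taken from the export line above, so adjust them if your driver is installed elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of YARN): prepend the NVIDIA library
# directories to a colon-separated path, skipping any already present.
prepend_nvidia_libs() {
    current="$1"
    # Iterate in reverse so the final order is lib, then lib64.
    for dir in /usr/local/nvidia/lib64 /usr/local/nvidia/lib; do
        case ":$current:" in
            *":$dir:"*) ;;  # already in the path, nothing to do
            *) current="$dir${current:+:$current}" ;;
        esac
    done
    printf '%s\n' "$current"
}

# Assumed usage inside yarn-env.sh.
LD_LIBRARY_PATH=$(prepend_nvidia_libs "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

When LD_LIBRARY_PATH starts out empty, this yields the same two directories as the plain export but without a trailing colon. An empty entry in LD_LIBRARY_PATH is generally interpreted as the current directory, so omitting it is slightly safer.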