The NVIDIA RAPIDS Shuffle Manager is a custom ShuffleManager for
Apache Spark that allows fast shuffle block transfers between GPUs in the
same host (over PCIe or NVLink) and over the network to remote hosts (over
RoCE or Infiniband). The RAPIDS Shuffle Manager is based on Unified
Communication X (UCX).
NVIDIA RAPIDS Shuffle Manager has been shown to accelerate
workloads where shuffle is the bottleneck when using the RAPIDS
accelerator for Apache Spark. It accomplishes this by using a GPU
shuffle cache for fast shuffle writes when shuffle blocks fit in GPU
memory, avoiding the cost of writes to host using the built-in Spark
Shuffle, a spill framework that will spill to host memory and disk on
demand, and UCX as its transport for fast network and peer-to-peer
(GPU-to-GPU) transfers.
Cloudera and NVIDIA recommend using the RAPIDS shuffle manager for
clusters with Infiniband or RoCE networking.
-
Download UCX v1.10.1 for your operating
system.
- Install the downloaded packages using your operating system
package manager.
For CentOS, if you do not have Infiniband
or RoCE networking, you only need to install the following
packages:
ucx-1.10.1-1.el7.x86_64.rpm
ucx-cuda-1.10.1-1.el7.x86_64.rpm
If
you have Infiniband or RoCE networking, install the following
packages:
ucx-1.10.1-1.el7.x86_64.rpm
ucx-cuda-1.10.1-1.el7.x86_64.rpm
ucx-rdmacm-1.10.1-1.el7.x86_64.rpm
ucx-ib-1.10.1-1.el7.x86_64.rpm
-
Configure Mellanox Infiniband or RoCE networking. For more information, see
spark-rapids documentation and
Mellanox documentation.
Note, that:
- UCX is agnostic to switch model and make.
- UCX is fully tested on Mellanox RDMA hardware, other hardware may not be
compatible. UCX supports TCP when RDMA is not possible otherwise, but it
is not recommended as the best performance for UCX is with RDMA
hardware.
-
Validate your UCX environment following the instructions provided in the NVIDIA
spark-rapids documentation.
-
Before running applications with the RAPIDS Shuffle Manager, make the following
configuration changes:
-
Disable the External Shuffle Service:
spark.shuffle.service.enabled=false
-
Disable Dynamic Allocation:
spark.dynamicAllocation.enabled=false
-
Enable the RAPIDS Shuffle Manager:
spark.shuffle.manager=com.nvidia.spark.rapids.spark311cdh.RapidsShuffleManager
-
Specify the “extraClassPath” Executor:
spark.executor.extraClassPath=/opt/cloudera/parcels/SPARK3_RAPIDS/lib/spark3/rapids-plugin/*
-
At a minimum, make the following UCX settings:
spark.executorEnv.UCX_ERROR_SIGNALS=
spark.executorEnv.UCX_MEMTYPE_CACHE=n
- Optional:
Recommended additional UCX settings:
spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp
spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy
spark.executorEnv.UCX_MAX_RNDV_RAILS=1
spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024
For more information
on environment variables, see the NVIDIA
spark-rapids
documentation.
-