Built-in support for UCX
The NVIDIA RAPIDS Shuffle Manager is a custom ShuffleManager for Apache Spark that allows fast shuffle block transfers between GPUs in the same host (over PCIe or NVLink) and over the network to remote hosts (over RoCE or Infiniband). The RAPIDS Shuffle Manager is based on Unified Communication X (UCX).
The NVIDIA RAPIDS Shuffle Manager has been shown to accelerate workloads where shuffle is the bottleneck when using the RAPIDS Accelerator for Apache Spark. It does so with three components: a GPU shuffle cache for fast shuffle writes when shuffle blocks fit in GPU memory, which avoids the cost of the host writes made by the built-in Spark shuffle; a spill framework that spills to host memory and disk on demand; and UCX as the transport for fast network and peer-to-peer (GPU-to-GPU) transfers.
CDS 3.2 for GPUs has built-in support for UCX; no separate installation is required.
Cloudera and NVIDIA recommend using the RAPIDS Shuffle Manager for clusters with Infiniband or RoCE networking.
- Validate your UCX environment following the instructions provided in the NVIDIA spark-rapids documentation.
Before running applications with the RAPIDS Shuffle Manager, make the following configuration changes:
Disable the External Shuffle Service:
Disable Dynamic Allocation:
Enable the RAPIDS Shuffle Manager:
Specify the executor “extraClassPath”:
At a minimum, make the following UCX settings:
Recommended additional UCX settings:
spark.executorEnv.UCX_TLS=cuda_copy,cuda_ipc,rc,tcp
spark.executorEnv.UCX_RNDV_SCHEME=put_zcopy
spark.executorEnv.UCX_MAX_RNDV_RAILS=1
spark.executorEnv.UCX_IB_RX_QUEUE_LEN=1024
For more information on environment variables, see the NVIDIA spark-rapids documentation.
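The settings in this section can be gathered into one spark-submit invocation. The following is a minimal sketch, not an official tool. The function names `rapids_shuffle_conf` and `to_submit_args` are hypothetical, and the shuffle manager class name and JAR path are placeholders that must be taken from your own installation. The two "disable" steps are assumed to map to the standard Spark properties `spark.shuffle.service.enabled` and `spark.dynamicAllocation.enabled`. The minimum UCX settings are not reproduced on this page, so the sketch leaves them out; add them from the NVIDIA spark-rapids documentation.

```python
import shlex


def rapids_shuffle_conf(shuffle_manager_class, plugin_jar):
    """Collect the settings described in this section into one dict.

    Both arguments are placeholders (assumptions): use the class name
    and JAR path that match your CDS and spark-rapids installation.
    """
    return {
        # Disable the External Shuffle Service and Dynamic Allocation
        # (standard Spark properties, assumed to be the ones intended).
        "spark.shuffle.service.enabled": "false",
        "spark.dynamicAllocation.enabled": "false",
        # Enable the RAPIDS Shuffle Manager (value supplied by the caller).
        "spark.shuffle.manager": shuffle_manager_class,
        # Executor extraClassPath (value supplied by the caller).
        "spark.executor.extraClassPath": plugin_jar,
        # Recommended additional UCX settings, copied from this page.
        "spark.executorEnv.UCX_TLS": "cuda_copy,cuda_ipc,rc,tcp",
        "spark.executorEnv.UCX_RNDV_SCHEME": "put_zcopy",
        "spark.executorEnv.UCX_MAX_RNDV_RAILS": "1",
        "spark.executorEnv.UCX_IB_RX_QUEUE_LEN": "1024",
    }


def to_submit_args(conf):
    """Turn a settings dict into a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = rapids_shuffle_conf(
        "<RapidsShuffleManager class for your Spark version>",  # placeholder
        "/path/to/rapids-plugin.jar",                           # placeholder
    )
    # Print a shell-quoted command line that can be reviewed before use.
    print("spark-submit " + " ".join(shlex.quote(a) for a in to_submit_args(conf)))
```

Keeping the settings in one dict makes it easy to compare them against the list above before submitting a job. The same key/value pairs could instead be placed in spark-defaults.conf.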