Heterogeneous GPU clusters

When using heterogeneous GPU nodes to run sessions and jobs the available GPU accelerator labels need to be selected on the Kubernetes cluster.

Selecting GPUs for a workload

The workload configuration displays a dropdown menu for the possible GPU accelerator labels available on the Kubernetes cluster.

Figure 1. Selecting GPUs for workload when starting a new session

Figure 2. Selecting GPUs for workload when starting a new job

If you are unsure about which GPU accelerator label to use, you can use the default value set by the Cloudera Machine Learning (CML) workspace administrator.

APIv2-launched workloads

APIv2-launched workloads can also target a specific GPU.

The following code example presents details on identifying the GPU accelerator label:

# Instantiate the cml api client
import cmlapi
client = cmlapi.default_client()

# List the available gpu accelerator labels 
client.list_all_accelerator_node_labels()
 
# Use the gpu accelerator label in the request body of the workload 
job_body = cmlapi.CreateJobRequest(
    project_id="some_project_id",
    name="job_with_gpu_accelerator",
    script="some_job_script.py",
    kernel="some_runtime_kernel",
    runtime_identifier="some_runtime",
    nvidia_gpu = 1,
    accelerator_label_id = 2
)

apiv2client.create_job(
   project_id="some_project_id", body=job_body)

Existing APIv2 calls which do not specify a GPU accelerator ID can be launched using the default GPU accelerator, set by the CML Administrator.

A CML workload is backed by a Kubernetes Pod, which can only reserve as many resources as there are available on a single node.