Customize Workload Scheduling

Starting with version 1.6, Cloudera Data Science Workbench allows you to specify a list of CDSW gateway hosts that are labeled as Auxiliary Nodes. These hosts are deprioritized during workload scheduling: they are chosen only for workloads that cannot be scheduled on any other host, for example sessions with very large resource requests, or when the other hosts are fully utilized.

Cloudera Data Science Workbench will use the following order of preference when scheduling non-GPU workloads (session, job, experiment, or model):

Worker Hosts > Master Host > GPU-equipped Hosts | Labeled Auxiliary Hosts

When selecting a host on which to schedule an engine, Cloudera Data Science Workbench gives first preference to unlabeled Worker hosts. If the Workers are unavailable or at capacity, CDSW falls back to the Master host. Only after that does it use GPU-equipped hosts or labeled auxiliary hosts, which share the lowest priority.
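The order of preference can be illustrated with a small Python sketch. This is not CDSW's actual scheduler. The Host class, the tier numbers, and the pick_host function are hypothetical names invented for this example, and the resource check (free CPU and memory only) is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical encoding: a lower tier number means a higher scheduling preference.
TIER_WORKER, TIER_MASTER, TIER_AUXILIARY = 0, 1, 2


@dataclass
class Host:
    name: str
    is_master: bool = False
    has_gpu: bool = False
    labeled_auxiliary: bool = False
    free_cpu: float = 0.0      # vCPUs still available (assumed unit)
    free_memory: float = 0.0   # GiB still available (assumed unit)


def tier(host):
    """Map a host to its preference tier for non-GPU workloads."""
    # GPU-equipped hosts and labeled auxiliary hosts share the lowest priority.
    if host.has_gpu or host.labeled_auxiliary:
        return TIER_AUXILIARY
    if host.is_master:
        return TIER_MASTER
    return TIER_WORKER


def pick_host(hosts, cpu, memory):
    """Return the most-preferred host that can fit the request, or None."""
    candidates = [h for h in hosts
                  if h.free_cpu >= cpu and h.free_memory >= memory]
    if not candidates:
        return None
    # min() keeps the first host among equals, so ties within a tier
    # are resolved by list order in this sketch.
    return min(candidates, key=tier)


hosts = [
    Host("gpu-1", has_gpu=True, free_cpu=16, free_memory=64),
    Host("master", is_master=True, free_cpu=4, free_memory=16),
    Host("worker-1", free_cpu=2, free_memory=8),
    Host("aux-1", labeled_auxiliary=True, free_cpu=32, free_memory=128),
]

small = pick_host(hosts, cpu=1, memory=4)     # fits on the Worker
medium = pick_host(hosts, cpu=3, memory=8)    # Worker too small, Master used
large = pick_host(hosts, cpu=8, memory=32)    # only auxiliary-tier hosts fit
```

In this sketch a small request lands on the Worker, a request the Worker cannot hold falls back to the Master, and only a request that fits nowhere else reaches the GPU-equipped or labeled auxiliary hosts.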

Points to Note:
  • GPU-equipped Hosts - Hosts equipped with GPUs will be labeled auxiliary by default so as to reserve them for GPU-intensive workloads. They do not need to be explicitly configured to be labeled. A GPU-equipped host and a labeled auxiliary host will be given equal priority when scheduling workloads.

  • Master Host - The Master host must not be labeled as an auxiliary node. If you want to reserve the Master for running internal Cloudera Data Science Workbench application components, use the Reserve Master Host property.
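The two points above can be sketched as a small check. This is an illustration only, not how CDSW validates its configuration. The function effective_auxiliary_hosts and its parameters are hypothetical, and the host names are placeholders.

```python
def effective_auxiliary_hosts(labeled, gpu_hosts, master):
    """Return the set of hosts treated as auxiliary during scheduling.

    labeled   -- hosts explicitly labeled as auxiliary (hypothetical input)
    gpu_hosts -- hosts equipped with GPUs
    master    -- name of the Master host
    """
    # The Master host must not be labeled as an auxiliary node.
    if master in labeled:
        raise ValueError(f"Master host {master!r} must not be labeled auxiliary")
    # GPU-equipped hosts count as auxiliary without an explicit label.
    return set(labeled) | set(gpu_hosts)


aux = effective_auxiliary_hosts(["aux-1"], ["gpu-1"], master="master")

try:
    effective_auxiliary_hosts(["master"], [], master="master")
    rejected = False
except ValueError:
    rejected = True
```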