Managing Cloudera Data Science Workbench Hosts
This topic describes how to perform some common tasks related to managing Cloudera Data Science Workbench hosts.
Customize Workload Scheduling
Starting with version 1.6, Cloudera Data Science Workbench allows you to specify a list of CDSW gateway hosts that are labeled as Auxiliary Nodes. These hosts are deprioritized during workload scheduling: they are chosen only for workloads that cannot be scheduled on any other host, for example, sessions with very large resource requests, or workloads submitted when the other hosts are fully utilized. As a result, Cloudera Data Science Workbench uses the following order of preference when scheduling non-GPU workloads (session, job, experiment, or model):
Worker Hosts > Master Host > GPU-equipped Hosts | Labeled Auxiliary Hosts
When selecting a host to schedule an engine, Cloudera Data Science Workbench gives first preference to unlabeled Worker hosts. If Workers are unavailable or at capacity, CDSW then uses the Master host. Finally, it falls back to any GPU-equipped hosts or labeled auxiliary hosts.
- GPU-equipped Hosts - Hosts equipped with GPUs will be labeled auxiliary by default so as to reserve them for GPU-intensive workloads. They do not need to be explicitly configured to be labeled. A GPU-equipped host and a labeled auxiliary host will be given equal priority when scheduling workloads.
- Master Host - The Master host must not be labeled an auxiliary node. If you want to reserve the Master for running internal Cloudera Data Science Workbench application components, use the Reserve Master Host property.
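The order of preference described above can be illustrated with a minimal Python sketch. This is not how the CDSW scheduler is implemented; the `Host` class, the `free_cpu` capacity figure, and the `pick_host` function are hypothetical names introduced only to show the tiering (unlabeled Workers first, then the Master, then GPU-equipped or labeled auxiliary hosts with equal priority). The `reserve_master` flag is an assumption about how the Reserve Master Host property interacts with this ordering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical host record; not a CDSW data structure."""
    name: str
    role: str                 # "worker" or "master"
    has_gpu: bool = False     # GPU hosts are treated as auxiliary by default
    auxiliary: bool = False   # explicitly labeled as an auxiliary node
    free_cpu: float = 0.0     # illustrative free capacity, in vCPUs


def tier(host: Host) -> int:
    """Return the preference tier for non-GPU workloads (lower is preferred)."""
    if host.has_gpu or host.auxiliary:
        return 2              # GPU-equipped and labeled auxiliary hosts share a tier
    if host.role == "master":
        return 1
    return 0                  # unlabeled Worker hosts come first


def pick_host(hosts: List[Host], cpu_request: float,
              reserve_master: bool = False) -> Optional[Host]:
    """Pick the most-preferred host that can fit the request, or None."""
    candidates = [
        h for h in hosts
        if h.free_cpu >= cpu_request
        and not (reserve_master and h.role == "master")
    ]
    if not candidates:
        return None
    # min() keeps the first host among equals, so ties within a tier
    # are resolved by input order in this sketch.
    return min(candidates, key=tier)
```

In this sketch, a small request lands on an unlabeled Worker, a request that no Worker can fit falls through to the Master, and only requests that neither can fit reach the GPU-equipped or auxiliary hosts.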
Labeling Auxiliary Hosts
Before you proceed, make sure you have reviewed the guidelines on customizing workload scheduling in Cloudera Data Science Workbench.
Depending on your deployment type, use one of the following sets of instructions to use this feature:
CSD Deployments
On CSD deployments, use the Auxiliary Nodes property in the CDSW service in Cloudera Manager to specify a comma-separated list of auxiliary hosts.
- Log in to the Cloudera Manager Admin Console.
- Go to the CDSW service.
- Click the Configuration tab.
- Search for the following property: Auxiliary Nodes.
- Enter the hostnames that you want to label as auxiliary.
- Click Save Changes.
- Restart the CDSW service to have this change go into effect.
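Before pasting a value into the Auxiliary Nodes property, it may help to sanity-check the comma-separated list. The following Python sketch is an illustration only: `parse_auxiliary_hosts` is a hypothetical helper, not part of CDSW or Cloudera Manager, and the hostnames in the usage note are made up. It normalizes the list, drops empty entries and duplicates, and rejects the Master host, which must not be labeled auxiliary.

```python
from typing import List


def parse_auxiliary_hosts(value: str, master_host: str) -> List[str]:
    """Normalize a comma-separated host list and reject the Master host.

    Hypothetical helper for illustration; CDSW performs its own handling
    of the Auxiliary Nodes property.
    """
    master = master_host.strip().lower()
    hosts: List[str] = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if name == master:
            raise ValueError(
                f"The Master host ({master_host}) must not be labeled auxiliary"
            )
        if name not in hosts:
            hosts.append(name)  # keep first occurrence, preserve order
    return hosts
```

For example, calling `parse_auxiliary_hosts("aux1.example.com, aux2.example.com", "master.example.com")` returns the two auxiliary hostnames as a list, while including `master.example.com` in the value raises a `ValueError`.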
Reserving the Master Host for Internal CDSW Components
Cloudera Data Science Workbench allows you to reserve the master host for running internal application components and services such as Livelog, the PostgreSQL database, and so on, while user workloads run exclusively on worker hosts.
By default, the master host runs both user workloads and the application's internal services. However, depending on the size of your CDSW deployment and the number of workloads running at any given time, user workloads might dominate resources on the master host. Enabling this feature ensures that CDSW's application components always have access to the resources they need on the master host and are not adversely affected by user workloads.
Depending on your deployment type, use one of the following sets of instructions to enable this feature:
CSD Deployments
On CSD-based deployments, this feature can be enabled in Cloudera Manager using the Reserve Master Host property, as follows:
- Log in to the Cloudera Manager Admin Console.
- Go to the CDSW service.
- Click the Configuration tab.
- Search for the following property: Reserve Master Host. Select the checkbox to enable it.
- Click Save Changes.
- Restart the CDSW service to have this change go into effect.
Migrating a Deployment to a New Set of Hosts
Adding a Worker Host
Using Cloudera Manager
- Log in to the Cloudera Manager Admin Console.
- Add a new host to your cluster. Make sure this is a gateway host and you are not running any services on this host.
- Assign the HDFS, YARN, and Spark 2 gateway roles to the new host. For instructions, refer to the Cloudera Manager documentation at Adding a Role Instance.
- Go to the Cloudera Data Science Workbench service.
- Click the Instances tab.
- Click Add Role Instances.
- Assign the Worker and Docker Daemon roles to the new host. Click Continue.
- Review your changes and click Continue. The wizard finishes by performing any actions necessary to add the new role instances. Do not start the new roles at this point. You must run the Prepare Node command as described in the next steps before the roles are started.
- The new host must have the following packages installed on it.
nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp ebtables bind-utils nmap-ncat openssl e2fsprogs redhat-lsb-core conntrack-tools socat
You must either install these packages manually now, or allow Cloudera Manager to install them in the next step. If you choose the latter, make sure that Cloudera Manager has the permission needed to install the required packages. To do so, go to the Cloudera Data Science Workbench service and click Configuration. Search for the Install Required Packages property and make sure it is enabled.
- Click Instances and select the new host. From the list of available actions, select the Prepare Node command to install the required packages on the new node.
- On the Instances page, select the new role instances and start them.
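If you choose to install the required packages manually, a quick check of what is still missing on the new host can be useful. The following Python sketch assumes an RPM-based host where the `rpm` command is available; `is_installed` and `missing_packages` are hypothetical helpers written for this illustration and are not part of CDSW or Cloudera Manager. The package list is copied from the step above.

```python
import subprocess
from typing import Callable, Iterable, List

# Package list taken from the procedure above.
REQUIRED_PACKAGES = (
    "nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync "
    "policycoreutils-python selinux-policy-base selinux-policy-targeted ntp "
    "ebtables bind-utils nmap-ncat openssl e2fsprogs redhat-lsb-core "
    "conntrack-tools socat"
).split()


def is_installed(package: str) -> bool:
    """Return True if `rpm -q` exits with status 0 for the package.

    Assumes an RPM-based host; raises FileNotFoundError if `rpm` is absent.
    """
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(required: Iterable[str],
                     check: Callable[[str], bool] = is_installed) -> List[str]:
    """Return the required packages for which the check reports False."""
    return [pkg for pkg in required if not check(pkg)]
```

Running `missing_packages(REQUIRED_PACKAGES)` on the new host returns the packages that still need to be installed; an empty list suggests the host is ready for the Prepare Node command. The `check` parameter exists so the logic can be exercised without calling `rpm`.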
Using Packages
On an RPM deployment, the procedure to add a worker host to an existing deployment is the same as that required when you first install Cloudera Data Science Workbench on a worker. For instructions, see Installing Cloudera Data Science Workbench on a Worker Host.
Removing a Worker Host
Using Cloudera Manager
- Log in to the Cloudera Manager Admin Console.
- Go to the Cloudera Data Science Workbench service.
- Click the Instances tab.
- Select the Docker Daemon and Worker roles on the host to be removed from Cloudera Data Science Workbench.
- Stop the selected roles, and click Stop to confirm the action. Click Close when the process is complete.
- On the Instances page, re-select the Docker Daemon and Worker roles that were stopped in the previous step.
- Delete the selected roles, and then click Delete to confirm the action.
Changing the Domain Name
Cloudera Data Science Workbench allows you to change the domain of the web console.
Using Cloudera Manager
- Log in to the Cloudera Manager Admin Console.
- Go to the Cloudera Data Science Workbench service.
- Click the Configuration tab.
- Search for the Cloudera Data Science Workbench Domain property and modify the value to reflect the new domain.
- Click Save Changes.
- Restart the Cloudera Data Science Workbench service to have the changes go into effect.