Managing Cloudera Data Science Workbench Hosts

This topic describes how to perform some common tasks related to managing Cloudera Data Science Workbench hosts.

Customize Workload Scheduling

Starting with version 1.6, Cloudera Data Science Workbench allows you to specify a list of CDSW gateway hosts that are labeled as Auxiliary Nodes. These hosts are deprioritized during workload scheduling: they are chosen only for workloads that cannot be scheduled on any other host, for example, sessions with very large resource requests, or when the other hosts are fully utilized. As a result, Cloudera Data Science Workbench uses the following order of preference when scheduling non-GPU workloads (session, job, experiment, or model):

Worker Hosts > Master Host > GPU-equipped Hosts | Labeled Auxiliary Hosts

When selecting a host to schedule an engine, Cloudera Data Science Workbench gives first preference to unlabeled Worker hosts. If Workers are unavailable or at capacity, CDSW then uses the Master host. Finally, it falls back to any GPU-equipped hosts or labeled auxiliary hosts.

Points to Note:
  • GPU-equipped Hosts - Hosts equipped with GPUs will be labeled auxiliary by default so as to reserve them for GPU-intensive workloads. They do not need to be explicitly configured to be labeled. A GPU-equipped host and a labeled auxiliary host will be given equal priority when scheduling workloads.

  • Master Host - The Master host must not be labeled an auxiliary node. If you want to reserve the Master for running internal Cloudera Data Science Workbench application components, use the Reserve Master Host property.
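
The order of preference described above can be sketched as a small ranking function. This is only an illustration of the documented ordering for non-GPU workloads, not the scheduler's actual implementation. The function names `host_rank` and `order_hosts`, the host-type labels, and the `name:type` input format are all hypothetical, and the sketch ignores the Reserve Master Host setting.

```shell
#!/usr/bin/env bash
# Illustrative only: rank host types by the documented scheduling preference
# for non-GPU workloads. A lower rank means the host is preferred.
# All names here are hypothetical and not part of CDSW.
host_rank() {
  case "$1" in
    worker)        echo 0 ;;  # unlabeled Worker hosts are tried first
    master)        echo 1 ;;  # then the Master host
    gpu|auxiliary) echo 2 ;;  # GPU-equipped and labeled auxiliary hosts share last place
    *)             echo "unknown host type: $1" >&2; return 1 ;;
  esac
}

# Order "name:type" entries from most to least preferred.
# The stable sort keeps the input order for hosts with equal rank.
order_hosts() {
  local entry
  for entry in "$@"; do
    printf '%s %s\n' "$(host_rank "${entry#*:}")" "${entry%%:*}"
  done | sort -s -n -k1,1 | cut -d' ' -f2
}
```

For example, `order_hosts aux1:auxiliary master1:master w1:worker` lists `w1` first, then `master1`, then `aux1`.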

Labeling Auxiliary Hosts

Before you proceed, make sure you have reviewed the guidelines on customizing workload scheduling in Cloudera Data Science Workbench.

Depending on your deployment type, use one of the following sets of instructions to use this feature:

CSD Deployments

On CSD deployments, use the Auxiliary Nodes property in the CDSW service in Cloudera Manager to specify a comma-separated list of auxiliary hosts.

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click the Configuration tab.
  4. Search for the following property: Auxiliary Nodes.
  5. Enter the hostnames that you want to label as auxiliary.
  6. Click Save Changes.
  7. Restart the CDSW service to have this change go into effect.

RPM Deployments

On RPM deployments, use the AUXILIARY_NODES property in cdsw.conf to specify a comma-separated list of auxiliary hosts.
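
As a rough illustration, the entry in cdsw.conf might look like the excerpt below. The hostnames are hypothetical placeholders, and the quoting simply mirrors the style of other cdsw.conf assignments such as DOMAIN. Do not include the Master host in this list.

```shell
# /etc/cdsw/config/cdsw.conf (excerpt)
# Hypothetical hostnames; replace them with the gateway hosts to deprioritize.
AUXILIARY_NODES="aux-host-1.example.com,aux-host-2.example.com"
```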

Reserving the Master Host for Internal CDSW Components

Cloudera Data Science Workbench allows you to reserve the master host for running internal application components and services such as Livelog, the PostgreSQL database, and so on, while user workloads run exclusively on worker hosts.

By default, the master host runs both user workloads and the application's internal services. However, depending on the size of your CDSW deployment and the number of workloads running at any given time, user workloads might dominate resources on the master host. Enabling this feature ensures that CDSW's application components always have access to the resources they need on the master host and are not adversely affected by user workloads.

Depending on your deployment type, use one of the following sets of instructions to enable this feature:

CSD Deployments

On CSD-based deployments, enable this feature using the Reserve Master Host property in Cloudera Manager as follows:

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the CDSW service.
  3. Click the Configuration tab.
  4. Search for the following property: Reserve Master Host. Select the checkbox to enable it.
  5. Click Save Changes.
  6. Restart the CDSW service to have this change go into effect.

RPM Deployments

To enable this feature on RPM-based deployments, go to the /etc/cdsw/config/cdsw.conf file and set the RESERVE_MASTER property to true.
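
The edit can be made by hand in any text editor. For scripted setups, a minimal sketch is shown below, assuming cdsw.conf uses plain KEY=value lines. The helper `set_conf_property` is hypothetical and not part of the CDSW tooling; it replaces an existing assignment or appends one if the key is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a cdsw.conf-style file.
# Replaces an existing assignment, or appends one if the key is not present.
set_conf_property() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Rewrite the existing line; sed keeps the original as <file>.bak.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example, run as a user with write access to the file:
# set_conf_property /etc/cdsw/config/cdsw.conf RESERVE_MASTER true
```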

Migrating a Deployment to a New Set of Hosts

The following topics describe how to migrate a Cloudera Data Science Workbench deployment to a new set of gateway hosts.

Adding a Worker Host

Using Cloudera Manager

Perform the following steps to add a new worker host to Cloudera Data Science Workbench.
  1. Log in to the Cloudera Manager Admin Console.
  2. Add a new host to your cluster. Make sure this is a gateway host and you are not running any services on this host.
  3. Assign the HDFS, YARN, and Spark 2 gateway roles to the new host. For instructions, refer to Adding a Role Instance in the Cloudera Manager documentation.
  4. Go to the Cloudera Data Science Workbench service.
  5. Click the Instances tab.
  6. Click Add Role Instances.
  7. Assign the Worker and Docker Daemon roles to the new host. Click Continue.
  8. Review your changes and click Continue. The wizard finishes by performing any actions necessary to add the new role instances. Do not start the new roles at this point. You must run the Prepare Node command as described in the next steps before the roles are started.
  9. The new host must have the following packages installed on it.
    nfs-utils
    libseccomp
    lvm2
    bridge-utils
    libtool-ltdl
    iptables   
    rsync 
    policycoreutils-python 
    selinux-policy-base 
    selinux-policy-targeted 
    ntp 
    ebtables 
    bind-utils 
    nmap-ncat  
    openssl 
    e2fsprogs 
    redhat-lsb-core 
    conntrack-tools
    socat
    You must either install these packages manually now or allow Cloudera Manager to install them in the next step.

    If you choose the latter, make sure that Cloudera Manager has the permission needed to install the required packages. To do so, go to the Cloudera Data Science Workbench service and click Configuration. Search for the Install Required Packages property and make sure it is enabled.

  10. Click Instances and select the new host. From the list of available actions, select the Prepare Node command to install the required packages on the new node.
  11. On the Instances page, select the new role instances and click Actions for Selected > Start.
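
If you install the packages from step 9 manually, a quick check such as the following sketch can report which ones are still missing on the new host. It assumes an RPM-based host where `rpm -q` is available. The function name `missing_packages` and the `PKG_QUERY` override are hypothetical; the override exists only so that the sketch can be tried on a machine without rpm.

```shell
#!/usr/bin/env bash
# Packages listed in step 9 of the procedure above.
REQUIRED_PACKAGES="nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables
rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp
ebtables bind-utils nmap-ncat openssl e2fsprogs redhat-lsb-core conntrack-tools
socat"

# Print every required package that the query command reports as not installed.
# PKG_QUERY defaults to "rpm -q" (hypothetical override, for trying the sketch off-host).
missing_packages() {
  local pkg
  for pkg in $REQUIRED_PACKAGES; do
    ${PKG_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Example: list what still needs to be installed on this host.
# missing_packages
```

An empty result means all of the listed packages are already present.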

Using Packages

On an RPM deployment, the procedure to add a worker host to an existing deployment is the same as that required when you first install Cloudera Data Science Workbench on a worker. For instructions, see Installing Cloudera Data Science Workbench on a Worker Host.

Removing a Worker Host

Using Cloudera Manager

Perform the following steps to remove a worker host from Cloudera Data Science Workbench.
  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the Cloudera Data Science Workbench service and click the Instances tab.
  3. Select the Docker Daemon and Worker roles on the host to be removed from Cloudera Data Science Workbench.
  4. Select Actions for Selected > Stop and click Stop to confirm the action. Click Close when the process is complete.
  5. On the Instances page, re-select the Docker Daemon and Worker roles that were stopped in the previous step.
  6. Select Actions for Selected > Delete and then click Delete to confirm the action.

Using Packages

To remove a worker host:
  1. On the master host, run the following command to delete the worker host:
    kubectl delete node <worker_host_domain_name>
  2. Reset the worker host.
    cdsw stop
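
Before deleting the node in step 1, it can help to confirm that the name matches a node the cluster actually knows about. The following is a minimal sketch, assuming kubectl is configured on the master host. The wrapper `remove_worker_node` and the `KUBECTL` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: delete a worker node only if the cluster reports it.
# KUBECTL can point at an alternative kubectl binary; it defaults to "kubectl".
remove_worker_node() {
  local node="$1" kubectl="${KUBECTL:-kubectl}"
  if "$kubectl" get node "$node" >/dev/null 2>&1; then
    "$kubectl" delete node "$node"
  else
    echo "node not found: $node" >&2
    return 1
  fi
}

# Example, run on the master host:
# remove_worker_node <worker_host_domain_name>
```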

Changing the Domain Name

Cloudera Data Science Workbench allows you to change the domain of the web console.

Using Cloudera Manager

  1. Log in to the Cloudera Manager Admin Console.
  2. Go to the Cloudera Data Science Workbench service.
  3. Click the Configuration tab.
  4. Search for the Cloudera Data Science Workbench Domain property and modify the value to reflect the new domain.
  5. Click Save Changes.
  6. Restart the Cloudera Data Science Workbench service to have the changes go into effect.

Using Packages

  1. Open /etc/cdsw/config/cdsw.conf and set the DOMAIN variable to the new domain name.
    DOMAIN="cdsw.<your_new_domain>.com"
  2. Run the following commands to have the new domain name go into effect.
    cdsw stop
    cdsw start
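
Before restarting, you may want to confirm that the edit in step 1 was saved as intended. The following sketch reads the value back from the file; the function name `conf_domain` is hypothetical, and it assumes a single DOMAIN assignment, with or without double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value assigned to DOMAIN in a cdsw.conf-style
# file, with surrounding double quotes removed.
conf_domain() {
  sed -n 's/^DOMAIN="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "${1:-/etc/cdsw/config/cdsw.conf}"
}

# Example:
# conf_domain
```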