(Optional) Install Cloudera Data Science Workbench on Worker Hosts
Cloudera Data Science Workbench supports adding and removing worker hosts
at any time. Worker hosts allow you to transparently scale the number of concurrent workloads
users can run.
Worker hosts are not required for a fully functional
Cloudera Data Science Workbench deployment. For proof-of-concept deployments, you can
deploy a single-host cluster with just a master host. The master host can run user workloads
just as a worker host can.
Use the following steps to add worker hosts to Cloudera Data
Science Workbench. Note that airgapped clusters and non-airgapped clusters use different
files for installation.
Non-airgapped Installation - Download the Cloudera Data Science Workbench
repo file (cloudera-cdsw.repo) from the following location:
Non-airgapped Installation - Install the latest RPM with the following
command:
sudo yum install cloudera-data-science-workbench
Airgapped Installation - Copy the RPM downloaded in the previous step to
the appropriate gateway host. Then, use the complete filename to install the package.
For example:
After initialization, the cdsw.conf file on the master host includes a generated
bootstrap token that allows worker hosts to securely join the cluster. You can get
this token by copying the configuration file from the master host to the worker host
and ensuring the copy has 600 permissions (readable and writable only by its owner).
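The copy-and-restrict step above can be sketched as the following shell helpers. This is a minimal sketch, not the documented procedure: the master hostname is a placeholder, the configuration path /etc/cdsw/config/cdsw.conf is an assumption about where the file lives on your hosts, and the function names are hypothetical. It also assumes GNU coreutils for stat -c.

```shell
# Sketch only: MASTER_HOST and CONF_PATH are placeholders/assumptions,
# not values taken from this guide. Adjust both for your environment.
MASTER_HOST="${MASTER_HOST:-cdsw-master.example.com}"
CONF_PATH="${CONF_PATH:-/etc/cdsw/config/cdsw.conf}"

# Copy the configuration file (which carries the bootstrap token)
# from the master host to the same path on this worker host.
fetch_conf() {
  scp "root@${MASTER_HOST}:${CONF_PATH}" "${CONF_PATH}"
}

# Restrict a file to owner read/write (mode 600) and confirm the
# mode took effect. Uses GNU stat; returns non-zero on any failure.
secure_conf() {
  conf_file="$1"
  chmod 600 "$conf_file" || return 1
  [ "$(stat -c '%a' "$conf_file")" = "600" ]
}
```

On a worker host you would typically run fetch_conf and then secure_conf "$CONF_PATH". Keeping the permission check in its own function lets you re-run it later without copying the file again.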
If your hosts have heterogeneous block device configurations, modify the Docker
block device settings in the worker host configuration file after you copy it. Worker
hosts do not need application block devices, which store the project files and
database state, so the application block device setting is ignored on worker hosts.
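One way to adjust the Docker block device setting in the copied file is sketched below. It assumes the setting appears in cdsw.conf as a line of the form DOCKER_BLOCK_DEVICES="..." (this guide does not show the parameter name, so treat it as an assumption), that GNU sed is available for in-place editing, and that the function name is hypothetical.

```shell
# Sketch: rewrite the Docker block device list in a copied cdsw.conf.
# Assumes the file already contains a DOCKER_BLOCK_DEVICES="..." line.
set_docker_block_devices() {
  conf_file="$1"   # path to this worker's copy of cdsw.conf
  devices="$2"     # space-separated device paths for this host
  # '|' is used as the sed delimiter because device paths contain '/'.
  sed -i "s|^DOCKER_BLOCK_DEVICES=.*|DOCKER_BLOCK_DEVICES=\"${devices}\"|" "$conf_file"
}
```

For example, set_docker_block_devices /path/to/cdsw.conf "/dev/xvde" would replace the existing assignment with that single device; the device name here is illustrative only. Other lines in the file are left unchanged.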
Create /var/lib/cdsw on the worker host. This directory must exist
on all worker hosts. Without it, the next step that registers the worker host with
the master will fail.
Unlike the master host, the /var/lib/cdsw directory on worker
hosts does not need to be mounted to an Application Block Device. It is only used to
store client configuration for HDP services on workers.
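Creating the directory can be sketched as a small idempotent helper. The function name is hypothetical, and the optional argument exists only so the helper can be tried against a scratch path; with no argument it targets /var/lib/cdsw as described above.

```shell
# Sketch: create the directory if it is missing and confirm it exists.
# With no argument, the target is /var/lib/cdsw.
ensure_cdsw_dir() {
  cdsw_dir="${1:-/var/lib/cdsw}"
  # mkdir -p succeeds whether or not the directory already exists.
  mkdir -p "$cdsw_dir" && [ -d "$cdsw_dir" ]
}
```

Running ensure_cdsw_dir with no arguments as root on each worker host is safe to repeat, because mkdir -p does nothing when the directory is already present.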
On the worker host, run the following command to add the host to the cluster:
cdsw join
This causes the worker host to register itself with the Cloudera Data Science
Workbench master host and increases the available pool of resources for workloads.
Return to the master host and verify the host is registered with this
command: