(Optional) Install Cloudera Data Science Workbench on Worker Hosts
Cloudera Data Science Workbench supports adding and removing additional worker hosts at any time. Worker hosts allow you to transparently scale the number of concurrent workloads users can run.
Use the following steps to add worker hosts to Cloudera Data Science Workbench. Note that airgapped clusters and non-airgapped clusters use different files for installation.
Non-airgapped Installation - Download the Cloudera Data Science Workbench repo file (cloudera-cdsw.repo) from the following location:
https://archive.cloudera.com/p/cdsw1/1.8.0/redhat7/yum/cloudera-cdsw.repo
Airgapped Installation - For airgapped installations, download the Cloudera Data Science Workbench RPM file from the following location:
Skip this step for airgapped installations. Add the Cloudera Public GPG repository key. This key verifies that you are downloading genuine packages.
sudo rpm --import https://archive.cloudera.com/p/cdsw1/1.8.0/redhat7/yum/RPM-GPG-KEY-cloudera
Non-airgapped Installation - Install the latest RPM with the following command:
sudo yum install cloudera-data-science-workbench
Airgapped Installation - Copy the RPM downloaded in the previous step to the appropriate gateway host. Then, use the complete filename to install the package. For example:
sudo yum install cloudera-data-science-workbench-1.8.0.12345.rpm
For guidance on any warnings displayed during the installation process, see Understanding Installation Warnings.
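The two installation paths differ only in where the package comes from. The following shell sketch prints, rather than runs, the commands for either path so you can review them first. It assumes the repo file is saved to /etc/yum.repos.d/ (yum's default repo directory on RHEL 7) and downloaded with curl, neither of which is specified in the steps above; cdsw_install_plan is a hypothetical helper, and you may need to add credentials to the download if your access to archive.cloudera.com requires them.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the package-installation step on a worker host.
# It only prints commands; review them, then run them yourself.
# Assumptions (not stated in the Cloudera steps): the repo file goes into
# /etc/yum.repos.d/ and is fetched with curl. cdsw_install_plan is hypothetical.

CDSW_BASE_URL="https://archive.cloudera.com/p/cdsw1/1.8.0/redhat7/yum"

cdsw_install_plan() {
  local rpm_path="${1:-}"   # empty: non-airgapped; otherwise: path to the copied RPM
  if [ -z "$rpm_path" ]; then
    # Non-airgapped: fetch the repo file, import the GPG key, install from the repo.
    echo "sudo curl -fsSL -o /etc/yum.repos.d/cloudera-cdsw.repo ${CDSW_BASE_URL}/cloudera-cdsw.repo"
    echo "sudo rpm --import ${CDSW_BASE_URL}/RPM-GPG-KEY-cloudera"
    echo "sudo yum install cloudera-data-science-workbench"
  else
    # Airgapped: no repo file and no key import; install the local RPM by its full filename.
    echo "sudo yum install ${rpm_path}"
  fi
}
```

Run cdsw_install_plan with no argument on a non-airgapped host, or pass the path of the copied RPM (for example, cdsw_install_plan ./cloudera-data-science-workbench-<version>.rpm) on an airgapped host.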
Copy the cdsw.conf file from the master host:
scp root@<cdsw-master-hostname.your_domain.com>:/etc/cdsw/config/cdsw.conf /etc/cdsw/config/cdsw.conf
After initialization, the cdsw.conf file includes a generated bootstrap token that allows worker hosts to securely join the cluster. You get this token by copying the configuration file from the master host and ensuring that the copy has 600 permissions.
If your hosts have heterogeneous block device configurations, modify the Docker block device settings in the worker host's configuration file after you copy it. Worker hosts do not need application block devices, which store project files and database state, so that configuration option is ignored on workers.
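Because cdsw.conf carries the bootstrap token, a small check after the copy helps confirm the 600 permissions. This is a minimal sketch assuming GNU coreutils (stat -c), as shipped on RHEL 7; secure_cdsw_conf is a hypothetical helper, not part of the cdsw CLI.

```shell
#!/usr/bin/env bash
# Sketch: restrict the copied cdsw.conf to owner read/write (600), since it
# contains the bootstrap token. secure_cdsw_conf is a hypothetical helper.
# Assumes GNU stat (stat -c), as on RHEL 7.

secure_cdsw_conf() {
  local conf="${1:-/etc/cdsw/config/cdsw.conf}"
  if [ ! -f "$conf" ]; then
    echo "missing: $conf" >&2
    return 1
  fi
  chmod 600 "$conf" || return 1
  # Print the resulting octal mode so the caller can confirm it.
  stat -c '%a' "$conf"
}
```

Run it as root right after the scp command; it should print 600.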
Create /var/lib/cdsw on the worker host. This directory must exist on all worker hosts; without it, the next step, which registers the worker host with the master, will fail.
Unlike the master host, the /var/lib/cdsw directory on worker hosts does not need to be mounted to an Application Block Device. It is only used to store client configuration for HDP services on workers.
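Because the directory only needs to exist, and does not need to be backed by an Application Block Device, creating it amounts to sudo mkdir -p /var/lib/cdsw. The sketch below wraps that in a hypothetical, idempotent helper so it can be rerun safely on every worker.

```shell
#!/usr/bin/env bash
# Sketch: ensure the worker's /var/lib/cdsw directory exists before cdsw join.
# A plain directory is enough on workers. ensure_cdsw_dir is a hypothetical helper.

ensure_cdsw_dir() {
  local dir="${1:-/var/lib/cdsw}"
  mkdir -p "$dir" || return 1   # -p: create parents, no error if it already exists
  [ -d "$dir" ]                 # succeed only if the path is now a directory
}
```

Run it as root on each worker host; calling it a second time is harmless.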
On the worker host, run the following command to add the host to the cluster:
cdsw join
This registers the worker host with the Cloudera Data Science Workbench master host and increases the pool of resources available for workloads.
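Registration fails if /var/lib/cdsw is missing, and the worker needs the bootstrap token from the copied cdsw.conf. The sketch below checks the conditions described above before registering. cdsw_preflight is a hypothetical helper; only cdsw join comes from the documented steps, and the stat -c syntax assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: verify worker preconditions before running `cdsw join`.
# cdsw_preflight is a hypothetical helper; it returns 0 only if all checks pass.

cdsw_preflight() {
  local conf="${1:-/etc/cdsw/config/cdsw.conf}"
  local dir="${2:-/var/lib/cdsw}"
  local ok=0
  if [ ! -f "$conf" ]; then
    echo "FAIL: $conf not found (copy it from the master host)" >&2
    ok=1
  elif [ "$(stat -c '%a' "$conf")" != "600" ]; then
    echo "FAIL: $conf should have 600 permissions" >&2
    ok=1
  fi
  if [ ! -d "$dir" ]; then
    echo "FAIL: $dir does not exist" >&2
    ok=1
  fi
  return "$ok"
}
```

As root on the worker host, run cdsw_preflight && cdsw join so that registration is attempted only when every check passes.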
Return to the master host and verify the host is registered with this