Add the Cloudera Data Science Workbench Service
Perform the following steps to add the Cloudera Data Science Workbench service to your cluster.
- Log in to the Cloudera Manager Admin Console.
- On the Home > Status tab, click the drop-down arrow to the right of the cluster name and select Add a Service to launch the wizard. A list of services is displayed.
- Select the Cloudera Data Science Workbench service and click Continue.
- Select the services on which the new CDSW service should depend. At a minimum, the HDFS, Spark 2, and YARN services are required for the CDSW service to run successfully. Click Continue.
(Required for CDH 6) If you want to run SparkSQL workloads, you must also add the Hive service as a dependency.
- Assign the CDSW roles to gateway hosts (the hosts that carry the HDFS, Spark 2, and YARN gateway roles):
- Master
Assign the Master role to the gateway host designated as the Master host. This is the host that should have the Application Block Device mounted to it.
- Worker
Assign the Worker role to any other gateway hosts that will be used for Cloudera Data Science Workbench. Worker hosts are not required for a fully functional Cloudera Data Science Workbench deployment. For proof-of-concept deployments, you can deploy a single-host cluster with just a Master host. The Master host can run user workloads just as a Worker host can.
Even if you are setting up a multi-host deployment, do not assign the Master and Worker roles to the same host. By default, the Master host performs both functions: those of the Master and those of a Worker.
- Docker Daemon
This role runs the underlying Docker processes on all Cloudera Data Science Workbench hosts. The Docker Daemon role must be assigned to every Cloudera Data Science Workbench gateway host.
On First Run, Cloudera Manager automatically assigns this role to each Cloudera Data Science Workbench gateway host. However, if more hosts are later added or reassigned to Cloudera Data Science Workbench, you must explicitly assign the Docker Daemon role to them.
- Application
This role runs the Cloudera Data Science Workbench application. It runs only on the CDSW Master host.
On First Run, Cloudera Manager assigns the Application role to the host running the Cloudera Data Science Workbench Master role. The Application role is always assigned to the same host as the Master. Consequently, this role must never be assigned to a Worker host.
The following image shows the role assignments for a Cloudera Data Science Workbench Master host and Worker host:
- Configure the following parameters and click Continue.
Properties and descriptions:
Cloudera Data Science Workbench Domain
DNS domain configured to point to the master host.
If the previously configured DNS subdomain entries are cdsw.<your_domain>.com and *.cdsw.<your_domain>.com, then this parameter should be set to cdsw.<your_domain>.com.
Users' browsers contact the Cloudera Data Science Workbench web application at http://cdsw.<your_domain>.com.
This domain is for DNS only and is unrelated to Kerberos or LDAP domains.
Master Node IPv4 Address
IPv4 address for the master host that is reachable from the worker hosts. By default, this field is left blank and Cloudera Manager uses the IPv4 address of the Master host.
Within an AWS VPC, set this parameter to the internal IP address of the master host. For instance, if your hostname is ip-10-251-50-12.ec2.internal, set this property to the corresponding IP address, 10.251.50.12.
Install Required Packages
When this parameter is enabled, the Prepare Node command will install all the required package dependencies on First Run. If you choose to disable this property, you must manually install the following packages on all gateway hosts running Cloudera Data Science Workbench roles:
nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp ebtables bind-utils openssl e2fsprogs redhat-lsb-core bash curl conntrack-tools
Docker Block Device
Block device(s) for Docker images. Use the full path to specify the device(s), for instance, /dev/xvde.
For the First Run of the Cloudera Data Science Workbench service, you only need to enter the list of block devices for the Master node.
To add block devices for other (worker) nodes, use Role Groups in Cloudera Manager.
The Cloudera Data Science Workbench installer will format and mount Docker on each gateway host that is assigned the Docker Daemon role. Do not mount these block devices prior to installation.
RESERVE_MASTER Node
Cloudera recommends setting the RESERVE_MASTER node option to TRUE for clusters with 100+ users and/or 10+ nodes.
- The wizard will now begin a First Run of the Cloudera Data Science Workbench service. This includes deploying client configuration for HDFS, YARN, and Spark 2, installing the package dependencies on all hosts, and formatting the Docker block device. The wizard will also assign the Application role to the host running Master and the Docker Daemon role to all the gateway hosts running Cloudera Data Science Workbench.
- Once the First Run command has completed successfully, click Finish to go back to the Cloudera Manager home page.
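The role-assignment rules and the AWS example from the steps above can be checked before you launch the wizard. The following Python sketch is illustrative only and is not part of Cloudera Manager or of any Cloudera API. The function names, the host names, and the dictionary layout are hypothetical. It assumes a deployment has exactly one Master host, and it describes the layout as it should look after First Run has assigned the Docker Daemon and Application roles.

```python
import re
from typing import Dict, List, Set

# Hypothetical helper: matches AWS-style internal hostnames such as
# ip-10-251-50-12.ec2.internal and captures the four octets.
_EC2_INTERNAL = re.compile(r"^ip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ipv4_from_ec2_hostname(hostname: str) -> str:
    """Derive the internal IPv4 address encoded in an AWS internal hostname."""
    match = _EC2_INTERNAL.match(hostname)
    if match is None:
        raise ValueError(f"not an ip-a-b-c-d style hostname: {hostname!r}")
    octets = match.groups()
    if any(int(octet) > 255 for octet in octets):
        raise ValueError(f"octet out of range in hostname: {hostname!r}")
    return ".".join(octets)


def validate_role_assignments(assignments: Dict[str, Set[str]]) -> List[str]:
    """Return rule violations for a host -> roles mapping (empty list if none).

    Rules taken from the steps above:
      * Master and Worker must not share a host.
      * Docker Daemon must be on every CDSW gateway host.
      * Application runs on the Master host and nowhere else.
    Assumption (not stated outright in the steps): exactly one Master host.
    """
    problems: List[str] = []
    masters = [host for host, roles in assignments.items() if "Master" in roles]
    if len(masters) != 1:
        problems.append(f"expected exactly one Master host, found {len(masters)}")
    for host, roles in sorted(assignments.items()):
        if {"Master", "Worker"} <= roles:
            problems.append(f"{host}: Master and Worker roles on the same host")
        if "Docker Daemon" not in roles:
            problems.append(f"{host}: missing Docker Daemon role")
        if ("Application" in roles) != ("Master" in roles):
            problems.append(
                f"{host}: Application role must be on the Master host and no other"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical two-host layout: one Master, one Worker.
    layout = {
        "cdsw-master.example.com": {"Master", "Application", "Docker Daemon"},
        "cdsw-worker.example.com": {"Worker", "Docker Daemon"},
    }
    for problem in validate_role_assignments(layout):
        print(problem)
    print(ipv4_from_ec2_hostname("ip-10-251-50-12.ec2.internal"))
```

A check like this only catches planning mistakes on paper. The wizard and Cloudera Manager remain the source of truth for what is actually assigned to each host.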