Add the Cloudera Data Science Workbench Service

Perform the following steps to add the Cloudera Data Science Workbench service to your cluster.

Log into the Cloudera Manager Admin Console.
On the Home > Status tab, click the drop-down menu to the right of the cluster name and select Add a Service to launch the wizard. A list of services is displayed.
Select the Cloudera Data Science Workbench service and click Continue.
Select the services which the new CDSW service should depend on. At a minimum, the HDFS, Spark 2, and YARN services are required for the CDSW service to run successfully. Click Continue.
(Required for CDH 6) If you want to run SparkSQL workloads, you must also add the Hive service as a dependency.
Assign the CDSW roles to gateway hosts, that is, hosts that carry the HDFS, Spark 2, and YARN gateway roles:

Master

Assign the Master role to a gateway host that is the designated Master host. This is the host that should have the Application Block Device mounted to it.

Worker

Assign the Worker role to any other gateway hosts that will be used for Cloudera Data Science Workbench. Note that Worker hosts are not required for a fully-functional Cloudera Data Science Workbench deployment. For proof-of-concept deployments, you can deploy a 1-host cluster with just a Master host. The Master host can run user workloads just as a worker host can.

Even if you are setting up a multi-host deployment, do not assign both the Master and Worker roles to the same host. By default, the Master host already performs both functions: those of the Master and those of a worker.

Docker Daemon

This role runs underlying Docker processes on all Cloudera Data Science Workbench hosts. The Docker Daemon role must be assigned to every Cloudera Data Science Workbench gateway host.

On First Run, Cloudera Manager will automatically assign this role to each Cloudera Data Science Workbench gateway host. However, if any more hosts are added or reassigned to Cloudera Data Science Workbench, you must explicitly assign the Docker Daemon role to them.

Application

This role runs the Cloudera Data Science Workbench application. This role runs only on the CDSW Master host.

On First Run, Cloudera Manager will assign the Application role to the host running the Cloudera Data Science Workbench Master role. The Application role is always assigned to the same host as the Master. Consequently, this role must never be assigned to a Worker host.
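
The role rules above can be checked against a planned layout before you fill in the wizard. The following is a minimal sketch, not part of Cloudera Manager: the function name `validate_role_assignments` and the host names are hypothetical, the input is assumed to describe the layout as it should look after First Run, and the check for exactly one Master host is an assumption drawn from the phrase "the designated Master host".

```python
from typing import Dict, List, Set


def validate_role_assignments(assignments: Dict[str, Set[str]]) -> List[str]:
    """Return rule violations for a planned CDSW layout (empty list = consistent).

    `assignments` maps each CDSW gateway host name to the set of CDSW roles
    planned for it. This helper is hypothetical and only encodes the rules
    stated in this section.
    """
    problems: List[str] = []

    # Assumption: a deployment has exactly one designated Master host.
    masters = [host for host, roles in assignments.items() if "Master" in roles]
    if len(masters) != 1:
        problems.append(f"expected exactly one Master host, found {len(masters)}")

    for host, roles in sorted(assignments.items()):
        # Master and Worker must never share a host.
        if {"Master", "Worker"} <= roles:
            problems.append(f"{host}: Master and Worker roles on the same host")
        # Every CDSW gateway host needs the Docker Daemon role.
        if "Docker Daemon" not in roles:
            problems.append(f"{host}: missing Docker Daemon role")
        # The Application role belongs only on the Master host.
        if "Application" in roles and "Master" not in roles:
            problems.append(f"{host}: Application role on a non-Master host")

    return problems


if __name__ == "__main__":
    planned = {
        "cdsw-master.example.com": {"Master", "Application", "Docker Daemon"},
        "cdsw-worker1.example.com": {"Worker", "Docker Daemon"},
    }
    for line in validate_role_assignments(planned) or ["layout is consistent"]:
        print(line)
```

A check like this is most useful when hosts are added later, because the Docker Daemon role is then no longer assigned automatically.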

The following image shows the role assignments for a Cloudera Data Science Workbench Master host and Worker host:

Configure the following parameters and click Continue.


Properties	Description
Cloudera Data Science Workbench Domain	DNS domain configured to point to the master host. If the previously configured DNS subdomain entries are `cdsw.<your_domain>.com` and `*.cdsw.<your_domain>.com`, then this parameter should be set to `cdsw.<your_domain>.com`. Users' browsers contact the Cloudera Data Science Workbench web application at `http://cdsw.<your_domain>.com`. This domain is for DNS only and is unrelated to Kerberos or LDAP domains.
Master Node IPv4 Address	IPv4 address for the master host that is reachable from the worker host. By default, this field is left blank and Cloudera Manager uses the IPv4 address of the Master host. Within an AWS VPC, set this parameter to the internal IP address of the master host; for instance, if your hostname is `ip-10-251-50-12.ec2.internal`, set this property to the corresponding IP address, `10.251.50.12`.
Install Required Packages	When this parameter is enabled, the Prepare Node command installs all the required package dependencies on First Run. If you choose to disable this property, you must manually install the following packages on all gateway hosts running Cloudera Data Science Workbench roles: `nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp ebtables bind-utils openssl e2fsprogs redhat-lsb-core conntrack-tools bash curl`. Note: Be sure to reboot the host machines after the Prepare Node command installs all the required package dependencies.
Docker Block Device	Block device(s) for Docker images. Use the full path to specify the device(s), for instance, `/dev/xvde`. For the First Run of the Cloudera Data Science Workbench service, you only need to enter the list of block devices for the Master node. To add block devices for other (worker) nodes, use Role Groups in Cloudera Manager. The Cloudera Data Science Workbench installer will format and mount Docker on each gateway host that is assigned the Docker Daemon role. Do not mount these block devices prior to installation.
RESERVE_MASTER Node	Cloudera recommends setting the `RESERVE_MASTER` node option to `TRUE` for clusters with 100+ users and/or 10+ nodes.
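
Two of the values in the table above follow simple rules that can be worked out ahead of time. The sketch below is illustrative only: the function names are hypothetical, the hostname pattern assumes the `ip-a-b-c-d.` form shown in the AWS example, and the `RESERVE_MASTER` helper reads "100+ users and/or 10+ nodes" as "at least 100 users or at least 10 nodes".

```python
import ipaddress
import re

# Assumption: internal AWS hostnames start with "ip-" followed by the four
# octets of the private address, separated by hyphens, as in the example above.
_EC2_HOSTNAME = re.compile(r"^ip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def master_ipv4_from_ec2_hostname(hostname: str) -> str:
    """Derive the internal IPv4 address from an AWS-style internal hostname."""
    match = _EC2_HOSTNAME.match(hostname)
    if match is None:
        raise ValueError(f"not an ip-a-b-c-d style hostname: {hostname!r}")
    candidate = ".".join(match.groups())
    # IPv4Address raises a ValueError subclass if any octet is out of range.
    return str(ipaddress.IPv4Address(candidate))


def recommend_reserve_master(user_count: int, node_count: int) -> bool:
    """Apply the sizing guidance for RESERVE_MASTER from the table above."""
    return user_count >= 100 or node_count >= 10


if __name__ == "__main__":
    print(master_ipv4_from_ec2_hostname("ip-10-251-50-12.ec2.internal"))
    print(recommend_reserve_master(user_count=120, node_count=4))
```

For the hostname in the table, the first call yields `10.251.50.12`, which is the value to enter for Master Node IPv4 Address.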

The wizard will now begin a First Run of the Cloudera Data Science Workbench service. This includes deploying client configuration for HDFS, YARN and Spark 2, installing the package dependencies on all hosts, and formatting the Docker block device. The wizard will also assign the Application role to the host running Master and the Docker Daemon role to all the gateway hosts running Cloudera Data Science Workbench.

Note: Ensure you have 200 GB of space devoted to DOCKER_TMPDIR (defaults to /var/lib/cdsw/docker-tmp) on the master node. This space is needed to unzip all of the new Docker images.
Once the First Run command has completed successfully, click Finish to go back to the Cloudera Manager home page.
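
The 200 GB requirement for DOCKER_TMPDIR noted above can be checked on the master node before starting First Run. This is a rough sketch under stated assumptions: it treats 200 GB as decimal gigabytes, it uses free space on the containing filesystem as a stand-in for space "devoted" to the directory, and the helper names are hypothetical. Because the directory may not exist before installation, the check falls back to the nearest existing parent directory.

```python
import os
import shutil

# Default DOCKER_TMPDIR location stated in the note above.
DEFAULT_DOCKER_TMPDIR = "/var/lib/cdsw/docker-tmp"
# Assumption: 200 GB is interpreted as 200 * 10^9 bytes.
REQUIRED_BYTES = 200 * 10**9


def nearest_existing_dir(path: str) -> str:
    """Walk up from `path` until an existing directory entry is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def has_enough_space(path: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Report whether the filesystem that will hold `path` has enough free space."""
    usage = shutil.disk_usage(nearest_existing_dir(path))
    return usage.free >= required_bytes


if __name__ == "__main__":
    ok = has_enough_space(DEFAULT_DOCKER_TMPDIR)
    print("enough space for Docker images" if ok else "less than 200 GB free")
```

If DOCKER_TMPDIR has been changed from its default, pass the configured path instead of `DEFAULT_DOCKER_TMPDIR`.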