Add the Cloudera Data Science Workbench Service

Perform the following steps to add the Cloudera Data Science Workbench service to your cluster.

  1. Log into the Cloudera Manager Admin Console.
  2. On the Home > Status tab, click the drop-down arrow to the right of the cluster name and select Add a Service to launch the wizard. A list of services is displayed.
  3. Select the Cloudera Data Science Workbench service and click Continue.
  4. Select the services which the new CDSW service should depend on. At a minimum, the HDFS, Spark 2, and YARN services are required for the CDSW service to run successfully. Click Continue.
    (Required for CDH 6) If you want to run SparkSQL workloads, you must also add the Hive service as a dependency.
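As a rough illustration of the dependency rule in this step, the following Python sketch reports which services are still missing from a selection. The service names are plain labels taken from this page, not Cloudera Manager API identifiers, and the function name and parameters are hypothetical.

```python
# Minimum dependencies named in this step. The strings are illustrative
# labels only, not identifiers from any Cloudera Manager API.
REQUIRED_SERVICES = {"HDFS", "Spark 2", "YARN"}


def missing_dependencies(selected, cdh_major=None, uses_sparksql=False):
    """Return the required services absent from `selected`, sorted by name."""
    required = set(REQUIRED_SERVICES)
    # On CDH 6, SparkSQL workloads additionally need the Hive service.
    if cdh_major == 6 and uses_sparksql:
        required.add("Hive")
    return sorted(required - set(selected))


print(missing_dependencies({"HDFS", "YARN"}, cdh_major=6, uses_sparksql=True))
# prints ['Hive', 'Spark 2']
```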
  5. Assign the following CDSW roles to gateway hosts:
    Master
    Assign the Master role to a gateway host that is the designated Master host. This is the host that should have the Application Block Device mounted to it.
    Worker

    Assign the Worker role to any other gateway hosts that will be used for Cloudera Data Science Workbench. Note that Worker hosts are not required for a fully-functional Cloudera Data Science Workbench deployment. For proof-of-concept deployments, you can deploy a 1-host cluster with just a Master host. The Master host can run user workloads just as a worker host can.

    Even if you are setting up a multi-host deployment, do not assign the Master and Worker roles to the same host. By default, the Master host already performs both functions: those of the Master and those of a Worker.

    Docker Daemon

    This role runs underlying Docker processes on all Cloudera Data Science Workbench hosts. The Docker Daemon role must be assigned to every Cloudera Data Science Workbench gateway host.

    On First Run, Cloudera Manager will automatically assign this role to each Cloudera Data Science Workbench gateway host. However, if any more hosts are added or reassigned to Cloudera Data Science Workbench, you must explicitly assign the Docker Daemon role to them.

    Application

    This role runs the Cloudera Data Science Workbench application. This role runs only on the CDSW Master host.

    On First Run, Cloudera Manager will assign the Application role to the host running the Cloudera Data Science Workbench Master role. The Application role is always assigned to the same host as the Master. Consequently, this role must never be assigned to a Worker host.

    The following image shows the role assignments for a Cloudera Data Science Workbench Master host and Worker host:
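The role rules described in this step can be summarized as a small validation sketch. The data structure (a mapping from host name to a set of role names) and the function name are hypothetical, and the check for exactly one Master host is an assumption drawn from the phrase "the designated Master host".

```python
def check_role_assignments(assignments):
    """Check a planned layout against the role rules described above.

    `assignments` maps a host name to the set of CDSW role names planned
    for that host. Returns a list of problem descriptions (empty if none).
    """
    problems = []
    masters = [host for host, roles in assignments.items() if "Master" in roles]
    # Assumption: a deployment has exactly one designated Master host.
    if len(masters) != 1:
        problems.append(f"expected exactly one Master host, found {len(masters)}")
    for host, roles in sorted(assignments.items()):
        if "Master" in roles and "Worker" in roles:
            problems.append(f"{host}: Master and Worker roles on the same host")
        if "Docker Daemon" not in roles:
            problems.append(f"{host}: missing Docker Daemon role")
        if "Application" in roles and "Master" not in roles:
            problems.append(f"{host}: Application role on a non-Master host")
        if "Master" in roles and "Application" not in roles:
            problems.append(f"{host}: Master host is missing the Application role")
    return problems


layout = {
    "host1": {"Master", "Docker Daemon", "Application"},
    "host2": {"Worker", "Docker Daemon"},
}
print(check_role_assignments(layout))  # prints []
```

On First Run, Cloudera Manager assigns the Docker Daemon and Application roles itself, so a check like this is mainly useful when hosts are added or reassigned later.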


  6. Configure the following parameters and click Continue.
    Properties Description

    Cloudera Data Science Workbench Domain

    DNS domain configured to point to the master host.

    If the previously configured DNS subdomain entries are cdsw.<your_domain>.com and *.cdsw.<your_domain>.com, then this parameter should be set to cdsw.<your_domain>.com.

    Users' browsers contact the Cloudera Data Science Workbench web application at http://cdsw.<your_domain>.com.

    This domain is for DNS only and is unrelated to Kerberos or LDAP domains.
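Before entering the domain, it may help to confirm that both DNS entries resolve as expected. The sketch below uses Python's standard `socket.gethostbyname`. The function name, the `test` label used to probe the wildcard entry, and the assumption that both names resolve directly to the master host's IP address are illustrative, not taken from the product.

```python
import socket


def check_cdsw_dns(domain, master_ip, resolve=socket.gethostbyname):
    """Report whether the domain and a sample wildcard subdomain resolve to master_ip.

    `resolve` defaults to a real DNS lookup and can be replaced in tests.
    """
    results = {}
    # Any label exercises the wildcard entry; "test" is an arbitrary choice.
    for name in (domain, f"test.{domain}"):
        try:
            results[name] = resolve(name) == master_ip
        except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
            results[name] = False
    return results
```

For example, `check_cdsw_dns("cdsw.example.com", "10.251.50.12")` would return a dictionary with one True or False entry per name, where `cdsw.example.com` stands in for your own domain.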

    Master Node IPv4 Address

    IPv4 address for the master host that is reachable from the worker host. By default, this field is left blank and Cloudera Manager uses the IPv4 address of the Master host.

    Within an AWS VPC, set this parameter to the internal IP address of the master host; for instance, if your hostname is ip-10-251-50-12.ec2.internal, set this property to the corresponding IP address, 10.251.50.12.
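The hostname-to-address mapping in the AWS example can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the internal hostname follows the `ip-A-B-C-D.` pattern shown above.

```python
import ipaddress
import re


def ip_from_ec2_hostname(hostname):
    """Derive the IPv4 address embedded in an ip-A-B-C-D.* internal hostname."""
    match = re.match(r"^ip-(\d+)-(\d+)-(\d+)-(\d+)\.", hostname)
    if match is None:
        raise ValueError(f"not an ip-A-B-C-D style hostname: {hostname!r}")
    address = ".".join(match.groups())
    # Raises a ValueError subclass if any octet is out of range.
    ipaddress.IPv4Address(address)
    return address


print(ip_from_ec2_hostname("ip-10-251-50-12.ec2.internal"))  # prints 10.251.50.12
```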

    Install Required Packages

    When this parameter is enabled, the Prepare Node command will install all the required package dependencies on First Run. If you choose to disable this property, you must manually install the following packages on all gateway hosts running Cloudera Data Science Workbench roles:
    nfs-utils
    libseccomp
    lvm2
    bridge-utils
    libtool-ltdl
    iptables   
    rsync 
    policycoreutils-python 
    selinux-policy-base 
    selinux-policy-targeted 
    ntp 
    ebtables 
    bind-utils  
    openssl 
    e2fsprogs 
    redhat-lsb-core 
    conntrack-tools
    bash
    curl
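If you disable Install Required Packages, a quick way to see what is still missing on a host is to query the package database for each name in the list above. The sketch below assumes an RPM-based host where `rpm -q` is available. The function names are hypothetical, and the `check` parameter exists only so the logic can be exercised without `rpm`.

```python
import subprocess

# Package names copied from the list above.
REQUIRED_PACKAGES = [
    "nfs-utils", "libseccomp", "lvm2", "bridge-utils", "libtool-ltdl",
    "iptables", "rsync", "policycoreutils-python", "selinux-policy-base",
    "selinux-policy-targeted", "ntp", "ebtables", "bind-utils", "openssl",
    "e2fsprogs", "redhat-lsb-core", "conntrack-tools", "bash", "curl",
]


def is_installed(package):
    """Return True if `rpm -q` exits with status 0 for the package."""
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(check=is_installed):
    """Return the required packages that `check` reports as not installed."""
    return [pkg for pkg in REQUIRED_PACKAGES if not check(pkg)]
```

Calling `missing_packages()` on a gateway host returns the names that still need to be installed, for example with your distribution's package manager.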

    Docker Block Device

    Block device(s) for Docker images. Use the full path to specify the device(s), for instance, /dev/xvde.

    For the First Run of the Cloudera Data Science Workbench service, you only need to enter the list of block devices for the Master node.

    To add block devices for other (worker) nodes, use Role Groups in Cloudera Manager.

    The Cloudera Data Science Workbench installer will format the Docker block devices and mount them on each gateway host that is assigned the Docker Daemon role. Do not mount these block devices prior to installation.
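To confirm that a Docker block device is not already mounted, you can compare it against the host's mount table. The sketch below parses text in the format of `/proc/mounts`, where the first field of each line is the mount source. The function name is hypothetical, and the check matches only the exact device path, so it would not notice a mounted partition or logical volume that sits on the device.

```python
def mounted_devices(devices, mounts_text):
    """Return the entries of `devices` that appear as a mount source.

    `mounts_text` is text in /proc/mounts format: one mount per line,
    with the source device as the first whitespace-separated field.
    """
    sources = {
        line.split()[0] for line in mounts_text.splitlines() if line.strip()
    }
    return [device for device in devices if device in sources]


sample = (
    "/dev/xvda1 / xfs rw,relatime 0 0\n"
    "/dev/xvdf /data ext4 rw,relatime 0 0\n"
)
print(mounted_devices(["/dev/xvde", "/dev/xvdf"], sample))  # prints ['/dev/xvdf']
```

On a real host, pass the contents of `/proc/mounts` as `mounts_text`. An empty result for the devices you plan to enter means none of them is mounted under that exact path.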

    RESERVE_MASTER Node

    Cloudera recommends setting the RESERVE_MASTER node option to TRUE for clusters with 100+ users and/or 10+ nodes.
  7. The wizard will now begin a First Run of the Cloudera Data Science Workbench service. This includes deploying client configuration for HDFS, YARN and Spark 2, installing the package dependencies on all hosts, and formatting the Docker block device. The wizard will also assign the Application role to the host running Master and the Docker Daemon role to all the gateway hosts running Cloudera Data Science Workbench.
  8. Once the First Run command has completed successfully, click Finish to go back to the Cloudera Manager home page.