New Features and Changes in Cloudera Data Science Workbench 1.6.0

Major features and updates for Cloudera Data Science Workbench.

Bring Your Own Editor

You can now take advantage of all the benefits of Cloudera Data Science Workbench while using an editor you are familiar with. This feature supports third-party IDEs that run on your local machine like PyCharm and browser-based IDEs such as Jupyter. Base Image v8 ships with Jupyter preconfigured and can be selected from the Start Session menu.

For details, see Editors.

Multiple Cloudera Data Science Workbench Deployments
You can now have multiple Cloudera Data Science Workbench CSD deployments associated with one instance of Cloudera Manager.

For details, see Multiple Cloudera Data Science Workbench Deployments.

Audits
Cloudera Data Science Workbench logs specific events, such as user logins and sharing, that you can view by querying a database. For more information, see Monitoring User Events and Tracked User Events.
Expanded Support for Distributed Machine Learning
Cloudera Data Science Workbench 1.6 (and higher) allows you to run distributed workloads with frameworks such as TensorFlowOnSpark, H2O, and XGBoost. This is similar to what you can already do with Spark workloads that run on the attached CDH/HDP cluster. For details, see Running Distributed ML Workloads on YARN.
cdswctl CLI Client
The cdswctl client provides an additional way to interact with your Cloudera Data Science Workbench deployment. For example, you can use the cdswctl client to start an SSH endpoint on your local machine and then connect a local IDE, such as PyCharm, to Cloudera Data Science Workbench.

You can download cdswctl from the Cloudera Data Science Workbench web UI and use it from your local machine. Note that this client differs from the cdsw CLI tool used to run commands such as cdsw status, which exists within the Cloudera Data Science Workbench deployment.

For details, see cdswctl Command Line Interface Client.

Status and Validate Commands
The CDSW service in Cloudera Manager now includes two new commands that can be used to assess the status of your Cloudera Data Science Workbench deployment: Status and Validate. They are the equivalent of the cdsw status and cdsw validate commands that are available via the CLI.

For details, see Checking the Status of the CDSW Service.

Experiments
  • If your cluster has been equipped with GPUs, you can now use GPUs to run experiments on Cloudera Data Science Workbench.
  • Tracked experiment files now refresh and appear automatically on the Overview page for a run of an experiment. Previously, you had to manually refresh the page after an experiment completed.
Command Line Interface (CLI) Changes - RPM Deployments only
  • The cdsw reset command has been removed and replaced by the cdsw stop command.
  • The cdsw init command has been removed and replaced by the cdsw start command.

For details on how these commands behave on the master and worker hosts, refer to the Cloudera Data Science Workbench Command Line Reference.

Kubernetes and Weave
Kubernetes has been upgraded to version 1.11.7. Weave Net has been upgraded to version 2.5.1. This upgrade resolves Weave issue #2934.
Logs
  • Staging Directory

    You can now configure the temporary directory that Cloudera Data Science Workbench uses to stage logs when collecting a diagnostic bundle. Old logs in the directory are deleted when a new diagnostic bundle is collected or when the directory grows larger than 10 MB.

  • Logs tab

    Running sessions now display a Logs tab. This tab displays engine logs and, if applicable, Spark logs for the running session. Previously, you had to log in to the Cloudera Data Science Workbench host(s) and the Spark server to access these logs.

    For details, see Diagnostic Bundles.
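
The cleanup rule for the staging directory can be sketched in a few lines. This is only an illustration of the rule as stated above, not the logic Cloudera Data Science Workbench actually runs. The names STAGING_LIMIT_BYTES, directory_size, and should_purge are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
import os

# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes. The release note does
# not say whether the limit is decimal or binary.
STAGING_LIMIT_BYTES = 10 * 1024 * 1024


def directory_size(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def should_purge(staged_bytes: int, collecting_new_bundle: bool) -> bool:
    """Hypothetical helper: old logs go when a new bundle is collected
    or when the staged logs exceed the size limit."""
    return collecting_new_bundle or staged_bytes > STAGING_LIMIT_BYTES
```

An administrator could call directory_size on the configured staging directory to estimate how close it is to the limit.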

Operating System
Cloudera Data Science Workbench 1.6 supports RHEL and CentOS 7.6.
Workload Scheduling Changes
  • Starting with version 1.6, Cloudera Data Science Workbench allows you to specify a list of CDSW gateway hosts that are labeled as Auxiliary Nodes. These hosts are deprioritized during workload scheduling: they are chosen only for workloads that cannot be scheduled on any other host. Examples include sessions with very large resource requests and workloads submitted while the other hosts are fully utilized.

    For details, see Customize Workload Scheduling.

  • Reserve Master Host

    Cloudera Data Science Workbench 1.4.3 introduced a new feature that allowed you to reserve the CDSW Master host for running internal application components. Starting with version 1.6, this feature can be enabled on CSD-based deployments using the Reserve Master Host property in Cloudera Manager. Safety valves are no longer needed.

    For details, see Reserving the Master Host for Internal CDSW Components.
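
The deprioritization of Auxiliary Nodes described above can be pictured with a small sketch. It is not the scheduler Cloudera Data Science Workbench uses (scheduling is handled by Kubernetes). The Host class, its fields, and pick_host are hypothetical names introduced only to show the "use auxiliary hosts as a last resort" rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpu: float          # free vCPU cores (hypothetical field)
    free_memory_gb: float    # free memory in GB (hypothetical field)
    auxiliary: bool = False  # True if the host is labeled as an Auxiliary Node


def pick_host(hosts: List[Host], cpu: float, memory_gb: float) -> Optional[Host]:
    """Return a host that fits the request, using auxiliary hosts only
    when no regular host has enough free resources."""
    fitting = [
        h for h in hosts
        if h.free_cpu >= cpu and h.free_memory_gb >= memory_gb
    ]
    regular = [h for h in fitting if not h.auxiliary]
    candidates = regular or fitting  # fall back to auxiliary hosts
    return candidates[0] if candidates else None
```

With one large auxiliary host and two smaller regular hosts, a small session lands on a regular host, while a request that no regular host can satisfy falls through to the auxiliary host.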

Security
  • FreeIPA Support

    In addition to MIT Kerberos and Active Directory, Cloudera Data Science Workbench now also supports FreeIPA as an identity management system. For details, see Configure FreeIPA.

  • New User Role - Operator

    Version 1.6 includes a new access role called Operator. When a user is assigned the Operator role on a project, they will be able to start and stop pre-existing jobs and will have view-only access to project code, data, and results.

  • Restricting User-Controlled Kubernetes Pods

    Cloudera Data Science Workbench 1.6 includes three new properties that allow you to control the permissions granted to user-controlled Kubernetes pods. An example of a user-controlled pod is the engine pod, which provides the environment for sessions, jobs, etc. These pods are launched in a per-user Kubernetes namespace. Since the user has the ability to launch arbitrary pods, these settings restrict what those pods can do.

    For details, see Restricting User-Controlled Kubernetes Pods.

  • LDAP/SAML Configuration Changes

    Previously, if you wanted to grant the site administrator role to users of an LDAP/SAML group, that group had to be listed under two properties: LDAP/SAML Full Administrator Groups and LDAP/SAML User Groups. If a group was only listed under LDAP/SAML Full Administrator Groups, and not under LDAP/SAML User Groups, users of that group would not be able to log in to CDSW.

    With version 1.6, you do not need to list the admin groups under both properties. Users belonging to groups listed under LDAP/SAML Full Administrator Groups will be able to log in and have site administrator access to Cloudera Data Science Workbench as expected.

  • Project and Team Creation

    Site administrators can now control whether users can create projects or teams with the following properties on the Settings page:
    • Allow users to create projects
    • Allow users to create teams
    For details, see User Access to Features.
  • Session Tokens

    The method by which the Cloudera Data Science Workbench web UI session tokens are stored has been hardened. Users must log out of the Cloudera Data Science Workbench web UI and back in after upgrading to version 1.6.0.

  • Sharing

    Site administrators can now use the Allow console output sharing property on the Admin > Security page to control whether consoles can be shared. Disable this property to remove the Share button from the project workspace and workbench UI and to disable access to all shared console outputs across the deployment. Note that re-enabling this property does not automatically grant access to previously shared consoles. You will need to manually share each console again.

  • TLS/SSL

    Cloudera Data Science Workbench now defaults to using TLS 1.2. The default cipher suites have also been upgraded to Mozilla's Modern cipher suites.
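
The LDAP/SAML configuration change listed above is easiest to see as a before-and-after comparison. The sketch below models only the rule described in these notes and is not Cloudera Data Science Workbench code. The function names and group names are hypothetical, and it assumes both group properties are configured with at least one group.

```python
from typing import Set, Tuple


def access_before_1_6(
    member_of: Set[str], user_groups: Set[str], admin_groups: Set[str]
) -> Tuple[bool, bool]:
    """Return (can_log_in, is_site_admin) under the pre-1.6 rule:
    login requires membership in a listed User Group."""
    can_log_in = bool(member_of & user_groups)
    is_site_admin = can_log_in and bool(member_of & admin_groups)
    return can_log_in, is_site_admin


def access_in_1_6(
    member_of: Set[str], user_groups: Set[str], admin_groups: Set[str]
) -> Tuple[bool, bool]:
    """Return (can_log_in, is_site_admin) under the 1.6 rule:
    membership in a Full Administrator Group is enough to log in."""
    is_site_admin = bool(member_of & admin_groups)
    can_log_in = is_site_admin or bool(member_of & user_groups)
    return can_log_in, is_site_admin
```

For a user who belongs only to a group listed under LDAP/SAML Full Administrator Groups, the first function denies login while the second grants both login and site administrator access.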

IPv6 Requirement
Cloudera Data Science Workbench 1.6.x requires you to enable IPv6 on all CDSW gateway hosts. For instructions, refer to the workaround provided in Known Issue: CDSW cannot start sessions due to connection errors.
Spark UI
The Spark UI is now available as a tab within running sessions that use Spark.