New Features and Changes in Cloudera Data Science Workbench 1.5.0

Major features and updates for Cloudera Data Science Workbench.

Cloudera Enterprise 6.1 Support
Cloudera Data Science Workbench is now supported with Cloudera Manager 6.1.x (and higher) and CDH 6.1.x (and higher). For details, see Cloudera Manager and CDH Requirements.
Cloudera Data Science Workbench on Hortonworks Data Platform (HDP)
Cloudera Data Science Workbench can now be deployed on HDP 2.6.5 and HDP 3.1.0. For an architecture overview and installation instructions, see Deploying Cloudera Data Science Workbench 1.8.0 on Hortonworks Data Platform.
Security Enhancements
  • Allow Site Administrators to Enable/Disable Project Uploads and Downloads - By default, all Cloudera Data Science Workbench users are allowed to upload and download files to/from a project. Version 1.5 introduces a new feature flag that allows site administrators to hide the UI features that let users upload and download project files.

    Note that this feature flag only removes the relevant features from the Cloudera Data Science Workbench UI. It does not disable the ability to upload and download files through the backend web API.

    For details on how to enable this feature, see Disabling Project File Uploads and Downloads.

OpenJDK Support
Cloudera Data Science Workbench now supports OpenJDK 8 on Cloudera Enterprise 5.16.1 (and higher). For details, see Product Compatibility Matrix - Supported JDK Versions.
Engines
  • Base engine upgraded with a new version of R - 3.5.1 (Base Image v7)
  • Debugging Improvements - Previously, engines and their associated logs were deleted immediately after an exit or a crash. With version 1.5, engines remain available for about 5 minutes after they have ended so that you can collect the relevant logs.

    Additionally, when an engine exits with a non-zero status code, the last 50 lines from the engine's logs are now printed to the Workbench console. Note that a non-zero exit code and the presence of engine logs in the Workbench do not always imply a problem with the code. Events such as session timeouts and out-of-memory issues are also assigned non-zero exit codes and will display engine logs.
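
The "print the last 50 log lines on a non-zero exit" behavior described above can be illustrated with a small shell sketch. This is not how Cloudera Data Science Workbench implements it; the function name run_with_log_tail and the log file argument are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of CDSW): run a command, write all of its
# output to a log file, and print the last 50 lines of that log only when
# the command exits with a non-zero status.
run_with_log_tail() {
    log_file=$1
    shift
    "$@" >"$log_file" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "Command exited with status $status; last 50 log lines:"
        tail -n 50 "$log_file"
    fi
    # Hand the original exit status back to the caller.
    return "$status"
}
```

As with engines, a non-zero status here only says that the process did not end cleanly. The caller still has to read the log lines to decide whether the code itself was at fault.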

Installation and Upgrade
  • New Configuration Parameters - Version 1.5 includes three new configuration parameters that can be used to specify the type of distribution you are running, the directory for the installed packages/parcels, and the path where Anaconda is installed (for HDP only).
    • DISTRO
    • DISTRO_DIR
    • ANACONDA_DIR
    Details and sample values for these properties have been added to the relevant installation topics for CDH and HDP.
  • DOCKER_TMPDIR changed to /var/lib/cdsw/tmp/docker - Previously, the Cloudera Data Science Workbench installer temporarily decompressed the base engine image file to the /var/lib/docker/tmp directory. Starting with version 1.5, the installer uses the /var/lib/cdsw/tmp/docker directory instead. Make sure you have an Application block device mounted to /var/lib/cdsw, as recommended, so that installation/upgrade can proceed without issues.
  • Improved Validation Checks - The validation checks run by the installer, and the error messages displayed during the installation process, have been improved. Cloudera Data Science Workbench now:
    • Checks that space is available on the root directory, the Application Block Device, and the Docker Block Device(s).
    • Checks that DNS forward and reverse lookups work for the Cloudera Data Science Workbench Domain and Master IP address provided.
    • Displays better error messages for the cdsw status and cdsw validate commands for easier debugging.
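
If you want to run comparable checks by hand before installing, the following is a minimal sketch. It is not the installer's actual implementation. The function names are hypothetical, the DNS helpers assume a Linux host where getent is available, and any space threshold you pass in is your own choice rather than a documented requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; these are NOT the checks shipped with CDSW.

# Print the free space, in KiB, on the filesystem that holds the given path.
# Relies on the POSIX output format of df (-P), where column 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the path has at least the requested number of KiB free.
has_free_kib() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Forward lookup: print the first address that the hostname resolves to.
forward_lookup() {
    getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# Reverse lookup: print the canonical name that the IP address resolves to.
reverse_lookup() {
    getent hosts "$1" | awk 'NR == 1 { print $2 }'
}
```

For example, has_free_kib /var/lib/cdsw 1048576 succeeds only if at least 1 GiB is free under that path. Comparing the output of forward_lookup for your domain with your master IP address, and the output of reverse_lookup for that IP address with the expected hostname, gives a quick consistency check of both lookup directions.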
Command Line
  • cdsw logs - Previously, the cdsw logs command generated two log bundles - one in plaintext and one with sensitive information redacted. With version 1.5, the command now generates only a single bundle that has all the sensitive information redacted by default.

    To turn off redaction of log files for internal use, you can use the new --skip-redaction option as follows:
    cdsw logs --skip-redaction
Networking
  • Cloudera Data Science Workbench now uses DNS hostnames (not IP addresses) for internal communication between components. As a result, the wildcard DNS hostname configured for Cloudera Data Science Workbench must now be resolvable from both the CDSW cluster and your browser.

  • Cloudera Data Science Workbench now enables IPv4 forwarding (net.ipv4.conf.default.forwarding) during the installation process.
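
To confirm the forwarding setting on a host after installation, you can read the kernel parameter directly. The sketch below assumes a Linux host with procfs mounted; the function name ipv4_forwarding_state is hypothetical, and the optional file argument exists only so that the function can be exercised against a test file.

```shell
#!/bin/sh
# Hypothetical helper: print "enabled" if the given file contains 1, and
# "disabled" otherwise (including when the file cannot be read).
# The default path is the procfs file behind the sysctl key
# net.ipv4.conf.default.forwarding.
ipv4_forwarding_state() {
    file=${1:-/proc/sys/net/ipv4/conf/default/forwarding}
    if [ "$(cat "$file" 2>/dev/null)" = "1" ]; then
        echo enabled
    else
        echo disabled
    fi
}
```

Running sysctl net.ipv4.conf.default.forwarding on the host reports the same value in key = value form.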