Upgrading Cloudera Data Science Workbench Using Packages

This topic describes how to upgrade an RPM-based deployment to the latest version of Cloudera Data Science Workbench.

The first restart of CDSW after an upgrade can take 30 to 50 minutes. It can take longer depending on the speed of the internet connection available to the CDSW hosts.
Before you start upgrading Cloudera Data Science Workbench, read the Cloudera Data Science Workbench Release Notes relevant to the version you are upgrading to.
  1. Run the following command on all Cloudera Data Science Workbench hosts (master and workers) to stop Cloudera Data Science Workbench.
    cdsw stop
  2. (Upgrading from CDSW 1.7.1 with patch) Perform this step only if you are upgrading from CDSW 1.7.1 with an applied patch.
    1. Delete the two patch files: /etc/cdsw/patches/default/deployment/ingress-controller.yaml and /etc/cdsw/patches/default/deployment/tcp-ingress-controller.yaml.
    2. Delete every empty folder from the /etc/cdsw/patches directory.
    3. Delete the /etc/cdsw/patches directory if it is empty.
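    The three sub-steps above can be sketched as a single shell function. This is a minimal sketch, not an official Cloudera script: the function name cleanup_cdsw_patches is hypothetical, and it assumes a find that supports -empty and -delete (GNU findutils, as shipped on yum-based hosts). Pass a different directory as the first argument to rehearse on a copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the two 1.7.1 ingress patch files, then prune
# empty directories. Defaults to the patch directory named in the step above.
cleanup_cdsw_patches() {
  local patch_root="${1:-/etc/cdsw/patches}"

  # Nothing to do if the patch directory does not exist.
  [ -d "$patch_root" ] || return 0

  # Sub-step 1: delete the two patch files (rm -f ignores files already gone).
  rm -f "$patch_root/default/deployment/ingress-controller.yaml" \
        "$patch_root/default/deployment/tcp-ingress-controller.yaml"

  # Sub-steps 2 and 3: -delete implies depth-first traversal, so empty
  # directories are removed bottom-up, including patch_root itself if it
  # ends up empty. Directories that still hold files are left alone.
  find "$patch_root" -type d -empty -delete
}
```

    To apply it to the real location, run cleanup_cdsw_patches with no arguments as a user with write access to /etc/cdsw.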
  3. (Strongly Recommended) On the master host, back up all your application data that is stored in the /var/lib/cdsw directory.
    To create the backup, run the following command on the master host:
    tar cvzf cdsw.tar.gz /var/lib/cdsw/*
  4. Save a backup of the Cloudera Data Science Workbench configuration file at:
    /etc/cdsw/config/cdsw.conf
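    One possible way to save that backup is a timestamped copy, sketched below. The function name backup_cdsw_conf, the .bak naming scheme, and the default destination ($HOME) are assumptions made for illustration; only the source path comes from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy cdsw.conf to a timestamped backup file and
# print the path of the copy on success.
backup_cdsw_conf() {
  local src="${1:-/etc/cdsw/config/cdsw.conf}"
  local dest_dir="${2:-$HOME}"
  local stamp
  stamp="$(date +%Y%m%d%H%M%S)"
  local dest="$dest_dir/cdsw.conf.$stamp.bak"

  # cp -p preserves mode and timestamps; cmp -s confirms the copy is
  # byte-identical before the path is reported.
  cp -p "$src" "$dest" && cmp -s "$src" "$dest" && echo "$dest"
}
```

    Running backup_cdsw_conf with no arguments copies the file into your home directory. Keep the copy outside /etc/cdsw so it is unaffected by the uninstall in the next step.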
  5. Uninstall the previous release of Cloudera Data Science Workbench. Perform this step on the master host, as well as all the worker hosts.
    yum remove cloudera-data-science-workbench 
  6. Install the latest version of Cloudera Data Science Workbench on the master host and on all the worker hosts. During the installation process, you might need to resolve certain incompatibilities in cdsw.conf. Even though you will be installing the latest RPM, your previous configuration settings in cdsw.conf will remain unchanged. Depending on the release you are upgrading from, you will need to modify cdsw.conf to ensure it passes the validation checks run by the release.

    To install the latest version of Cloudera Data Science Workbench, follow the same process to install the package as you would for a fresh installation.

  7. Upgrade Projects to Use the Latest Base Engine Images
    If the release you have just upgraded to includes a new version of the base engine image, you will need to manually configure existing projects to use the new engine. Cloudera recommends you do so to take advantage of any new features and bug fixes included in the newly released engine. For example:
    • Container Security

      Security best practices dictate that engine containers should not run as the root user. Engines (v7 and lower) briefly initialize as the root user and then run as the cdsw user. Engines v8 (and higher) now follow the best practice and run only as the cdsw user. For more details, see Restricting User-Created Pods.

    • CDH 6 Compatibility

      The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. If you want to run Spark workloads on CDH 6, you must upgrade your projects to base engine 7 (or higher).

    • Editors

      Engines v8 (and higher) ship with the browser-based IDE, Jupyter, preconfigured. You can select it from the Start Session menu.

    To upgrade a project to the new engine, go to the project's Settings > Engine page and select the new engine from the dropdown. If any of your projects are using custom extended engines, you will need to modify them to use the new base engine image.
  8. (GPU-enabled Deployments) Remove nvidia-docker1 and Upgrade NVIDIA Drivers to 410.xx or higher
    Perform the following steps to make sure you can continue to leverage GPUs for workloads on Cloudera Data Science Workbench 1.6 (and higher).
    1. Remove nvidia-docker1. Cloudera Data Science Workbench (version 1.6 and higher) ships with nvidia-docker2 installed by default.
      yum remove nvidia-docker
      Perform this step on all hosts that have GPUs attached to them.
    2. Upgrade your NVIDIA driver to version 410.xx (or higher). This must be done because nvidia-docker2 does not support lower versions of NVIDIA drivers.
      • Stop Cloudera Data Science Workbench.

        Depending on your deployment, either stop the CDSW service in Cloudera Manager (for CSDs) or run cdsw stop on the master host (for RPMs).

      • Reboot the GPU-enabled hosts.
      • Install a supported version of the NVIDIA driver (410.xx or higher) on all GPU-enabled hosts.
      • Start Cloudera Data Science Workbench.

        Depending on your deployment, either start the CDSW service in Cloudera Manager (for CSDs) or run cdsw start on the master host (for RPMs).
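
After the driver upgrade, you may want to confirm that each GPU-enabled host reports driver version 410.xx or higher. The sketch below is one way to do that and is not part of the official procedure: the function name driver_meets_minimum is hypothetical, and it only compares the major version number against the 410 minimum stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an NVIDIA driver version string
# (for example "410.48") has a major version >= the required minimum.
# Returns 0 if OK, 1 if too old, 2 if the string cannot be parsed.
driver_meets_minimum() {
  local version="$1"
  local min_major="${2:-410}"

  # Keep only the part before the first dot: "418.87.01" -> "418".
  local major="${version%%.*}"

  # Reject empty or non-numeric major versions.
  case "$major" in
    ''|*[!0-9]*) return 2 ;;
  esac

  [ "$major" -ge "$min_major" ]
}
```

On a GPU-enabled host, the installed version can be read with nvidia-smi and passed in, for example: driver_meets_minimum "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" && echo "driver OK".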