Upgrading CDSW 1.10.x Using Cloudera Manager

This topic describes how to upgrade a CSD and parcel-based deployment to the latest version of Cloudera Data Science Workbench 1.10.4.

For installation and upgrades, you must manually add the Remote Parcel Repository URL for your CDSW version to Cloudera Manager.
  1. Before you begin the upgrade process, make sure you read the Cloudera Data Science Workbench Release Notes relevant to the version you are upgrading to/from.
  2. Stop the Cloudera Data Science Workbench service in Cloudera Manager.
  3. If the patches folder exists, remove all patches and the /patches/ folder.
    You can reapply patches if they are needed for the current release.
  4. (Strongly Recommended) On the master host, back up all your application data that is stored in the /var/lib/cdsw directory.
    To create the backup, run the following command on the master host:
    tar cvzf cdsw.tar.gz /var/lib/cdsw/*
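A backup only helps if the archive can be read back. The following is a minimal sketch, not part of the official procedure, that creates an archive and then lists it to confirm it is readable. The function name backup_dir is an illustrative assumption. Unlike the command above, it uses -C so that paths inside the archive are relative and hidden files are included.

```shell
# Sketch: archive a directory and confirm the archive can be read back.
# backup_dir is a hypothetical helper name, not a CDSW command.
backup_dir() {
    local src="$1"      # directory to back up, e.g. /var/lib/cdsw
    local archive="$2"  # output file, stored outside the source directory

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi

    # -C switches into the source directory so archive paths are relative.
    tar -czf "$archive" -C "$src" . || return 1

    # Listing the archive makes tar read and decompress every entry.
    tar -tzf "$archive" > /dev/null || return 1
    echo "backup written to $archive"
}
```

For example, backup_dir /var/lib/cdsw /root/cdsw.tar.gz would write the archive to /root, assuming that location has enough free space.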
  5. When upgrading from CDSW 1.9.x to 1.10.x, ensure that all worker nodes have enough storage space for Docker, especially for the root volume. See Recommended Hardware Configuration.
  6. Deactivate the existing Cloudera Data Science Workbench parcel. Go to the Cloudera Manager Admin Console. In the top navigation bar, click Hosts > Parcels.
    Locate the current active CDSW parcel and click Deactivate. On the confirmation pop-up, select Deactivate Only and click OK.
  7. Download and save the latest Cloudera Data Science Workbench CSD to the Cloudera Manager Server host.
    1. Download the Cloudera Data Science Workbench CSD. Make sure you download the CSD that corresponds to the version of CDH or Cloudera Runtime you are using.
      • CDP Data Center
        https://archive.cloudera.com/p/cdsw1/1.10.4/csd/CLOUDERA_DATA_SCIENCE_WORKBENCH-CDPDC-1.10.4.jar

        OR

      • CDH 6
        https://archive.cloudera.com/p/cdsw1/1.10.4/csd/CLOUDERA_DATA_SCIENCE_WORKBENCH-CDH6-1.10.4.jar
    2. Log on to the Cloudera Manager Server host, and place the CSD file under /opt/cloudera/csd, which is the default location for CSD files.
    3. Delete any CSD files belonging to older versions of Cloudera Data Science Workbench from /opt/cloudera/csd.
      This is required because older versions of the CSD will not work with the latest Cloudera Data Science Workbench parcel. Make sure your CSD and parcel are always the same version.
      After you delete the file(s) belonging to the older version, you should have a single file for the current version (for example, for CDH 6: CDSW1.10-CDH6..jar).
      Note: If you have previously configured a custom location for CSD files, place the CSD file there, and delete any CSDs belonging to older versions of Cloudera Data Science Workbench. For help, refer to the Cloudera Manager documentation at Configuring the Location of Custom Service Descriptor Files.
    4. Set the CSD file ownership to cloudera-scm:cloudera-scm with permission 644.
    5. Restart the Cloudera Manager Server:
      service cloudera-scm-server restart
    6. Log into the Cloudera Manager Admin Console and restart the Cloudera Management Service.
      1. Select Clusters > Cloudera Management Service.
      2. Select Actions > Restart.
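Substeps 2 through 4 of this step (place the CSD, delete older CSDs, set ownership and permissions) can be sketched as one shell function. This is an illustration under stated assumptions, not an official tool: install_csd is a hypothetical name, the new jar is assumed to have been downloaded to a directory other than the CSD directory, and the file-name glob is inferred from the download URLs above.

```shell
# Sketch: replace older CDSW CSD jars with a new one in the CSD directory.
# install_csd is a hypothetical helper name.
install_csd() {
    local new_jar="$1"                       # path to the downloaded CSD jar
    local csd_dir="${2:-/opt/cloudera/csd}"  # default CSD location
    local name old
    name=$(basename "$new_jar")

    [ -f "$new_jar" ] || { echo "CSD not found: $new_jar" >&2; return 1; }
    [ -d "$csd_dir" ] || { echo "CSD directory not found: $csd_dir" >&2; return 1; }

    # Remove CDSW CSDs from older versions; leave other services' CSDs alone.
    # The glob is an assumption based on the file names in the download URLs.
    for old in "$csd_dir"/CLOUDERA_DATA_SCIENCE_WORKBENCH-*.jar; do
        [ -e "$old" ] || continue
        [ "$(basename "$old")" = "$name" ] || rm -f "$old"
    done

    cp "$new_jar" "$csd_dir/$name" || return 1
    chmod 644 "$csd_dir/$name"

    # chown needs root and an existing cloudera-scm account; skip it otherwise.
    if [ "$(id -u)" -eq 0 ] && id cloudera-scm > /dev/null 2>&1; then
        chown cloudera-scm:cloudera-scm "$csd_dir/$name"
    fi
}
```

For example, install_csd /tmp/CLOUDERA_DATA_SCIENCE_WORKBENCH-CDH6-1.10.4.jar would install the CDH 6 CSD into the default location. The Cloudera Manager Server and Cloudera Management Service restarts described above are still required afterwards.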
  8. Distribute and activate the new parcel on your cluster.
    1. Log into the Cloudera Manager Admin Console.
    2. Click Hosts > Parcels in the main navigation bar.
    3. Add the Cloudera Data Science Workbench parcel repository URL to Cloudera Manager.
      1. On the Parcels page, click Configuration.
      2. In the Remote Parcel Repository URLs list, click the addition symbol to create a new row.
      3. Enter the path to the repository.
        Cloudera Data Science Workbench 1.10.x
        https://archive.cloudera.com/p/cdsw1/1.10.4/parcels/
      4. Click Save Changes.
    4. Go back to the Hosts > Parcels page. The latest parcel should now appear in the set of parcels available for download. Click Download. Once the download is complete, click Distribute to distribute the parcel to all the CDH hosts in your cluster. Then click Activate. For more detailed information on each of these tasks, see Managing Parcels.
  9. Run the Prepare Node command on all Cloudera Data Science Workbench hosts.
    1. Before you run Prepare Node, you must make sure that the command is allowed to install all the required packages on your cluster. This is controlled by the Install Required Packages property.
      1. Navigate to the CDSW service.
      2. Click Configuration.
      3. Search for the Install Required Packages property. If this property is enabled, you can move on to the next step and run Prepare Node.
        However, if the property has been disabled, you must either enable it or manually install the following packages on all Cloudera Data Science Workbench gateway hosts.
        nfs-utils
        libseccomp
        lvm2
        bridge-utils
        libtool-ltdl
        iptables
        rsync
        policycoreutils-python
        selinux-policy-base
        selinux-policy-targeted
        ntp
        ebtables
        bind-utils
        openssl
        e2fsprogs
        redhat-lsb-core
        conntrack-tools
        bash
        curl
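If the Install Required Packages property is disabled, it can help to see which of the packages listed above are still missing on a host before installing anything. The sketch below is one way to do that on an RPM-based system. The names REQUIRED_PACKAGES, PKG_QUERY, and missing_packages are hypothetical. The default query uses rpm -q --whatprovides, on the assumption that some entries (such as selinux-policy-base) may be capabilities provided by another package rather than package names.

```shell
# Sketch: report which required packages are not installed on this host.
# PKG_QUERY is a hypothetical override hook for the query command.
REQUIRED_PACKAGES="nfs-utils libseccomp lvm2 bridge-utils libtool-ltdl iptables rsync policycoreutils-python selinux-policy-base selinux-policy-targeted ntp ebtables bind-utils openssl e2fsprogs redhat-lsb-core conntrack-tools bash curl"

missing_packages() {
    local query="${PKG_QUERY:-rpm -q --whatprovides}"
    local pkg
    for pkg in $REQUIRED_PACKAGES; do
        # $query is intentionally unquoted so it splits into command and flags.
        $query "$pkg" > /dev/null 2>&1 || echo "$pkg"
    done
}
```

Running missing_packages prints one missing package per line and prints nothing when every package is present. The output can then be passed to the package manager on each gateway host.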
    2. Ensure you have 200 GB of space devoted to DOCKER_TMPDIR (defaults to /var/lib/cdsw/docker-tmp) on the master node. This is needed to unzip all of the new Docker images.
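The space requirement can be checked from the command line before running Prepare Node. The following is a minimal sketch that reads the available space from df. The function name has_free_gb is hypothetical, and the sketch treats 1 GB as 1024 * 1024 KB.

```shell
# Sketch: check that the filesystem holding a path has enough free space.
# has_free_gb is a hypothetical helper name.
has_free_gb() {
    local path="$1"     # directory to check
    local need_gb="$2"  # required free space in GB (1 GB = 1024 * 1024 KB here)
    local avail_kb

    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')

    # An empty value means df could not report on the path.
    [ -n "$avail_kb" ] || return 2

    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}
```

For example, has_free_gb /var/lib/cdsw/docker-tmp 200 returns success only when at least 200 GB is available on the filesystem that holds the default DOCKER_TMPDIR. The same function could be pointed at the Docker storage location on worker hosts.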
    3. Run the Prepare Node command.
      1. In Cloudera Manager, navigate to the Cloudera Data Science Workbench service.
      2. Click the Instances tab.
      3. Use the checkboxes to select all host instances and click Actions for Selected (x).
      4. Click Prepare Node. Once again, click Prepare Node to confirm the action.
  10. Log into the Cloudera Manager Admin Console and restart the Cloudera Data Science Workbench service.
    1. On the Home > Status tab, click the dropdown menu to the right of the CDSW service and select Restart.
    2. Confirm your choice on the next screen. Note that a complete restart of the service will take time. Even though the CDSW service status shows Good Health, the application itself will take some more time to get ready.
  11. Upgrade Projects to Use the Latest Base Engine Images
    If the release you have just upgraded to includes a new version of the base engine image, you will need to manually configure existing projects to use the new engine. Cloudera recommends you do so to take advantage of any new features and bug fixes included in the newly released engine. For example:
    • Container Security

      Security best practices dictate that engine containers should not run as the root user. Engines (v7 and lower) briefly initialize as the root user and then run as the cdsw user. Engines v8 (and higher) now follow the best practice and run only as the cdsw user. For more details, see Allow containers to run as root.

    • CDH 6 Compatibility

      The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. If you want to run Spark workloads on CDH 6, you must upgrade your projects to base engine 7 (or higher).

    • Editors

      Engines v8 (and higher) ship with the browser-based IDE Jupyter preconfigured. It can be selected from the Start Session menu.

    To upgrade a project to the new engine, go to the project's Settings > Engine page and select the new engine from the dropdown. If any of your projects are using custom extended engines, you will need to modify them to use the new base engine image.
  12. (GPU-enabled Deployments) Remove nvidia-docker1 and Upgrade NVIDIA Drivers to 410.xx or higher
    Perform the following steps to make sure you can continue to leverage GPUs for workloads on Cloudera Data Science Workbench 1.6 (and higher).
    1. Remove nvidia-docker1. Cloudera Data Science Workbench (version 1.6 and higher) ships with nvidia-docker2 installed by default.
      yum remove nvidia-docker
      Perform this step on all hosts that have GPUs attached to them.
    2. Upgrade your NVIDIA driver to version 410.xx (or higher). This must be done because nvidia-docker2 does not support lower versions of NVIDIA drivers.
      1. Stop Cloudera Data Science Workbench.

        Depending on your deployment, either stop the CDSW service in Cloudera Manager (for CSDs) or run cdsw stop on the master host (for RPMs).

      2. Reboot the GPU-enabled hosts.
      3. Install a supported version of the NVIDIA driver (410.xx or higher) on all GPU-enabled hosts.
      4. Start Cloudera Data Science Workbench.

        Depending on your deployment, either start the CDSW service in Cloudera Manager (for CSDs) or run cdsw start on the master host (for RPMs).
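After the driver upgrade, the installed version can be compared against the 410.xx minimum. The sketch below only parses a version string. The function name driver_supported is hypothetical, and the check looks at the major version number alone.

```shell
# Sketch: decide whether an NVIDIA driver version string is 410.xx or higher.
# driver_supported is a hypothetical helper name.
driver_supported() {
    local version="$1"            # e.g. "410.48"
    local major="${version%%.*}"  # text before the first dot

    # Reject empty or non-numeric input rather than guessing.
    case "$major" in
        ''|*[!0-9]*) return 2 ;;
    esac

    [ "$major" -ge 410 ]
}
```

On a GPU-enabled host, the installed version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to driver_supported. A non-zero return means the driver is older than 410.xx or the version string could not be parsed.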