Upgrading to the Latest Version of Cloudera Data Science Workbench 1.6.x on CDH

This topic walks you through the upgrade paths available for Cloudera Data Science Workbench 1.6.x. Depending on your existing deployment, choose one of the upgrade paths listed below.

Upgrading from CDH 5 > CDH 6 - If you are currently running Cloudera Data Science Workbench on CDH 5 and want to upgrade your cluster to CDH 6, also see Upgrading Cloudera Data Science Workbench from CDH 5 to CDH 6.

Upgrade Path and Link to Instructions:

  • CSD > CSD - Upgrading from an existing CSD-based deployment to the latest 1.6.x CSD and parcel. See Upgrading Cloudera Data Science Workbench 1.6.x Using Cloudera Manager.

  • RPM > CSD - Migrating from an RPM-based deployment to the latest 1.6.x CSD and parcel-based deployment. See Migrating from an RPM-based Deployment to the Latest 1.6.x CSD.

  • RPM > RPM - Upgrading an existing RPM-based deployment to the latest 1.6.x RPM. See Upgrading Cloudera Data Science Workbench 1.6.x Using Packages. Note that you cannot use Cloudera Manager for this upgrade path.

Upgrading Cloudera Data Science Workbench from CDH 5 to CDH 6

This section provides a general outline of how to upgrade your Cloudera Data Science Workbench cluster from CDH 5 to CDH 6. Refer to the relevant linked Cloudera Manager and CDH upgrade documentation for the detailed steps required for this procedure.

Upgrading CSD Deployments from CDH 5 to CDH 6

Starting with version 1.5, Cloudera Data Science Workbench publishes two separate CSD files: one for CDH 5 and one for CDH 6. Check the CSD file name to ensure that you are using the correct CSD file for your cluster. For example:
  • CDH 6 - CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH6_1.x.y.jar

  • CDH 5 - CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH5_1.x.y.jar
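
The file name check described above can be sketched in Python. This is only an illustration, not part of the product: the helper names `parse_csd_name` and `csd_matches_cluster` are hypothetical, and the pattern assumes that the `x` and `y` placeholders in the example names stand for numeric version components.

```python
import re

# Pattern inferred from the example names above.
# Assumption: "1.x.y" stands for a numeric three-part version such as 1.6.0.
CSD_NAME = re.compile(
    r"^CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH(?P<cdh>\d+)_(?P<cdsw>\d+\.\d+\.\d+)\.jar$"
)


def parse_csd_name(file_name):
    """Return (cdh_major, cdsw_version) for a CSD file name, or None if it does not match."""
    match = CSD_NAME.match(file_name)
    if match is None:
        return None
    return int(match.group("cdh")), match.group("cdsw")


def csd_matches_cluster(file_name, cluster_cdh_major):
    """True if the CSD file name targets the given CDH major version (5 or 6)."""
    parsed = parse_csd_name(file_name)
    return parsed is not None and parsed[0] == cluster_cdh_major
```

For example, under these assumptions `csd_matches_cluster("CLOUDERA_DATA_SCIENCE_WORKBENCH_CDH5_1.6.0.jar", 6)` returns `False`, which flags a CDH 5 CSD being used for a CDH 6 cluster.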

Use the following path to upgrade from running a CSD-based Cloudera Data Science Workbench deployment on CDH 5 to running on CDH 6:

  1. Upgrade to Cloudera Manager 6.1 (or higher).

  2. Stop Cloudera Data Science Workbench.

  3. Remove both Spark gateway roles from all CDSW hosts.
  4. Delete the /etc/spark and /etc/spark2 directories.
  5. Download both Cloudera Data Science Workbench (CDSW) CSD files for the latest version, one for CDH 5 and one for CDH 6. For example:

    • CDSW1.8-CDH6..jar
    • CDSW1.8-CDH5..jar

    At this point, you should have three CSD files: one original file (for example, CDSW1.5-CDH5..jar) and two new files (for example, CDSW1.8-CDH6..jar and CDSW1.8-CDH5..jar).

  6. Log on to the Cloudera Manager Server host, and place the new CDSW files under /opt/cloudera/csd, which is the default location for CSDs.
  7. Restart the Cloudera Manager Server.

  8. Upgrade to Cloudera Data Science Workbench 1.5 (or higher). While you install, distribute, and activate the new parcel, make sure that both CDSW CSDs (for CDH 5 and CDH 6) are present on the Cloudera Manager Server host.
  9. Use the Cloudera Manager Upgrade Wizard to upgrade from CDH 5 to CDH 6.1 (or higher). As part of the upgrade, the wizard will also remove the Spark 2 parcel from all your cluster hosts. With CDH 6, Spark 2 ships as a part of CDH. The add-on parcel is no longer required.

    Cloudera Manager 6 can differentiate between the two active CSDs and will select the right one based on the version of CDH running. Because you already have the CDH 6-compatible CSD installed, no further steps are needed.

  10. (Optional) Remove any existing CDH 5 CSDs from the Cloudera Manager Server host.
  11. Add the Spark gateway roles back to the CDSW hosts.
  12. Redeploy the client configurations.
  13. Restart Cloudera Data Science Workbench.
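
Step 8 above depends on both CSDs being present in the CSD directory. A minimal Python sketch of that check follows. It is an illustration only: the function names are hypothetical, and it assumes that each CSD file name contains the marker `CDH5` or `CDH6`, as the example names in this topic do. The default path `/opt/cloudera/csd` is taken from step 6.

```python
from pathlib import Path

# Assumption: every CSD jar carries one of these markers in its file name.
REQUIRED_VARIANTS = ("CDH5", "CDH6")


def missing_csd_variants(file_names, required=REQUIRED_VARIANTS):
    """Return the required variants for which no .jar file name contains the marker."""
    jars = [name for name in file_names if name.endswith(".jar")]
    return [variant for variant in required
            if not any(variant in name for name in jars)]


def list_csd_dir(csd_dir="/opt/cloudera/csd"):
    """List file names in the CSD directory, or an empty list if it does not exist."""
    directory = Path(csd_dir)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir())


if __name__ == "__main__":
    missing = missing_csd_variants(list_csd_dir())
    if missing:
        print("No CSD found for:", ", ".join(missing))
    else:
        print("CSDs for CDH 5 and CDH 6 are both present.")
```

Run on the Cloudera Manager Server host, a script like this only reports what is in the directory. It does not verify that Cloudera Manager has loaded the CSDs, which still requires the restart in step 7.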

Upgrading RPM Deployments from CDH 5 to CDH 6

Cloudera Data Science Workbench ships a single RPM that can be used to install CDSW on both CDH 5 and CDH 6 clusters. The upgrade path for RPM deployments is:

  1. Upgrade to Cloudera Manager 6.1 (or higher).

  2. Use the Cloudera Manager Upgrade Wizard to upgrade from CDH 5 to CDH 6 (or higher). As part of the upgrade, the wizard will also remove the Spark 2 parcel from all your cluster hosts. With CDH 6, Spark 2 ships as a part of CDH, so the add-on parcel is no longer required.

  3. Upgrade to the latest Cloudera Data Science Workbench RPM.