Migrating from an RPM-based Deployment to the Latest 1.9.0 CSD
This topic describes how to migrate from an RPM-based deployment to the latest 1.9.0 CSD and parcel-based deployment.
Save a backup of the Cloudera Data Science Workbench configuration file located at
Stop the Cloudera Data Science Workbench service in Cloudera Manager.
Delete the 2 patch files:
/etc/cdsw/patches/default/deployment/ingress-controller.yaml and /etc/cdsw/patches/default/deployment/tcp-ingress-controller.yaml.
Delete every empty folder under the /etc/cdsw/patches directory.
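The cleanup above can be scripted as shown below. This is a minimal sketch, not an official Cloudera tool. remove_ingress_patches is a hypothetical helper name. It defaults to /etc/cdsw/patches, but you can point it at a scratch copy to rehearse. It leaves the top-level patches directory in place and removes only empty folders beneath it.

```shell
# Sketch: remove the two ingress patch files named above, then prune empty folders.
# remove_ingress_patches is a hypothetical helper; pass another root to rehearse on a copy.
remove_ingress_patches() {
  local root="${1:-/etc/cdsw/patches}"

  # -f: do not fail if a file was already removed
  rm -f "$root/default/deployment/ingress-controller.yaml" \
        "$root/default/deployment/tcp-ingress-controller.yaml"

  # -delete processes directories depth-first, so nested empty folders are removed
  # from the bottom up; -mindepth 1 keeps the patches directory itself.
  if [ -d "$root" ]; then
    find "$root" -mindepth 1 -type d -empty -delete
  fi
}

# Usage (as root on the master host): remove_ingress_patches
```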
(Strongly Recommended) On the master host, back up all your application data that
is stored in the /var/lib/cdsw directory. To create the backup, run the following command on the master host:
tar cvzf cdsw.tar.gz /var/lib/cdsw/*
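The tar command above works as written. You may also want to confirm that the archive is readable before you uninstall anything. The sketch below does both: it creates the archive and then reads it back.

It differs from the original command in three ways, so treat it as an optional variation. backup_and_verify is a hypothetical helper name. It stores entries as cdsw/... relative to /var/lib, instead of var/lib/cdsw/..., so you would restore with tar xzf cdsw.tar.gz -C /var/lib. It also captures top-level hidden files, which the /var/lib/cdsw/* glob skips.

```shell
# Sketch: create the archive with paths relative to /var/lib and confirm it reads back.
# backup_and_verify is a hypothetical helper; both arguments default to the values above.
backup_and_verify() {
  local src="${1:-/var/lib/cdsw}" out="${2:-cdsw.tar.gz}"

  # -C switches to the parent directory, so entries are stored as cdsw/... and
  # top-level hidden files are included (the /var/lib/cdsw/* glob skips them).
  tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1

  # Listing every entry fails if the archive is truncated or corrupt.
  tar tzf "$out" > /dev/null
}

# Usage (on the master host): backup_and_verify /var/lib/cdsw cdsw.tar.gz
```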
Save a backup of the Cloudera Data Science Workbench configuration file at:
Uninstall the previous release of Cloudera Data Science Workbench. Perform this step on
the master host as well as on all worker hosts.
yum remove cloudera-data-science-workbench
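Because the uninstall must run on every host, you may find it useful to print the full set of commands before running anything. The sketch below only prints them, so you can review the plan first. It assumes root SSH access from an admin machine. The helper name and the host names in the usage line are hypothetical.

```shell
# Sketch: print, without running, the uninstall command for each host.
# Assumes root SSH access; the host names in the usage line are hypothetical.
print_uninstall_plan() {
  local host
  for host in "$@"; do
    printf 'ssh root@%s yum remove cloudera-data-science-workbench\n' "$host"
  done
}

# Usage: print_uninstall_plan cdsw-master.example.com cdsw-worker-1.example.com
```

Printing rather than executing keeps the destructive step deliberate. You can paste or pipe the reviewed lines to a shell once the host list is confirmed.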
Install the latest version of Cloudera Data Science Workbench using the CSD and parcel.
Note that when you are configuring role assignments for the Cloudera Data Science
Workbench service, the Master role must be assigned to the same host that was
running as master prior to the upgrade.
For installation instructions, see Installing Cloudera Data Science Workbench 1.9.0 Using Cloudera Manager. You can skip the first few steps if you already have the wildcard DNS domain and block devices set up.
Use your backup copy of cdsw.conf created in Step 3 to recreate those settings in Cloudera Manager by configuring the corresponding properties under the Cloudera Data Science Workbench service.
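Before working through the sub-steps, it can help to list which settings in your saved cdsw.conf actually have values. Only those settings need a matching Cloudera Manager property. The sketch below assumes the file uses shell-style KEY="value" lines. list_cdsw_settings is a hypothetical helper name, and the path you pass is your own backup copy.

```shell
# Sketch: print the KEY="value" lines of a backed-up cdsw.conf that carry a value.
# Assumption: the file uses shell-style KEY="value" assignments; comment lines are ignored.
list_cdsw_settings() {
  local conf="$1"
  # Keep assignment lines, then drop the ones whose value is empty (KEY= or KEY="").
  grep -E '^[A-Z_][A-Z0-9_]*=' "$conf" | grep -Ev '^[A-Z_][A-Z0-9_]*=("")?$'
}

# Usage: list_cdsw_settings ./cdsw.conf.backup
```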
- Log in to the Cloudera Manager Admin Console.
- Go to the Cloudera Data Science Workbench service.
- Click the Configuration tab.
- The following table lists the cdsw.conf properties and their corresponding Cloudera Manager properties. Use the search box to bring up the properties you want to modify.
- Click Save Changes.
cdsw.conf Property Corresponding Cloudera Manager Property and Description
Enable TLS: Enable and enforce HTTPS (TLS/SSL) access to the web application (optional). Both internal and external termination are supported. To enable internal termination, you must also set the TLS Certificate for Internal Termination and TLS Key for Internal Termination parameters. If these parameters are not set, terminate TLS using an external proxy.
For more details on TLS termination, see Enabling TLS/SSL for Cloudera Data Science Workbench.
TLS Certificate for Internal Termination, TLS Key for Internal Termination: Complete path to the certificate and private key (in PEM format) to be used for internal TLS termination. Set these parameters only if you are not terminating TLS externally. You must also set the Enable TLS property to enable and enforce termination. The certificate must include both
Self-signed certificates are not supported unless fully trusted by clients. Manually accepting an invalid certificate can cause connection failures for unknown subdomains. For details on certificate requirements and enabling TLS termination, see Enabling TLS/SSL for Cloudera Data Science Workbench.
If your organization uses an internal custom Certificate Authority, you can use this field to paste in the contents of your internal CA's root certificate file.
The contents of this field are then inserted into the engine's root certificate store every time a session (or any workload) is launched. This allows processes inside the engine to communicate securely with the ingress controller.
HTTP Proxy, HTTPS Proxy: If your deployment is behind an HTTP or HTTPS proxy, set the respective HTTP Proxy or HTTPS Proxy property to the hostname of the proxy you are using.
If you are using an intermediate proxy such as Cntlm to handle NTLM authentication, add the Cntlm proxy address to the HTTP Proxy or HTTPS Proxy fields. That is, either
If the proxy server uses TLS encryption to handle connection requests, you must add the proxy's root CA certificate to your host's store of trusted certificates. Proxy servers typically sign their server certificate with their own root certificate, so connection attempts fail until the Cloudera Data Science Workbench host trusts the proxy's root CA certificate. If you do not have access to your proxy's root certificate, contact your Network / IT administrator.
To enable trust, copy the proxy's root certificate to the trusted CA certificate store (ca-trust) on the Cloudera Data Science Workbench host:
cp /tmp/<proxy-root-certificate>.crt /etc/pki/ca-trust/source/anchors/
Use the following command to rebuild the trusted certificate store.
SOCKS Proxy: If a SOCKS proxy is in use, set this parameter to
No Proxy: Comma-separated list of hostnames that should bypass the proxy.
Starting with version 1.4, if you have defined a proxy in the ALL_PROXY properties, Cloudera Data Science Workbench automatically appends the following list of IP addresses to the NO_PROXY configuration. Note that this is the minimum required configuration for this field.
This field should include 127.0.0.1, localhost, and any private Docker registries and HTTP services inside the firewall that Cloudera Data Science Workbench users might want to access from the engines.
"127.0.0.1,localhost,100.66.0.1,100.66.0.2,100.66.0.3,100.66.0.4,100.66.0.5,100.66.0.6,100.66.0.7,100.66.0.8,100.66.0.9,100.66.0.10,100.66.0.11,100.66.0.12,100.66.0.13,100.66.0.14,100.66.0.15,100.66.0.16,100.66.0.17,100.66.0.18,100.66.0.19,100.66.0.20,100.66.0.21,100.66.0.22,100.66.0.23,100.66.0.24,100.66.0.25,100.66.0.26,100.66.0.27,100.66.0.28,100.66.0.29,100.66.0.30,100.66.0.31,100.66.0.32,100.66.0.33,100.66.0.34,100.66.0.35,100.66.0.36,100.66.0.37,100.66.0.38,100.66.0.39,100.66.0.40,100.66.0.41,100.66.0.42,100.66.0.43,100.66.0.44,100.66.0.45,100.66.0.46,100.66.0.47,100.66.0.48,100.66.0.49,100.66.0.50,100.77.0.10,100.77.0.128,100.77.0.129,100.77.0.130,100.77.0.131,100.77.0.132,100.77.0.133,100.77.0.134,100.77.0.135,100.77.0.136,100.77.0.137,100.77.0.138,100.77.0.139"
Enable GPU Support: When this property is enabled, GPUs installed on Cloudera Data Science Workbench hosts will be available for use in its workloads. By default, this parameter is disabled.
For instructions on how to enable GPU-based workloads on Cloudera Data Science Workbench, see the Cloudera Data Science Workbench documentation on using NVIDIA GPUs for projects.
- Cloudera Manager will prompt you to restart the service if needed.
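Two entries in the table can be checked locally before you enter values in Cloudera Manager. The sketch below covers both:

- It rebuilds the default No Proxy value quoted in the table, so you can append your own hosts, such as a private Docker registry.
- It confirms that a PEM certificate and private key belong together before you set the internal-termination paths.

The function names and the example registry host are hypothetical. The TLS check assumes the openssl command-line tool is installed.

```shell
# Sketch: helper checks for two entries in the configuration table.

# Rebuilds the default NO_PROXY list from the table: 127.0.0.1, localhost,
# 100.66.0.1-100.66.0.50, 100.77.0.10 and 100.77.0.128-100.77.0.139.
# Extra hosts passed as arguments (for example, a private Docker registry) are appended.
build_no_proxy() {
  local list="127.0.0.1,localhost" i host
  for i in $(seq 1 50); do list="$list,100.66.0.$i"; done
  list="$list,100.77.0.10"
  for i in $(seq 128 139); do list="$list,100.77.0.$i"; done
  for host in "$@"; do list="$list,$host"; done
  printf '%s\n' "$list"
}

# Succeeds if a PEM certificate and a PEM private key contain the same public key,
# a quick sanity check before setting the TLS certificate and key paths.
tls_pair_matches() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey)" || return 1
  key_pub="$(openssl pkey -in "$key" -pubout)" || return 1
  [ "$cert_pub" = "$key_pub" ]
}

# Usage: build_no_proxy registry.example.com
#        tls_pair_matches /path/to/cert.pem /path/to/key.pem && echo "pair matches"
```

Comparing the public keys works for RSA and EC keys alike, unlike comparing RSA moduli. It does not check hostnames or expiry.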
If the release you have just upgraded to includes a new version of the base engine
image (see the release notes), you must manually configure existing projects to use
the new engine. To upgrade a project to the new engine, go to the project's Settings > Engine page and select the new engine from the dropdown. If any of your projects use custom extended engines, you must modify them to use the new base engine image.
Cloudera recommends upgrading projects to take advantage of any new features and
bug fixes included in the newly released engine. For example:
- Container Security
Security best practices dictate that engine containers should not run as the root user. Engines (v7 and lower) briefly initialize as the root user and then run as the cdsw user. Engines v8 (and higher) now follow the best practice and run only as the cdsw user. For more details, see Restricting User-Created Pods.
- CDH 6 Compatibility
The base engine image you use must be compatible with the version of CDH you are running. This is especially important if you are running workloads on Spark. Older base engines (v6 and lower) cannot support the latest versions of CDH 6. If you want to run Spark workloads on CDH 6, you must upgrade your projects to base engine 7 (or higher).
Engines v8 (and higher) ship with the browser-based IDE Jupyter preconfigured, and it can be selected from the Start Session menu.
(GPU-enabled Deployments) Remove nvidia-docker1 and Upgrade NVIDIA Drivers to 410.xx
Perform the following steps to make sure you can continue to leverage GPUs for workloads on Cloudera Data Science Workbench 1.6 (and higher).
Remove nvidia-docker1. Cloudera Data Science Workbench (version 1.6 and higher) ships with
nvidia-docker2 installed by default. Perform this step on all hosts that have GPUs attached to them.
Upgrade your NVIDIA driver to version 410.xx (or higher). This is required because
nvidia-docker2 does not support older NVIDIA driver versions.
- Stop Cloudera Data Science Workbench.
Depending on your deployment, either stop the CDSW service in Cloudera Manager (for CSDs) or run cdsw stop on the Master host (for RPMs).
- Reboot the GPU-enabled hosts. Install a supported version of the NVIDIA driver (410.xx or higher) on all GPU-enabled hosts.
- Start Cloudera Data Science Workbench.
Depending on your deployment, either start the CDSW service in Cloudera Manager (for CSDs) or run cdsw start on the Master host (for RPMs).
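To confirm that a GPU host meets the 410.xx requirement after the driver upgrade, compare the driver version reported by nvidia-smi against the minimum. This is a sketch. The helper names are hypothetical, and it assumes nvidia-smi is on the PATH of each GPU host. Only the major version is compared, matching the 410.xx wording above.

```shell
# Sketch: check that the installed NVIDIA driver is 410.xx or newer.
# MIN_DRIVER_MAJOR reflects the 410.xx requirement stated above.
MIN_DRIVER_MAJOR=410

# Succeeds if a version string such as "418.87.01" has a major version >= the minimum.
driver_version_ok() {
  local major="${1%%.*}"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$major" -ge "$MIN_DRIVER_MAJOR" ]
}

# Queries the driver on this host; nvidia-smi prints one line per GPU, so use the first.
check_local_driver() {
  local version
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  if driver_version_ok "$version"; then
    echo "NVIDIA driver $version meets the 410.xx minimum"
  else
    echo "NVIDIA driver ${version:-unknown} is older than 410.xx or could not be read" >&2
    return 1
  fi
}

# Usage (on each GPU-enabled host): check_local_driver
```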