Configuring Cloudera Data Science Workbench Deployments Behind a Proxy

If your deployment is behind an HTTP or HTTPS proxy, you must configure the hostname of the proxy in Cloudera Data Science Workbench. On CSD-based deployments, set the HTTP Proxy and HTTPS Proxy properties of the CDSW service in Cloudera Manager. On RPM deployments, set the HTTP_PROXY or HTTPS_PROXY properties, as applicable, in /etc/cdsw/config/cdsw.conf.
If you are using an intermediate proxy such as Cntlm to handle NTLM authentication, add the Cntlm proxy address to the HTTP Proxy or HTTPS Proxy fields.
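On an RPM deployment, the proxy entries in /etc/cdsw/config/cdsw.conf might look like the following minimal sketch. The host and port (proxy.example.com:3128) are hypothetical placeholders; substitute your own proxy, or your Cntlm listener address if you use an intermediate proxy.

```shell
# /etc/cdsw/config/cdsw.conf (excerpt)
# proxy.example.com:3128 is a hypothetical placeholder for your proxy's host and port.
# The URL scheme describes how CDSW reaches the proxy, not the destination site.
HTTP_PROXY="http://proxy.example.com:3128"
HTTPS_PROXY="http://proxy.example.com:3128"
```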

If the proxy server uses TLS encryption to handle connection requests, you must add the proxy's root CA certificate to your host's store of trusted certificates. Proxy servers typically sign their server certificate with their own root certificate, so connection attempts fail until the Cloudera Data Science Workbench host trusts the proxy's root CA certificate. If you do not have access to your proxy's root certificate, contact your network or IT administrator.
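Before installing the certificate, you can check that the file you received is self-issued, which is what a root certificate looks like: its subject and issuer names are identical. The sketch below uses standard `openssl x509` options; the function name `is_self_signed_root` is hypothetical, and the check compares names only (it does not verify the signature or CA extensions).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a certificate file is self-issued
# (subject == issuer), as a root CA certificate is.
# Returns 0 if self-issued, 1 if not, 2 if the file cannot be read.
is_self_signed_root() {
  local cert="$1" subject issuer
  # RFC2253 prints both names in the same single-line format.
  subject=$(openssl x509 -in "$cert" -noout -subject -nameopt RFC2253) || return 2
  issuer=$(openssl x509 -in "$cert" -noout -issuer -nameopt RFC2253) || return 2
  # Strip the "subject=" / "issuer=" labels before comparing the names.
  [ "${subject#subject=}" = "${issuer#issuer=}" ]
}

# Example, using the placeholder path from step 1:
# is_self_signed_root /tmp/<proxy-root-certificate>.crt && echo "self-issued root"
```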

To enable trust, perform the following steps on the master node and on all worker nodes.
  1. Copy the proxy's root certificate to the trusted CA certificate store (ca-trust) on the Cloudera Data Science Workbench host.
    cp /tmp/<proxy-root-certificate>.crt /etc/pki/ca-trust/source/anchors/
  2. Use the following command to rebuild the trusted certificate store.
    update-ca-trust extract
  3. If you use custom engine images that are pulled from a Docker repository, add the proxy's root certificate to a directory under /etc/docker/certs.d named after that repository. For example, if your Docker repository is at <docker-repository-host>, create the following directory structure:
    /etc/docker/certs.d
    |-- <docker-repository-host>           # Directory named after Docker repository
        |-- <proxy-root-certificate>.crt   # Docker-related root CA certificates

    This step is not required with the standard engine images because they are included in the Cloudera Data Science Workbench RPM.

  4. Re-initialize Cloudera Data Science Workbench for this change to take effect.
    cdsw init
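The four steps above can be collected into a small per-host script. This is a sketch under stated assumptions: `CERT` and `REGISTRY` are hypothetical placeholders, the host uses the ca-trust layout from steps 1 and 2, and by default (`DRY_RUN=1`) the script only prints each command so you can review it before running it as root with `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
# Hypothetical per-host helper for steps 1-4. Prints commands unless DRY_RUN=0.
set -euo pipefail

CERT="${CERT:-/tmp/proxy-root-ca.crt}"   # hypothetical path to the proxy root CA
REGISTRY="${REGISTRY:-}"                 # hypothetical custom registry host; empty skips step 3
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_proxy_ca() {
  # Step 1: add the root certificate to the ca-trust anchors.
  run cp "$CERT" /etc/pki/ca-trust/source/anchors/
  # Step 2: rebuild the trusted certificate store.
  run update-ca-trust extract
  # Step 3 (custom engine images only): trust the cert for this registry.
  if [ -n "$REGISTRY" ]; then
    run mkdir -p "/etc/docker/certs.d/$REGISTRY"
    run cp "$CERT" "/etc/docker/certs.d/$REGISTRY/"
  fi
  # Step 4: re-initialize CDSW so the change takes effect.
  run cdsw init
}
```

For example, `CERT=/tmp/ca.crt REGISTRY=registry.example.com install_proxy_ca` prints the five commands for a host that pulls custom engine images; with `REGISTRY` left empty it prints three.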

Configure hostnames to bypass the proxy

Use the No Proxy property of the CDSW service in Cloudera Manager to configure a comma-separated list of hostnames that should bypass the proxy. On RPM deployments, configure the corresponding NO_PROXY field in cdsw.conf.

The value for this field typically includes localhost, the master node IP address (configured during installation), and any private Docker registries and HTTP services inside the firewall that Cloudera Data Science Workbench users might want to access from the engines. This change must be made on the master node and on all worker nodes.

At a minimum, Cloudera recommends the following No Proxy configuration:
    ...,localhost,<CDSW_MASTER_NODE_IP>,...
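To illustrate how a comma-separated NO_PROXY value is read, the sketch below splits the list and checks whether a host appears in it verbatim. The `NO_PROXY` value and the `bypasses_proxy` function are hypothetical. Real clients apply their own matching rules (curl, for example, also matches domain suffixes), so treat this only as an illustration of the list format, not of how CDSW evaluates it.

```shell
#!/usr/bin/env bash
# Hypothetical NO_PROXY value: localhost, a placeholder master IP,
# and a placeholder internal registry.
NO_PROXY="localhost,10.0.0.5,registry.internal.example"

# Return 0 if the host is listed verbatim in NO_PROXY, 1 otherwise.
bypasses_proxy() {
  local host="$1" entry
  local -a entries
  IFS=',' read -ra entries <<< "$NO_PROXY"
  for entry in "${entries[@]}"; do
    entry="${entry// /}"   # ignore stray spaces around entries
    [ "$entry" = "$host" ] && return 0
  done
  return 1
}
```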