Enabling TLS/SSL for Cloudera Data Science Workbench

Cloudera Machine Learning uses HTTP and WebSockets (WS) to support interactive connections to the Cloudera Machine Learning web application. These connections are not secure by default, so you should secure them with TLS.

Starting with version 1.6, Cloudera Machine Learning defaults to using TLS 1.2. The default cipher suites have also been upgraded to Mozilla's Modern cipher suites.
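One way to confirm from a client machine which protocol version a deployment negotiates is sketched below, using only Python's standard ssl module. This is an illustrative check, not part of the product: the function names are hypothetical, and the hostname you pass in is whatever address your own deployment uses. The context refuses anything older than TLS 1.2, so a successful handshake implies TLS 1.2 or newer.

```python
import socket
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client context that rejects protocol versions older than TLS 1.2."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the negotiated protocol name.

    The return value is a string such as "TLSv1.2" or "TLSv1.3".
    Raises ssl.SSLError if the server only offers older versions.
    """
    ctx = make_tls12_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_tls_version with the hostname of your deployment (an assumption, since the address is site-specific) returns the protocol that was actually agreed on during the handshake.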

Cloudera Machine Learning can be configured to use a TLS termination proxy to handle incoming connection requests. The termination proxy server will decrypt incoming connection requests and forward them to the Cloudera Machine Learning web application. A TLS termination proxy can be internal or external.

Internal Termination

An internal termination proxy is run by Cloudera Data Science Workbench's built-in load balancer, called the ingress controller, on the master host. The ingress controller is primarily responsible for routing traffic to, and load balancing across, Cloudera Data Science Workbench's web service backend. Once configured, as shown in the instructions that follow, it also terminates HTTPS traffic. The primary advantage of the internal termination approach is simplicity.

External Termination

External TLS termination can be provided through a number of different approaches. Common examples include:
  • Load balancers, such as the AWS Elastic Load Balancer
  • Modern firewalls
  • Reverse web proxies, such as nginx
  • VPN appliances supporting TLS/SSL VPN

Organizations that require external termination will often have standardized on a single approach for TLS. The primary advantage of external termination is that it allows such organizations to integrate with Cloudera Machine Learning without violating their IT department's policies for TLS. For example, with an external termination proxy, Cloudera Machine Learning does not need access to the TLS private key.

Load balancers and proxies often require a URL they can ping to validate the status of the web service backend. For instance, you can configure a load balancer to send an HTTP GET request to /internal/load-balancer/health-ping. A 200 (OK) response means the backend is healthy. As with all communication from the load balancer to the web backend when TLS is terminated externally, this request should be sent over HTTP and not HTTPS.
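The health check described above can be sketched in a few lines of Python using the standard library. This is a minimal illustration of what a load balancer's probe does, not a replacement for the load balancer's own health-check feature. The function name, default port, and timeout are assumptions; only the path comes from the text.

```python
import http.client

# Path taken from the documentation above.
HEALTH_PATH = "/internal/load-balancer/health-ping"


def backend_is_healthy(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if the backend answers the health ping with 200 (OK).

    The request deliberately uses plain HTTP: with external termination,
    traffic between the proxy and the web backend is not encrypted.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", HEALTH_PATH)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response all count as unhealthy.
        return False
    finally:
        conn.close()
```

A probe like this would typically run from the proxy host on a fixed interval, with the backend removed from rotation after some number of consecutive failures.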

Note that any terminating load balancer must provide the following header fields so that Cloudera Machine Learning can detect the IP address and protocol used by the client:

  • X-Forwarded-For (client's IP address)
  • X-Forwarded-Proto (client's requested protocol, i.e., HTTPS)
  • X-Forwarded-Host (the "Host" header of the client's original request)
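To make the header requirement concrete, the sketch below builds the three headers a terminating proxy would attach and checks an arbitrary header set for any that are missing. The helper names and the sample values are hypothetical, and real proxies set these headers through their own configuration rather than through Python; the code only illustrates what the forwarded request should carry.

```python
from typing import Dict, List, Mapping

# The three header fields listed above.
REQUIRED_FORWARDED_HEADERS = (
    "X-Forwarded-For",
    "X-Forwarded-Proto",
    "X-Forwarded-Host",
)


def forwarded_headers(client_ip: str, client_proto: str, original_host: str) -> Dict[str, str]:
    """Build the headers a terminating proxy adds before forwarding a request."""
    return {
        "X-Forwarded-For": client_ip,        # client's IP address
        "X-Forwarded-Proto": client_proto,   # protocol the client requested
        "X-Forwarded-Host": original_host,   # "Host" header of the original request
    }


def missing_forwarded_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the required header names absent from `headers`.

    HTTP header names are case-insensitive, so the comparison is too.
    """
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_FORWARDED_HEADERS if h.lower() not in present]
```

For example, forwarded_headers("203.0.113.7", "https", "cml.example.com") uses a documentation-range IP address and a made-up hostname; passing the result to missing_forwarded_headers returns an empty list.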

See Configuring HTTP Headers for Cloudera Machine Learning for more details on how to customize HTTP headers required by Cloudera Data Science Workbench.