Network and Security Requirements

Cloudera Data Science Workbench on HDP has the following network and security requirements.

  • Cloudera Data Science Workbench requires DNS to resolve all hostnames in your HDP cluster. Using /etc/hosts for hostname resolution is not supported.
  • (New in Cloudera Data Science Workbench 1.6.x and higher) Enable IPv6 on all Cloudera Data Science Workbench gateway hosts. For instructions, refer to the workaround described in Known Issue: CDSW cannot start sessions due to connection errors.
  • All Cloudera Data Science Workbench gateway hosts must be part of the same datacenter and use the same network. Hosts from different datacenters or networks can result in unreliable performance.
  • A wildcard subdomain such as *.cdsw.company.com must be configured. Wildcard subdomains are used to provide isolation for user-generated content.

    Starting with version 1.5, the wildcard DNS hostname configured for Cloudera Data Science Workbench must be resolvable from both the CDSW cluster and your browser.

  • Disable all pre-existing iptables rules. While Kubernetes makes extensive use of iptables, it’s difficult to predict how pre-existing iptables rules will interact with the rules inserted by Kubernetes. Therefore, Cloudera recommends you use the following commands to disable all pre-existing rules before you proceed with the installation.
    sudo iptables -P INPUT ACCEPT
    sudo iptables -P FORWARD ACCEPT
    sudo iptables -P OUTPUT ACCEPT
    sudo iptables -t nat -F
    sudo iptables -t mangle -F
    sudo iptables -F
    sudo iptables -X
  • Cloudera Data Science Workbench sets the following sysctl options in /etc/sysctl.d/k8s.conf:
    • net.bridge.bridge-nf-call-iptables=1
    • net.bridge.bridge-nf-call-ip6tables=1
    • net.ipv4.ip_forward=1
    • net.ipv4.conf.default.forwarding=1
    Underlying components of Cloudera Data Science Workbench (Docker, Kubernetes, and NFS) require these options to work correctly. Make sure they are not overridden by higher-priority configuration such as /etc/sysctl.conf.
  • SELinux must either be disabled or run in permissive mode.
  • Multi-homed networks are supported with Cloudera Data Science Workbench 1.2.2 (and higher).
  • Firewall restrictions must be disabled across Cloudera Data Science Workbench and HDP hosts. Internally, the Cloudera Data Science Workbench master and worker hosts require full connectivity with no firewalls. Externally, end users connect to Cloudera Data Science Workbench exclusively through a web server running on the master host, and therefore do not need direct access to any other internal Cloudera Data Science Workbench or HDP services.

    Review the complete list of ports required by Cloudera Data Science Workbench at Ports Used By Cloudera Data Science Workbench.

  • Untrusted (non-sudo) SSH access to Cloudera Data Science Workbench hosts must be disabled to ensure a secure deployment.

    Cloudera Data Science Workbench assumes that users only access the gateway hosts through the web application. Untrusted users with SSH access to a Cloudera Data Science Workbench host can gain full access to the cluster, including access to other users' workloads.

  • localhost must resolve to 127.0.0.1.
  • Forward and reverse DNS lookup must be enabled for the Cloudera Data Science Workbench domain name and for the IP address of the CDSW master host.
  • Cloudera Data Science Workbench does not support DNS servers running on 127.0.0.1:53. This IP address resolves to the container localhost within Cloudera Data Science Workbench containers. As a workaround, use either a non-loopback address or a remote DNS server.
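
The DNS-related requirements in the list above (localhost resolution, the wildcard subdomain, and the loopback DNS server restriction) can be spot-checked before installation. The following is a minimal sketch, not a tool shipped with Cloudera Data Science Workbench. The function names and the RESOLVE_CMD override are hypothetical, the default lookup assumes a glibc-based host where getent ahostsv4 is available, and the resolver check flags any 127.x nameserver rather than only 127.0.0.1.

```shell
#!/bin/sh
# Hypothetical DNS pre-flight helpers; not part of Cloudera Data Science Workbench.

# Print the first IPv4 address a hostname resolves to.
# RESOLVE_CMD is an assumed override hook; the default uses glibc's getent.
resolve_host() {
  ${RESOLVE_CMD:-getent ahostsv4} "$1" | awk 'NR == 1 { print $1 }'
}

# localhost must resolve to 127.0.0.1.
check_localhost() {
  addr="$(resolve_host localhost)"
  if [ "$addr" != "127.0.0.1" ]; then
    echo "FAIL localhost resolves to '${addr:-nothing}'"
    return 1
  fi
}

# A wildcard record should resolve any label under the domain.
# $1: base domain, for example cdsw.company.com for *.cdsw.company.com
check_wildcard_dns() {
  probe="preflight-check-$$.$1"   # arbitrary label; only a wildcard record matches it
  addr="$(resolve_host "$probe")"
  if [ -z "$addr" ]; then
    echo "FAIL $probe does not resolve"
    return 1
  fi
}

# Fail if the resolver file lists a loopback (127.x) nameserver.
# $1: resolver file (default /etc/resolv.conf)
check_no_loopback_dns() {
  file="${1:-/etc/resolv.conf}"
  if grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.' "$file"; then
    echo "FAIL loopback nameserver in $file"
    return 1
  fi
}
```

After sourcing these definitions on a gateway host, calling check_localhost, check_no_loopback_dns, and check_wildcard_dns cdsw.company.com prints one FAIL line for each requirement that is not met. Because the wildcard hostname must also resolve from the browser, the wildcard check is worth repeating from a client machine.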

Cloudera Data Science Workbench does not support hosts or clusters that do not conform to these restrictions.
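
To help confirm that a host conforms to the kernel and SELinux requirements listed above, the checks can be sketched as a small script. This is an illustrative sketch under stated assumptions, not an official validation tool: the function names are hypothetical, the sysctl values are read from /proc/sys (the root directory is a parameter only so the check can be exercised against a test directory), and a host without the getenforce command is treated as having SELinux disabled.

```shell
#!/bin/sh
# Hypothetical host pre-flight helpers; not part of Cloudera Data Science Workbench.

# Report each required sysctl option whose current value is not 1.
# $1: sysctl root directory (default /proc/sys)
check_sysctl_options() {
  root="${1:-/proc/sys}"
  rc=0
  for key in net.bridge.bridge-nf-call-iptables \
             net.bridge.bridge-nf-call-ip6tables \
             net.ipv4.ip_forward \
             net.ipv4.conf.default.forwarding; do
    # Map the dotted option name to its path, e.g. net/ipv4/ip_forward
    file="$root/$(printf '%s' "$key" | tr . /)"
    value="$(cat "$file" 2>/dev/null || echo unset)"
    if [ "$value" != "1" ]; then
      echo "FAIL $key=$value"
      rc=1
    fi
  done
  return "$rc"
}

# SELinux must be disabled or permissive.
# $1: mode string (default: output of getenforce; assumed Disabled if absent)
check_selinux() {
  mode="${1:-$(getenforce 2>/dev/null || echo Disabled)}"
  if [ "$mode" = "Enforcing" ]; then
    echo "FAIL SELinux is enforcing"
    return 1
  fi
}
```

Calling check_sysctl_options and check_selinux with no arguments on each gateway host inspects the live system. A FAIL line for a sysctl option typically means that /etc/sysctl.conf or another configuration file is overriding the value set in /etc/sysctl.d/k8s.conf.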