Networking and Security Requirements

Cloudera Data Science Workbench has the following networking and security requirements.

  • All Cloudera Data Science Workbench gateway hosts must be part of the same datacenter and use the same network. Hosts spread across different datacenters or networks can result in unreliable performance.
  • A wildcard subdomain such as *.cdsw.company.com must be configured. Wildcard subdomains are used to provide isolation for user-generated content.

    The wildcard DNS hostname configured for Cloudera Data Science Workbench must be resolvable from both the CDSW cluster and your browser.
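
    As a rough way to check the wildcard record from a given host, the sketch below resolves an arbitrary name under the wildcard domain. The function names and the probe-test label are hypothetical and are not part of Cloudera Data Science Workbench; getent is assumed to be available on the host.

    ```shell
    #!/usr/bin/env bash
    # Build a probe hostname under a wildcard domain, for example
    # "*.cdsw.company.com" -> "probe-test.cdsw.company.com".
    # The default label "probe-test" is an arbitrary, hypothetical choice.
    wildcard_probe_name() {
      local wildcard="$1" label="${2:-probe-test}"
      # Strip the leading "*." and prepend the probe label.
      printf '%s.%s\n' "$label" "${wildcard#\*.}"
    }

    # Return 0 if the probe hostname resolves through the system resolver.
    check_wildcard_dns() {
      local name
      name="$(wildcard_probe_name "$1")"
      getent hosts "$name" >/dev/null
    }
    ```

    For example, running check_wildcard_dns '*.cdsw.company.com' && echo "wildcard resolves" on a CDSW host tests resolution from the cluster side. Resolution from the browser side still has to be checked on a client machine.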

  • Disable all pre-existing iptables rules. Although Kubernetes makes extensive use of iptables, it is difficult to predict how pre-existing iptables rules will interact with the rules inserted by Kubernetes. Therefore, Cloudera recommends that you disable all pre-existing rules before you proceed with the installation.
    Before you disable the rules, save them and check that they have been written to the /etc/sysconfig/iptables file. If you disable the rules without saving them, the settings can be lost when the system reboots.
    1. Save the current iptables rules by running the following command:
      service iptables save
    2. Verify that the rules have been written to the file by running the following command:
      ls -l /etc/sysconfig/iptables
    3. Disable the iptables rules by running the following commands:
      sudo iptables -P INPUT ACCEPT
      sudo iptables -P FORWARD ACCEPT
      sudo iptables -P OUTPUT ACCEPT
      sudo iptables -t nat -F
      sudo iptables -t mangle -F
      sudo iptables -F
      sudo iptables -X
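
    After running the commands above, one way to confirm that no rules remain is to inspect the output of iptables -S, which should list only default ACCEPT policies. The helper below is a hypothetical sketch, not a Cloudera tool. It reads a rule listing on stdin and succeeds only if every non-empty line is a default ACCEPT policy.

    ```shell
    #!/usr/bin/env bash
    # Return 0 when the rule listing on stdin (in the format printed by
    # `iptables -S`) contains nothing except default ACCEPT chain policies.
    # Any other rule, user-defined chain, or non-ACCEPT policy returns non-zero.
    only_accept_policies() {
      # First grep keeps lines that are NOT "-P <CHAIN> ACCEPT";
      # second grep succeeds if any such non-empty line exists.
      ! grep -Ev '^-P [A-Z]+ ACCEPT$' | grep -q .
    }
    ```

    For example, sudo iptables -S | only_accept_policies && echo "filter table is clean" checks the filter table. The nat and mangle tables can be checked the same way with sudo iptables -t nat -S and sudo iptables -t mangle -S. This check is only meaningful before installation, because Kubernetes inserts its own rules once it is running.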
  • Cloudera Data Science Workbench sets the following sysctl options in /etc/sysctl.d/k8s.conf:
    • net.bridge.bridge-nf-call-iptables=1
    • net.bridge.bridge-nf-call-ip6tables=1
    • net.ipv4.ip_forward=1
    • net.ipv4.conf.default.forwarding=1
    Underlying components of Cloudera Data Science Workbench (Docker, Kubernetes, and NFS) require these options to work correctly. Make sure they are not overridden by higher-priority configuration such as /etc/sysctl.conf.
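
    To look for conflicting settings, a configuration file such as /etc/sysctl.conf can be scanned for any of the four options set to a value other than 1. The function below is a minimal sketch with a hypothetical name. It assumes the usual key = value syntax with # or ; comment lines.

    ```shell
    #!/usr/bin/env bash
    # Print every line in the given sysctl configuration file that sets one of
    # the four options required by CDSW to a value other than 1.
    # No output means the file does not override them.
    find_sysctl_overrides() {
      local file="$1"
      awk -F= '
        /^[[:space:]]*[#;]/ { next }          # skip comment lines
        {
          key = $1; val = $2
          gsub(/[[:space:]]/, "", key)        # drop whitespace around the key
          gsub(/[[:space:]]/, "", val)        # and around the value
          if ((key == "net.bridge.bridge-nf-call-iptables" ||
               key == "net.bridge.bridge-nf-call-ip6tables" ||
               key == "net.ipv4.ip_forward" ||
               key == "net.ipv4.conf.default.forwarding") && val != "1")
            print key "=" val
        }
      ' "$file"
    }
    ```

    For example, find_sysctl_overrides /etc/sysctl.conf prints nothing when the file leaves the options alone. The values currently in effect can also be read directly, for example with sysctl -n net.ipv4.ip_forward.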
  • SELinux must either be disabled or run in permissive mode.
  • Multi-homed networks are supported with Cloudera Data Science Workbench 1.2.2 and higher. However, you must explicitly configure the private IP address of the worker nodes in the kubelet start script, as follows:
    # vi /opt/cloudera/parcels/CDSW/scripts/start-kubelet-worker-standalone-core.sh
    kubelet_opts+=(--v=2)
    kubelet_opts+=(--node-ip=172.x.x.x)
  • Firewall restrictions must be disabled across Cloudera Data Science Workbench and CDH/HDP cluster hosts. For more details on cluster communication, see Ports Required by Cloudera Data Science Workbench.
  • Untrusted (non-sudo) SSH access to Cloudera Data Science Workbench hosts must be disabled to ensure a secure deployment.

    Cloudera Data Science Workbench assumes that users only access the gateway hosts through the web application. Untrusted users with SSH access to a Cloudera Data Science Workbench host can gain full access to the cluster, including access to other users' workloads.

  • localhost must resolve to 127.0.0.1.
  • Forward and reverse DNS lookups must be enabled for the domain name and IP address of the Cloudera Data Science Workbench master host.
  • Cloudera Data Science Workbench does not support DNS servers running on 127.0.0.1:53. This IP address resolves to the container localhost within Cloudera Data Science Workbench containers. As a workaround, use either a non-loopback address or a remote DNS server.
  • All third-party security software (such as McAfee, Tanium, or Symantec) must be disabled on CDSW hosts before starting or restarting CDSW. Failure to do so can cause Cloudera Data Science Workbench to fail intermittently. After CDSW has started, you should be able to re-enable the security software.

Cloudera Data Science Workbench does not support hosts or clusters that do not conform to these restrictions.
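
Two of the host-level requirements above, the SELinux mode and the restriction on DNS servers running on 127.0.0.1, can be spot-checked with small shell helpers. The sketch below uses hypothetical function names and is not part of Cloudera Data Science Workbench. It assumes that the SELinux mode is supplied as reported by getenforce, and it treats any 127.x.x.x nameserver entry as a loopback DNS server.

```shell
#!/usr/bin/env bash
# Return 0 if the resolver file (default: /etc/resolv.conf) lists a
# nameserver on a loopback address (127.x.x.x).
has_loopback_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.' "${1:-/etc/resolv.conf}"
}

# Return 0 if the given SELinux mode string satisfies the requirement
# (disabled or permissive); "Enforcing" or anything else returns 1.
selinux_mode_ok() {
  case "$1" in
    Disabled|Permissive) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, selinux_mode_ok "$(getenforce)" || echo "SELinux is enforcing" reports a host that does not meet the SELinux requirement, and has_loopback_nameserver && echo "loopback DNS server configured" flags a resolver configuration that needs the workaround described above.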