Network design

In most situations it will be appropriate to place the CDP cluster within a VLAN of its own and whitelist a number of ports that will be used for inbound connections. This allows for a simplified network configuration for intra-cluster communications, whilst protecting the cluster from external access.

East West traffic on a typical cluster can be extremely high, therefore it’s important that a leaf spine network design be used without inferring firewalls between cluster nodes and racks in order not to compromise application performance.

In theory, this level of control might not be necessary given that all ports are protected by one of the authentication mechanisms specified in Kerberos and Identity, however a typical deployment would involve a VLAN.

Isolating Clusters within the same VLAN

Due consideration should be given to how to isolate two clusters that have network access to each other and are housed in the same Kerberos domain. By default, Hadoop does not differentiate between clusters who present the same prefix of a three part Kerberos principal. For example, consider the following service principal names:

  • hdfs/myprodhost1.example.com@EXAMPLEREALM.COM

  • hdfs/mydevhost1.example.com@EXAMPLEREALM.COM

Using an out of the box configuration, the cluster will extract the first part of the SPN (before the / character) and use that as the service shortname, meaning that there is no way of differentiating between the hdfs user on a production cluster and that on a development cluster.

In order to counteract this, Cloudera recommends using Auth-to-Local rules to isolate any clusters that have network connectivity and are in the same realm, or are trusted by the cluster. For further information on how to configure Auth-to-Local Rules

Note: Using Custom Kerberos Principals (see, Customizing Kerberos principals) is generally not recommended, as support is limited due to environment specific testing required.

Edge and Gateway Roles

In a standard configuration it is recommended that Apache Knox be used as a Gateway and Load Balancer for all interactions into the cluster, with the exception of Cloudera Manager and SSH access. SSH access should be restricted to administrators only, other than Edge Nodes, to which end users may be permitted non-root access.