This is the documentation for CDH 5.1.x. Documentation for other versions is available at Cloudera Documentation.

SSH and HTTPS in the Hadoop Cluster

SSH and HTTPS can be used to transmit information securely:

  • SSH (Secure Shell) is a protocol for encrypted remote login and command execution. It defines its own encrypted transport (it does not run on top of SSL) and supports password and public-key authentication for secure access to a remote host; it is a more secure alternative to rlogin and telnet.
  • HTTPS (HTTP Secure) is HTTP running on top of SSL/TLS, adding encryption and certificate-based server authentication to standard HTTP communications.

SSH

It is a good idea to use SSH for remote administration purposes (instead of rlogin, for example). Note, however, that SSH is not used to secure communication among the elements in a Hadoop cluster (DataNode, NameNode, TaskTracker or YARN NodeManager, JobTracker or YARN ResourceManager, or the /etc/init.d scripts that start daemons locally).

The Hadoop components use SSH in the following cases:

  • The sshfence fencing method of High Availability Hadoop configurations uses SSH; the shell fencing method does not require SSH.
  • Whirr uses SSH to enable secure communication with the Whirr cluster in the Cloud. See the Whirr Installation instructions.
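
To illustrate the first case, the following minimal sketch reads the dfs.ha.fencing.methods property from an hdfs-site.xml document and reports which configured fencing methods rely on SSH. The sample XML, the key file path, and the helper names fencing_methods and uses_ssh are assumptions made for illustration; they are not part of CDH.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml excerpt, used only for illustration.
SAMPLE_HDFS_SITE = """\
<configuration>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
  </property>
</configuration>
"""


def fencing_methods(hdfs_site_xml):
    """Return the configured fencing methods, one per non-empty line of the value."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "dfs.ha.fencing.methods":
            value = prop.findtext("value", "")
            return [line.strip() for line in value.splitlines() if line.strip()]
    return []


def uses_ssh(method):
    """True for sshfence, with or without a (user:port) argument."""
    return method == "sshfence" or method.startswith("sshfence(")


if __name__ == "__main__":
    for method in fencing_methods(SAMPLE_HDFS_SITE):
        note = "needs SSH" if uses_ssh(method) else "no SSH required"
        print(method, "->", note)
```

A check like this can help confirm whether passwordless SSH keys need to be provisioned between the NameNode hosts before High Availability is enabled.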

HTTPS

Some communication within Hadoop can be configured to use HTTPS. Implementing this requires generating valid certificates and configuring clients to use those certificates. The HTTPS functionality that can be configured in CDH 5 is:

  • Encrypted MapReduce Shuffle (both MRv1 and YARN).
  • Encrypted Web UIs; the same configuration parameters that enable Encrypted MapReduce Shuffle also enable Encrypted Web UIs.

These features are discussed under Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport.
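
On the client side, "configuring clients to use those certificates" typically means telling the client which certificate authority to trust. The sketch below shows one way a Python client might do this when fetching a page from an HTTPS-enabled web UI. The host name, port, and CA file path are hypothetical, and the helper names are assumptions made for illustration; this is not a CDH-provided tool.

```python
import ssl
import urllib.request


def make_cluster_tls_context(ca_file=None):
    """Build a TLS context that verifies the server certificate and host name.

    ca_file is an optional PEM file holding the CA that signed the cluster's
    certificates; when omitted, the system trust store is used.
    """
    return ssl.create_default_context(cafile=ca_file)


def fetch(url, ca_file=None, timeout=10):
    """Fetch a URL over HTTPS and return (status code, body bytes)."""
    context = make_cluster_tls_context(ca_file)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.status, resp.read()


if __name__ == "__main__":
    # Hypothetical values; replace with a real web UI address and CA file.
    url = "https://namenode.example.com:50470/"
    ca_file = "/etc/hadoop/conf/cluster-ca.pem"
    try:
        status, _body = fetch(url, ca_file=ca_file)
        print("HTTP status:", status)
    except OSError as err:
        # Covers a missing CA file, connection failures, and TLS errors.
        print("Request failed:", err)
```

Certificate and host name verification are left enabled, so a request fails if the server certificate was not issued by the trusted CA or does not match the host name in the URL.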

Page generated September 3, 2015.