Hadoop Security Guide

Chapter 3. Data Protection: Wire Encryption

Encryption is applied to electronic information to ensure its privacy and confidentiality. Wire encryption protects data as it moves into, through, and out of a Hadoop cluster over RPC, HTTP, Data Transfer Protocol (DTP), and JDBC:

  • Clients typically communicate directly with the Hadoop cluster. Data can be protected using RPC encryption or Data Transfer Protocol encryption:

    • RPC encryption: Clients interact directly with the Hadoop cluster through RPC. A client uses RPC to connect to the NameNode (NN) to initiate file read and write operations. RPC connections in Hadoop use Java’s Simple Authentication and Security Layer (SASL), which supports encryption.

    • Data Transfer Protocol: The NN gives the client the address of the first DataNode (DN) from which to read, or to which to write, the block. The actual data transfer between the client and a DN uses Data Transfer Protocol.

  • Users typically communicate with the Hadoop cluster using a browser or command-line tools. Data can be protected as follows:

    • HTTPS encryption: Users typically interact with Hadoop using a browser or a component CLI, while applications use REST APIs or Thrift. Encryption over HTTP is implemented with SSL support across the Hadoop cluster and in individual components such as Ambari.

    • JDBC: HiveServer2 implements encryption with the Java SASL protocol’s quality of protection (QOP) setting. With this setting, data moving between HiveServer2 and a JDBC client can be encrypted.

  • Additionally, within-cluster communication between processes can be protected using HTTPS encryption during MapReduce shuffle:

    • HTTPS encryption during shuffle: The step in which data moves between the Mappers and the Reducers over HTTP is called the shuffle. The Reducer initiates the connection to the Mapper to request data, so it acts as the SSL client.
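As a rough orientation, the encryption paths listed above are usually switched on through cluster configuration properties. The sketch below pairs four of them with a property name and value. The names are an assumption drawn from the standard Apache Hadoop and Hive configuration files, not from this section, and may differ between versions. The class `WireEncryptionSettings` and its methods are hypothetical helpers used only for illustration. HTTPS for web UIs and REST APIs is left out because it also depends on keystore and truststore setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WireEncryptionSettings {

    /** Pairs each wire-encryption path with one property that typically enables it (assumed names). */
    static Map<String, String> settings() {
        Map<String, String> props = new LinkedHashMap<>();
        // RPC encryption: SASL QOP "privacy" adds encryption to authentication and integrity (core-site.xml).
        props.put("hadoop.rpc.protection", "privacy");
        // Data Transfer Protocol: encrypt block data exchanged with DataNodes (hdfs-site.xml).
        props.put("dfs.encrypt.data.transfer", "true");
        // JDBC: SASL QOP "auth-conf" means authentication with confidentiality (hive-site.xml).
        props.put("hive.server2.thrift.sasl.qop", "auth-conf");
        // Shuffle: Reducers fetch map output over HTTPS (mapred-site.xml).
        props.put("mapreduce.shuffle.ssl.enabled", "true");
        return props;
    }

    /** Renders one property in the <property> layout used by Hadoop *-site.xml files. */
    static String toXml(String name, String value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // Print every setting as an XML fragment, in the order the paths are listed above.
        settings().forEach((name, value) -> System.out.println(toXml(name, value)));
    }
}
```

Running `main` prints one `<property>` fragment per setting. In a real cluster each fragment belongs in the file named in the corresponding comment, and enabling these settings generally requires restarting the affected services.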

This chapter provides information about configuring and connecting to wire-encrypted components.

For information about configuring HDFS data-at-rest encryption, see HDFS "Data at Rest" Encryption.