Replication of Encrypted Data
HDFS supports encryption of data at rest (including data accessed through Hive). This topic describes how replication works within and between encryption zones and how to configure replication to avoid failures due to encryption.
Encrypting Data in Transit Between Clusters
The source directory and the destination directory may or may not be in an encryption zone. Whether the replicated data is encrypted at rest depends only on the destination: if the destination directory is in an encryption zone, the data in that directory is encrypted. If the destination directory is not in an encryption zone, the data in that directory is not encrypted, even if the source directory is in an encryption zone. For more information about HDFS encryption zones, see HDFS Transparent Encryption. Encryption zones are not supported in CDH 5.1 and lower.
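The rule above can be summarized in a short sketch. This is an illustrative model only, not part of any Cloudera or Hadoop API: the `Directory` class, its fields, the example paths, and the function name are all hypothetical, and the code simply encodes the statement that at-rest encryption after replication is decided by the destination directory.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Directory:
    """Hypothetical stand-in for an HDFS directory (illustration only)."""
    path: str
    in_encryption_zone: bool


def encrypted_at_rest_after_replication(source: Directory,
                                        destination: Directory) -> bool:
    """Return True if replicated data ends up encrypted at rest.

    Only the destination's encryption zone membership matters; the
    source's membership is deliberately ignored.
    """
    return destination.in_encryption_zone


if __name__ == "__main__":
    # Walk through all four source/destination combinations.
    for src_zone, dst_zone in product([True, False], repeat=2):
        src = Directory("/example/source", src_zone)        # hypothetical path
        dst = Directory("/example/destination", dst_zone)   # hypothetical path
        result = encrypted_at_rest_after_replication(src, dst)
        print(f"source in zone={src_zone}, destination in zone={dst_zone} "
              f"-> encrypted at rest={result}")
```

The sketch covers only the at-rest state on the destination cluster. It says nothing about how the data is protected while it travels between clusters.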
Even when the source and destination directories are both in encryption zones, the data is decrypted as it is read from the source cluster (using the key for the source encryption zone) and encrypted again when it is written to the destination cluster (using the key for the destination encryption zone). By default, the data is transmitted between the clusters as plain text.
To encrypt the data in transit between the source and destination clusters:
- Enable TLS/SSL for HDFS clients on both the source and the destination clusters. For instructions, see Configuring TLS/SSL for HDFS. You may also need to configure trust between the SSL certificates on the source and destination.
- Enable TLS/SSL for the two peer Cloudera Manager Servers. For instructions, see Configuring TLS Encryption Only for Cloudera Manager.
- Cloudera recommends you also enable TLS/SSL communication between the Cloudera Manager Server and Agents. See Configuring TLS Security for Cloudera Manager for instructions.