Encrypting Data in Transit Between Clusters
A source directory and destination directory may or may not be in an encryption zone. If the destination directory is in an encryption zone, the data on the destination directory is encrypted.
If the destination directory is not in an encryption zone, the data on that directory is not encrypted, even if the source directory is in an encryption zone. Encryption zones are not supported in CDH versions 5.1 or lower.
When you configure encryption zones, you also configure a Key Management Server (KMS) to manage encryption keys. During replication, Cloudera Manager uses TLS/SSL to encrypt the keys when they are transferred from the source cluster to the destination cluster.
When you configure encryption zones, you also configure a Key Management Server (KMS) to manage encryption keys. When a HDFS replication command that specifies an encrypted source directory runs, Cloudera Manager temporarily copies the encryption keys from the source cluster to the destination cluster, using TLS/SSL (if configured for the KMS) to encrypt the keys. Cloudera Manager then uses these keys to decrypt the encrypted files when they are received from the source cluster before writing the files to the destination cluster.
Even when the source and destination directories are both in encryption zones, the data is decrypted as it is read from the source cluster (using the key for the source encryption zone) and encrypted again when it is written to the destination cluster (using the key for the destination encryption zone). The data transmission is encrypted if you have configured encryption for HDFS Data Transfer.
During replication, data travels from the source cluster to the destination cluster using
distcp
. For clusters that use encryption zones, configure encryption of KMS
key transfers between the source and destination using TLS/SSL. See Configuring TLS/SSL for the KMS.
- Enable TLS/SSL for HDFS clients on both the source and the destination clusters. For instructions, see Configuring TLS/SSL for HDFS, YARN, and MapReduce. You may also need to configure trust between the SSL certificates on the source and destination.
- Enable TLS/SSL for the two peer Cloudera Manager Servers. See Manually Configuring TLS Encryption for Cloudera Manager.
- Encrypt data transfer using HDFS Data Transfer Encryption. See Configuring Encrypted Transport for HDFS.
The following blog post provides additional information about encryption with HDFS: http://blog.cloudera.com/blog/2013/03/how-to-set-up-a-hadoop-cluster-with-network-encryption/.