Encrypting data in transit between clusters

The source directory and the destination directory of a replication policy can each be inside or outside an encryption zone. If the destination directory is in an encryption zone, the replicated data is encrypted at rest on the destination. If the destination directory is not in an encryption zone, the replicated data is stored unencrypted, even if the source directory is in an encryption zone. Encryption zones are not supported in CDH 5.1 or lower.
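The rule above depends only on the destination directory. As a minimal illustration (the function name and the boolean flags are hypothetical and not part of any Cloudera or Hadoop API), the four combinations can be enumerated as follows:

```python
def encrypted_at_rest_on_destination(source_in_zone: bool,
                                     destination_in_zone: bool) -> bool:
    """Return True if replicated data ends up encrypted at rest.

    Hypothetical helper for illustration only. The source flag is accepted
    but deliberately ignored: per the rule above, only the destination
    directory's encryption zone membership matters.
    """
    return destination_in_zone


if __name__ == "__main__":
    # Print the outcome for every source/destination combination.
    for source_in_zone in (True, False):
        for destination_in_zone in (True, False):
            result = encrypted_at_rest_on_destination(
                source_in_zone, destination_in_zone)
            print(f"source in zone={source_in_zone}, "
                  f"destination in zone={destination_in_zone} "
                  f"-> encrypted at rest={result}")
```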

When you configure encryption zones, you also configure Ranger Key Management Server (KMS) to manage the encryption keys. To access encrypted data, a user must be authorized on the KMS for each encryption zone they need to interact with. The user you specify in the General > Run As Username field when you create the HDFS replication policy must have this authorization; the key administrator must add ACLs for that user to the KMS to prevent authorization failures.

During replication, DistCp transfers the data from the source cluster to the destination cluster. For clusters that use encryption zones, configure encryption of KMS key transfers between the source and destination using the TLS/SSL protocol.

You might come across the following three scenarios when using encryption zones:

Scenario: Replicating data from an encryption zone on the source cluster to an encryption zone on the destination cluster.
Steps taken by Replication Manager to replicate data:
  1. Data is decrypted at the source as it is read from the source cluster (using the key for the source encryption zone).
  2. The decrypted data is transferred over the wire using DistCp through the TLS/SSL protocol.
  3. The data is encrypted when it is written to the destination cluster (using the key for the destination encryption zone).

The data transmission is encrypted only if you have configured encryption for HDFS data transfer.

Scenario: Replicating data from an encryption zone on the source cluster to an unencrypted zone on the destination cluster.
Steps taken by Replication Manager to replicate data:
  1. Data is decrypted at the source as it is read from the source cluster (using the key for the source encryption zone).
  2. The decrypted data is transferred over the wire using DistCp through the TLS/SSL protocol.
  3. The data remains unencrypted.

Scenario: Replicating data from an unencrypted zone on the source cluster to an encryption zone on the destination cluster.
Steps taken by Replication Manager to replicate data:
  The data is available as is after replication.

To configure encryption of data transmission between source and destination clusters:
  • Enable TLS/SSL for HDFS clients on both the source and the destination clusters. You may also need to configure trust between the SSL certificates on the source and destination.
  • Enable TLS/SSL for the two peer Cloudera Manager Servers.
  • Encrypt data transfer using HDFS data transfer encryption.
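
After you complete the steps above, you might want to confirm that HDFS data transfer encryption is actually enabled on each cluster. The following is a minimal sketch, assuming the setting is exposed through the standard Apache Hadoop property `dfs.encrypt.data.transfer` in `hdfs-site.xml`. The helper names and the sample configuration are hypothetical; on clusters managed by Cloudera Manager the value is normally set in the Cloudera Manager UI, and the location of the deployed client configuration file varies by installation.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Parse a Hadoop *-site.xml document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def data_transfer_encrypted(props: dict) -> bool:
    """Return True if dfs.encrypt.data.transfer is set to true.

    A missing property is treated as disabled.
    """
    return props.get("dfs.encrypt.data.transfer", "false").lower() == "true"


# Hypothetical sample configuration, used here instead of a real file.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    properties = read_hadoop_properties(SAMPLE_HDFS_SITE)
    print("Data transfer encryption enabled:",
          data_transfer_encrypted(properties))
```

To check a real cluster, read the contents of the deployed `hdfs-site.xml` on the source and on the destination and pass each to `read_hadoop_properties`. Both clusters should report the setting as enabled.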

The following blog post provides additional information about encryption with HDFS: https://blog.cloudera.com/blog/2013/03/how-to-set-up-a-hadoop-cluster-with-network-encryption/.