Adding Files to an Encryption Zone

You can add files to an encryption zone by copying them to the encryption zone using distcp.

For example:
sudo -u hdfs hadoop distcp /user/dir /encryption_zone

DistCp Considerations

A common use case for DistCp is to replicate data between clusters for backup and disaster recovery purposes. This is typically performed by the cluster administrator, who is an HDFS superuser. To retain this workflow when using HDFS encryption, the virtual path prefix /.reserved/raw/ has been introduced, that gives superusers direct access to the underlying block data in the filesystem. This allows superusers to distcp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data. It also means the source and destination data will be byte-for-byte identical, which would not have been true if the data was being re-encrypted with a new EDEK.

Copying data from encrypted locations

By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. When copying from an encrypted location, the file system checksums will not match because the underlying block data is different. This is true whether or not the destination location is encrypted or unencrypted.

In this case, you can specify the -skipcrccheck and -update flags to avoid verifying checksums. When you use -skipcrccheck, distcp checks the file integrity by performing a file size comparison, right after the copy completes for each file.