Copying Files to or from an Encryption Zone
To copy existing files to or from an encryption zone, use a tool such as distcp.
Note: For separation of administrative roles, do not use the hdfs user to create
encryption zones. Instead, designate another administrative account for creating
encryption keys and zones. See “Appendix: Creating an HDFS Admin User” for more information.
The files will be encrypted using a file-level key generated by the Ranger Key Management Service.
DistCp Considerations
DistCp is commonly used to replicate data between clusters for backup and disaster
recovery purposes. This operation is typically performed by the cluster administrator,
via an HDFS superuser account.
To retain this workflow when using HDFS encryption, a new virtual path prefix has been
introduced, /.reserved/raw/. This virtual path gives superusers direct access to the
underlying encrypted block data in the file system, allowing them to distcp data without
requiring access to encryption keys. This also avoids the overhead of decrypting and
re-encrypting data. The source and destination data will be byte-for-byte identical,
which would not be true if the data were re-encrypted with a new EDEK.
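Note: When using /.reserved/raw/ to distcp encrypted data, preserve extended attributes
with the -px flag. Encrypted file attributes such as the EDEK are exposed as extended
attributes within /.reserved/raw/ and must be preserved for the file to be decryptable.
This means that if the distcp operation is initiated at or above the encryption zone
root, it automatically creates an encryption zone at the destination if one does not
already exist. Recommendation: To avoid potential mishaps, first create identical
encryption zones on the destination cluster.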
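A raw copy between clusters might look like the following sketch. The NameNode addresses and the zone path are hypothetical placeholders, and the small run helper is only an illustration that prints the command by default (set DRY_RUN=0 to actually execute it). The -px option is DistCp's flag for preserving extended attributes, which is how the EDEK travels with each file.

```shell
#!/bin/sh
# Hypothetical source and destination; replace with your own NameNodes and zone path.
SRC="hdfs://source-nn.example.com:8020/.reserved/raw/zone_encr"
DST="hdfs://dest-nn.example.com:8020/.reserved/raw/zone_encr"

# Dry-run by default: print the command, and only execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Run as an HDFS superuser. The /.reserved/raw/ prefix copies the encrypted
# bytes as-is, and -px preserves the extended attributes that hold the EDEK.
run hadoop distcp -px "$SRC" "$DST"
```

Because the data is copied in encrypted form, the destination cluster can only decrypt it if its key management service has access to the same encryption zone key.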
Copying between encrypted and unencrypted locations
By default, distcp compares file system checksums to verify that data was successfully
copied to the destination.
When copying between an unencrypted and encrypted location, file system checksums will
not match because the underlying block data is different. In this case, specify the
-skipcrccheck and -update flags to avoid verifying checksums.
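As a minimal sketch of such a copy, assuming a hypothetical unencrypted source directory and a hypothetical target inside an existing encryption zone, the command could be assembled as follows. The run helper is illustrative only and prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical paths: an unencrypted directory and a target inside an encryption zone.
SRC="/user/alice/plain_data"
DST="/zone_encr/plain_data"

# Dry-run by default: print the command, and only execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# -skipcrccheck is only accepted together with -update; the pair disables the
# checksum comparison, which would otherwise fail because the encrypted
# block data differs from the plaintext source.
run hadoop distcp -update -skipcrccheck "$SRC" "$DST"
```

Here the copy goes through the normal (non-raw) paths, so the user running it needs permission to use the zone key: data is encrypted on write into the zone, or decrypted on read when copying out of one.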