Copy Files from/to an Encryption Zone
To copy existing files into an encryption zone, use a tool like distcp.
Note: For separation of administrative roles, do not use the hdfs user to create
encryption zones. Instead, designate another administrative account for creating
encryption keys and zones. See Creating an HDFS Admin User for more information.
The files will be encrypted using a file-level key generated by the Ranger Key Management Service.
DistCp Considerations
DistCp is commonly used to replicate data between clusters for backup and
disaster recovery purposes. This operation is typically performed by the cluster
administrator, via an HDFS superuser account.
To retain this workflow when using HDFS encryption, a new virtual path prefix has
been introduced, /.reserved/raw/. This virtual path gives superusers direct access
to the underlying encrypted block data in the file system, allowing superusers to
distcp data without requiring access to encryption keys.
This also avoids the overhead of decrypting and re-encrypting data. The source and
destination data will be byte-for-byte identical, which would not be true if the
data were re-encrypted with a new EDEK.
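Warning: When using /.reserved/raw/ to distcp encrypted data, preserve extended
attributes with the -px flag. Encryption attributes such as the EDEK are exposed
as extended attributes under /.reserved/raw/ and must be preserved for the file to
be decrypted. This means that if the distcp operation is initiated at or above the
encryption zone root, it automatically creates an encryption zone at the
destination if one does not already exist.
Recommendation: To avoid potential mishaps, first create identical encryption
zones on the destination cluster.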
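As a rough sketch of the raw-path workflow described above, the following builds a distcp command that copies an encryption zone between two clusters through /.reserved/raw/ and preserves extended attributes with -px. The NameNode addresses and the /zones/secure path are hypothetical placeholders, and the command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical NameNode URIs and zone path; replace with your own.
SRC_NN="hdfs://nn-src.example.com:8020"
DST_NN="hdfs://nn-dst.example.com:8020"
ZONE="/zones/secure"

# -px preserves extended attributes, which carry each file's EDEK.
# Both paths use the /.reserved/raw prefix so the encrypted bytes are copied as-is.
CMD="hadoop distcp -px ${SRC_NN}/.reserved/raw${ZONE} ${DST_NN}/.reserved/raw${ZONE}"

# Print the command for review; run the printed command as an HDFS superuser.
echo "$CMD"
```

This sketch assumes an encryption zone with the same key already exists at the destination path.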
Copying between encrypted and unencrypted locations
By default, distcp compares file system checksums to verify that data was
successfully copied to the destination.
When copying between an unencrypted and encrypted location, file system checksums
will not match because the underlying block data is different. In this case, specify
the -skipcrccheck and -update flags to avoid verifying checksums.
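A minimal sketch of such a copy, assuming a hypothetical unencrypted source directory and a hypothetical target directory inside an encryption zone. The command is printed rather than executed so that the paths can be checked first.

```shell
# Hypothetical paths: an unencrypted source and a directory inside an encryption zone.
SRC="/data/plain/logs"
DST="/zones/secure/logs"

# -update copies files that are missing at the target or differ from it;
# -skipcrccheck skips the checksum comparison between source and target.
CMD="hadoop distcp -update -skipcrccheck ${SRC} ${DST}"

# Print the command for review before running it.
echo "$CMD"
```

Because this copy goes through the regular (non-raw) path, the account running it needs access to the encryption zone's key so that the data can be encrypted on write.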