Managing Encryption Keys and Zones
Interacting with the KMS and creating encryption zones requires two CLI commands: hadoop key and hdfs crypto. The following sections help you get started with creating encryption keys and setting up encryption zones.
Before continuing, make sure that your KMS ACLs have been set up according to best practices. For more information, see Configuring KMS Access Control Lists (ACLs).
Validating Hadoop Key Operations
Use hadoop key create to create a test key, and then use hadoop key list to retrieve the key list:
sudo -u <key_admin> hadoop key create keytrustee_test
hadoop key list
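The two commands above can be combined into a quick self-check. This is a minimal sketch, not a supported tool: validate_key_ops is a hypothetical helper name, the key administrator account is passed as the first argument, and the check assumes that hadoop key list prints each key name on its own line after a header line.

```shell
# validate_key_ops is a hypothetical helper, not part of Hadoop.
# Usage: validate_key_ops <key_admin> [test_key_name]
# Assumes "hadoop key list" prints one key name per line after a header.
validate_key_ops() {
  key_admin="$1"
  test_key="${2:-keytrustee_test}"

  # Create the test key as the key administrator.
  sudo -u "$key_admin" hadoop key create "$test_key" || return 1

  # Confirm that the new key appears, as a whole line, in the key list.
  if sudo -u "$key_admin" hadoop key list | grep -qx "$test_key"; then
    echo "OK: $test_key is listed"
  else
    echo "FAIL: $test_key was not listed" >&2
    return 1
  fi
}
```

For example, validate_key_ops keyadmin creates keytrustee_test and reports whether it was found.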
Creating Encryption Zones
Once a KMS has been set up and the NameNode and HDFS clients have been correctly configured, use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
- Create an encryption key for your zone as the key administrator (keyadmin) for the user/group, regardless of the application that will be using the encryption zone:
sudo -u <key_admin> hadoop key create <key_name>
- Create a new empty directory and make it an encryption zone using the key created above:
sudo -u hdfs hadoop fs -mkdir /encryption_zone
sudo -u hdfs hdfs crypto -createZone -keyName <key_name> -path /encryption_zone
You can verify creation of the new encryption zone by running the -listZones command. You should see the encryption zone along with its key listed as follows:
$ sudo -u hdfs hdfs crypto -listZones
/encryption_zone    <key_name>
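The steps above can be chained in a script. This is a sketch under stated assumptions, not a supported tool: create_encryption_zone is a hypothetical helper, the key is created as the key administrator passed in the first argument, and the final check assumes that each line of -listZones output has the form "<zone path> <key name>" separated by whitespace.

```shell
# create_encryption_zone is a hypothetical helper (not part of Hadoop).
# Usage: create_encryption_zone <key_admin> <key_name> <zone_path>
create_encryption_zone() {
  key_admin="$1"; key_name="$2"; zone="$3"

  # Step 1: create the zone key as the key administrator.
  sudo -u "$key_admin" hadoop key create "$key_name" || return 1

  # Step 2: create an empty directory and turn it into an encryption zone.
  sudo -u hdfs hadoop fs -mkdir "$zone" || return 1
  sudo -u hdfs hdfs crypto -createZone -keyName "$key_name" -path "$zone" || return 1

  # Verify: -listZones should show the zone path together with its key.
  # Assumes each output line is "<zone_path> <key_name>".
  sudo -u hdfs hdfs crypto -listZones |
    awk -v p="$zone" -v k="$key_name" '$1 == p && $2 == k { found = 1 } END { exit !found }'
}
```

The function returns a non-zero status if any command fails or if the zone is not listed with the expected key.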
For more information and recommendations on creating encryption zones for each CDP component, see Configuring CDP Services for HDFS Encryption.
Adding Files to an Encryption Zone
You can add files to an encryption zone by copying them into it with distcp. For example:
sudo -u hdfs hadoop distcp /user/dir /encryption_zone
DistCp Considerations
A common use case for DistCp is to replicate data between clusters for backup and disaster recovery purposes. This is typically performed by the cluster administrator, who is an HDFS superuser. To retain this workflow when using HDFS encryption, a new virtual path prefix, /.reserved/raw/, gives superusers direct access to the underlying block data in the filesystem. This allows superusers to distcp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data. It also means that the source and destination data are byte-for-byte identical, which would not be true if the data were re-encrypted with a new EDEK.
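A superuser copy through the raw namespace can be sketched as follows. The helper names to_raw_path and raw_distcp are hypothetical. Both paths are rewritten because, as we understand the Apache DistCp guide, the raw.* extended attributes that carry the encryption metadata are preserved only when the source and destination are both under /.reserved/raw. The sketch also assumes that the destination is already an encryption zone using the same key, and it accepts either absolute paths or hdfs://<authority>/<path> URIs.

```shell
# Hypothetical helpers (not part of Hadoop).
# to_raw_path: prefix an HDFS path or hdfs:// URI with /.reserved/raw.
# Paths that already contain /.reserved/raw are returned unchanged.
to_raw_path() {
  case "$1" in
    */.reserved/raw*)
      printf '%s\n' "$1" ;;
    hdfs://*/*)
      rest=${1#hdfs://}   # "<authority>/<path>"
      printf 'hdfs://%s/.reserved/raw/%s\n' "${rest%%/*}" "${rest#*/}" ;;
    /*)
      printf '/.reserved/raw%s\n' "$1" ;;
    *)
      return 1 ;;         # relative paths are rejected
  esac
}

# raw_distcp <src> <dst>: copy the encrypted bytes as the hdfs superuser,
# without decrypting or re-encrypting them.
raw_distcp() {
  src=$(to_raw_path "$1") || { echo "unsupported source: $1" >&2; return 2; }
  dst=$(to_raw_path "$2") || { echo "unsupported destination: $2" >&2; return 2; }
  sudo -u hdfs hadoop distcp "$src" "$dst"
}
```

For example, raw_distcp /ez hdfs://backup:8020/ez runs distcp from /.reserved/raw/ez to hdfs://backup:8020/.reserved/raw/ez.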
Copying data from encrypted locations
By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. When copying from an encrypted location, the filesystem checksums do not match because the underlying block data is different. This is true regardless of whether the destination location is encrypted or unencrypted.
In this case, you can specify the -skipcrccheck and -update flags to avoid verifying checksums. When you use -skipcrccheck, distcp checks file integrity by comparing file sizes right after the copy of each file completes.
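The following sketch runs the copy with both flags and then adds an optional whole-directory size comparison. copy_from_ez is a hypothetical helper. It runs as the invoking user, who must be allowed to decrypt files in the source zone, and it assumes that the first column of hdfs dfs -du -s output is the logical size in bytes and that the destination contained nothing else before the copy.

```shell
# copy_from_ez is a hypothetical helper (not part of Hadoop).
# Usage: copy_from_ez <src_path> <dst_path>
copy_from_ez() {
  src="$1"; dst="$2"

  # Checksums would not match because the block data differs, so skip the
  # CRC comparison; -update is required together with -skipcrccheck.
  hadoop distcp -update -skipcrccheck "$src" "$dst" || return 1

  # Optional extra check: compare the total logical sizes.
  # Assumes the first column of "hdfs dfs -du -s" is the size in bytes.
  src_size=$(hdfs dfs -du -s "$src" | awk '{ print $1 }')
  dst_size=$(hdfs dfs -du -s "$dst" | awk '{ print $1 }')
  if [ "$src_size" != "$dst_size" ]; then
    echo "size mismatch: $src=$src_size $dst=$dst_size" >&2
    return 1
  fi
}
```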
Deleting Encryption Zones
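To remove an encryption zone, delete the encrypted directory. This permanently deletes the directory and all of its contents, so make sure the data is no longer needed:
sudo -u hdfs hadoop fs -rm -r -skipTrash /encryption_zone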
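Because -skipTrash makes the deletion unrecoverable, you may want a guard around it. delete_encryption_zone is a hypothetical wrapper. It assumes that -listZones prints "<zone path> <key name>" per line, refuses to delete a path that is not listed as a zone, and asks for the path to be typed again before running the delete command.

```shell
# delete_encryption_zone is a hypothetical safety wrapper (not part of Hadoop).
# Usage: delete_encryption_zone <zone_path>
delete_encryption_zone() {
  zone="$1"

  # Only proceed if the path is listed as an encryption zone.
  # stdin is redirected so the listing cannot consume the confirmation input.
  if ! sudo -u hdfs hdfs crypto -listZones </dev/null |
       awk -v p="$zone" '$1 == p { f = 1 } END { exit !f }'; then
    echo "$zone is not an encryption zone; nothing deleted" >&2
    return 1
  fi

  # Require the zone path to be typed again before the irreversible delete.
  printf 'Type the zone path again to permanently delete it: ' >&2
  read -r answer
  [ "$answer" = "$zone" ] || { echo "aborted" >&2; return 1; }

  sudo -u hdfs hadoop fs -rm -r -skipTrash "$zone"
}
```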
Backing Up Encryption Keys
If you are using the Java KeyStore KMS, make sure you regularly back up the Java KeyStore
that stores the encryption keys. If you are using the Key Trustee KMS and Key Trustee
Server, see Backing up Key Trustee Server and Clients
for instructions on backing up
Key Trustee Server and Key Trustee KMS.
Rolling Encryption Keys
Before attempting to roll an encryption key (also known as an encryption zone key, or EZ key), familiarize yourself with the concepts described in HDFS Transparent Encryption, as the material in these sections presumes you are familiar with the fundamentals of HDFS transparent encryption and Cloudera data at rest encryption.
When you roll an EZ key, you are essentially creating a new version of the key (ezKeyVersionName). Rolling EZ keys regularly helps enterprises minimize the risk of key exposure. If a malicious attacker were to obtain the EZ key and decrypt encrypted data encryption keys (EDEKs) into DEKs, they could gain the ability to decrypt HDFS files. Rolling an EZ key ensures that the DEKs for all newly created files are encrypted with the new version of the EZ key. The older EZ key version that the attacker obtained cannot decrypt these EDEKs. You may want to roll the encryption key periodically, as part of your security policy, or when an external security compromise is detected.
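Step 1 below lists all zones with their keys. To narrow that output to the zones that use the key you are about to roll, a small filter is enough. zones_for_key is a hypothetical helper, and it assumes that each -listZones output line has the form "<zone path> <key name>".

```shell
# zones_for_key is a hypothetical helper (not part of Hadoop).
# Usage: zones_for_key <key_name>
# Prints the path of every encryption zone whose zone key is <key_name>.
zones_for_key() {
  key="$1"
  sudo -u hdfs hdfs crypto -listZones | awk -v k="$key" '$2 == k { print $1 }'
}
```

With the sample output shown in step 1, zones_for_key key1 would print /ez and /user.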
- (Optional) Before rolling any keys, log in as HDFS Superuser and verify/identify the encryption zones to which the current key applies. This operation also helps clarify the relationship between the EZ key and encryption zones, and, if necessary, makes it easier to identify more important, high priority zones:
$ hdfs crypto -listZones
/ez     key1
/ez2    key2
/user   key1
The first column identifies the encryption zone paths; the second column identifies the encryption key name.
- (Optional) You can verify that the files inside an encryption zone are encrypted using the hdfs crypto -getFileEncryptionInfo command. Note the EZ key version name and value, which you can use for comparison and verification after rolling the EZ key:
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 373c0c2e919c27e58c1c343f54233cbd, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: 7mbvopZ0Weuvs0XtTkpGw3G92KuWc4e4xcTXl0bXCpF}
Log off as HDFS Superuser.
- Log in as Key Administrator. Because keys can be rolled, a key can have multiple key versions, where each key version has its own key material (the actual secret bytes used during DEK encryption and EDEK decryption). You can fetch an encryption key either by its key name, which returns the latest version of the key, or by a specific key version. Roll the encryption key previously identified/confirmed by the HDFS Superuser in step 1. Here, the <key name> is key1:
hadoop key roll key1
Rolling key version from KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 for keyName: key1
key1 has been successfully rolled.
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 has been updated.
This operation contacts the KMS and rolls the keys there. Note that this can take a considerable amount of time, depending on the number of key versions residing in the KMS.
- (Optional) Log out as Key Administrator, and log in as HDFS Superuser. Verify that new files in the encryption zone have a new EZ key version:
$ hdfs crypto -getFileEncryptionInfo -path /ez/new_file
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 9aa13ea4a700f96287cfe1349f6ff4f2, iv: 465c878ad9325e42fa460d2a22d12a72, keyName: key1, ezKeyVersionName: 4tuvorJ6Feeqk8WiCfdDs9K32KuEj7g2ydCAv0gNQbY}
Alternatively, you can use the KMS REST API to view key metadata and key versions. Replace the elements in angle brackets with your actual values. For example, before rolling a key, you can view the key metadata and current version as follows:
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_metadata"
{
  "name" : "<key-name>",
  "cipher" : "<cipher>",
  "length" : <length>,
  "description" : "<description>",
  "created" : <millis-epoch>,
  "versions" : <versions>        (for example, 1)
}
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_currentversion"
{
  "material" : "<material>",
  "name" : "<key-name>",
  "versionName" : "<versionName>"        (for example, version 1)
}
After you roll the key, the current version returns new key material and a new version name, and the number of versions in the metadata increases:
$ hadoop key roll key1
Rolling key version from KeyProvider: KMSClientProvider[https://<KMS_HOSTNAME>:16000/kms/v1/] for key name: <key-name>
key1 has been successfully rolled.
KMSClientProvider[https://<KMS_HOSTNAME>/kms/v1/] has been updated.
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_currentversion"
{
  "material" : "<material>",        (new material)
  "name" : "<key-name>",
  "versionName" : "<versionName>"        (new version name, for example, version 2)
}
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_metadata"
{
  "name" : "<key-name>",
  "cipher" : "<cipher>",
  "length" : <length>,
  "description" : "<description>",
  "created" : <millis-epoch>,
  "versions" : <versions>        (for example, 2)
}
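Steps 2 to 4 above can be scripted as a before/after comparison. This is a sketch under assumptions: ez_key_version and verify_roll are hypothetical helpers, the parser assumes the ezKeyVersionName field appears as in the sample -getFileEncryptionInfo output, and an empty file created with hdfs dfs -touchz serves as the new file.

```shell
# Hypothetical helpers (not part of Hadoop).
# ez_key_version: read "hdfs crypto -getFileEncryptionInfo" output on stdin
# and print the value of its ezKeyVersionName field.
ez_key_version() {
  sed -n 's/.*ezKeyVersionName: *\([^ },]*\).*/\1/p'
}

# verify_roll <key_admin> <key_name> <existing_file> <new_file>
# Records the EZ key version of an existing file, rolls the key as the key
# administrator, creates an empty new file, and succeeds only if the new
# file reports a different EZ key version.
verify_roll() {
  key_admin="$1"; key="$2"; old_file="$3"; new_file="$4"

  before=$(sudo -u hdfs hdfs crypto -getFileEncryptionInfo -path "$old_file" | ez_key_version)
  [ -n "$before" ] || { echo "could not read the version of $old_file" >&2; return 1; }

  sudo -u "$key_admin" hadoop key roll "$key" || return 1

  sudo -u hdfs hdfs dfs -touchz "$new_file" || return 1
  after=$(sudo -u hdfs hdfs crypto -getFileEncryptionInfo -path "$new_file" | ez_key_version)

  echo "existing file: $before"
  echo "new file:      $after"
  [ -n "$after" ] && [ "$after" != "$before" ]
}
```

If the NameNode still holds cached EDEKs that were generated from the previous key version, a freshly created file may briefly report the old version, so treat a mismatch as a reason to retry or investigate rather than as proof that the roll failed.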
Re-encrypting Encrypted Data Encryption Keys (EDEKs)
Before attempting to re-encrypt an EDEK, familiarize yourself with the concepts described in HDFS Transparent Encryption, as the material in this section presumes you are familiar with the fundamentals of HDFS transparent encryption and Cloudera data at rest encryption.
When you re-encrypt an EDEK, you are essentially decrypting the original EDEK (the DEK encrypted with an older version of the EZ key) to recover the DEK, and then re-encrypting that DEK using the new (rolled) version of the EZ key (see Rolling Encryption Keys). The file's metadata, which is stored in the NameNode, is then updated with this new EDEK. Re-encryption does not affect the data in the HDFS files or the DEK. The same DEK is still used to decrypt the file, so re-encryption is essentially transparent.
Benefits and Capabilities
In addition to minimizing security risks, re-encrypting the EDEK offers the following capabilities and benefits:
- Re-encrypting EDEKs does not require that the user explicitly re-encrypt HDFS files.
- In cases where there are several zones using the same key, the Key Administrator has the option of selecting which zone’s EDEKs are re-encrypted first.
- The HDFS Superuser can also monitor and cancel re-encryption operations.
- Re-encryption is restarted automatically in cases where you have a NameNode failure during the re-encryption operation.
Prerequisites and Assumptions
- It is recommended that you perform EDEK re-encryption at the same time that you perform regular cluster maintenance because the operation can adversely impact CPU resources on the NameNode.
- In Cloudera Manager, review the cluster’s NameNode status, which must be in “Good Health”. If the cluster NameNode does not have a status of “Good Health”, then do not proceed with the re-encryption of the EDEK. In the Cloudera Manager WebUI menu, you can verify the status for the cluster NameNode, which must not be in Safe mode (in other words, the WebUI should indicate “Safemode is off”).
Running the re-encryption command without successfully verifying the preceding items results in failures.
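The safe mode part of these prerequisites can also be checked from the command line before you submit a re-encryption. check_namenode_ready is a hypothetical helper. It does not check the Cloudera Manager health status, and it assumes that hdfs dfsadmin -safemode get prints a "Safe mode is ON" or "Safe mode is OFF" line for each NameNode.

```shell
# check_namenode_ready is a hypothetical pre-flight check (not part of Hadoop).
# Succeeds only if every reported NameNode says "Safe mode is OFF".
check_namenode_ready() {
  out=$(sudo -u hdfs hdfs dfsadmin -safemode get) || return 1
  printf '%s\n' "$out"

  # Any NameNode still in safe mode blocks the re-encryption.
  if printf '%s\n' "$out" | grep -q 'Safe mode is ON'; then
    echo "a NameNode is in safe mode; do not start re-encryption" >&2
    return 1
  fi
  printf '%s\n' "$out" | grep -q 'Safe mode is OFF'
}
```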
Limitations
This section identifies limitations associated with the re-encryption of EDEKs.
EDEK re-encryption doesn't change EDEKs on snapshots, due to the immutable nature of HDFS snapshots. Therefore, after an EZ key is exposed, the Key Administrator must delete the snapshots.
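Before deleting anything, it can help to review which snapshots exist for an exposed zone. The following read-only sketch only prints hdfs dfs -deleteSnapshot commands for review. list_snapshot_deletions is a hypothetical helper, and it assumes that the zone directory is snapshottable and that hdfs dfs -ls <zone>/.snapshot prints one snapshot per line with its path in the last field.

```shell
# list_snapshot_deletions is a hypothetical, read-only helper.
# Usage: list_snapshot_deletions <zone_path>
# Prints (does not run) one deleteSnapshot command per snapshot of the zone.
list_snapshot_deletions() {
  zone="$1"
  sudo -u hdfs hdfs dfs -ls "$zone/.snapshot" |
    awk -v z="$zone" '$NF ~ "/\\.snapshot/" {
      # The snapshot name is the last component of the listed path;
      # the "Found N items" header does not match and is skipped.
      n = split($NF, parts, "/")
      printf "hdfs dfs -deleteSnapshot %s %s\n", z, parts[n]
    }'
}
```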
Re-encrypting an EDEK
This scenario operates on the assumption that an encryption zone has already been set up for this cluster.
- Navigate to the cluster in which you will be rolling keys and re-encrypting the EDEK.
- Log in as HDFS Superuser.
- (Optional) To view all of the options for the hdfs crypto command:
$ hdfs crypto
    [-createZone -keyName <keyName> -path <path>]
    [-listZones]
    [-provisionTrash -path <path>]
    [-getFileEncryptionInfo -path <path>]
    [-reencryptZone <action> -path <zone>]
    [-listReencryptionStatus]
    [-help <command-name>]
- Before rolling any keys, verify/identify the encryption zones to which the current key applies. This operation also helps clarify the relationship between the EZ key and encryption zones, and, if necessary, makes it easier to identify more important, high priority zones:
$ hdfs crypto -listZones
/ez    key1
The first column identifies the encryption zone path (/ez); the second column identifies the encryption key name (key1).
- Exit from the HDFS Superuser account and log in as Key Administrator.
- Roll the encryption key (previously identified/confirmed by the HDFS Superuser in step 4). Here, the <key name> is key1:
hadoop key roll key1
Rolling key version from KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 for keyName: key1
key1 has been successfully rolled.
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 has been updated.
This operation contacts the KMS and rolls the keys. Note that this can take a considerable amount of time, depending on the number of key versions.
- Log out as Key Administrator, and log in as HDFS Superuser.
- (Optional) Before performing the re-encryption, you can check the key name (keyName) and EZ key version (ezKeyVersionName) currently used by a file. Then, after re-encrypting, you can confirm that the EZ key version and EDEK have changed:
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 9aa13ea4a700f96287cfe1349f6ff4f2, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: 7mbvopZ0Weuvs0XtTkpGw3G92KuWc4e4xcTXl0bXCpF}
- After the EZ key has been rolled successfully, re-encrypt the EDEKs by running the re-encryption command on the encryption zone:
hdfs crypto -reencryptZone -start -path /ez
The following information appears when the submission is complete. At this point, the NameNode is processing and re-encrypting all of the EDEKs under the /ez directory.
re-encrypt command successfully submitted for zone: /ez action: START:
Depending on the number of files, the re-encryption operation can take a long time. Re-encrypting 1 million EDEKs typically takes between 2 and 6 minutes, depending on the NameNode hardware. To check the status of the re-encryption for the zone:
hdfs crypto -listReencryptionStatus
Table 1. Re-encryption Status Column Data
ZoneName
    Description: The encryption zone name.
    Sample data: /ez
Status
    Description: One of the following values:
        Submitted: the command is received, but not yet being processed by the NameNode.
        Processing: the zone is being processed by the NameNode.
        Completed: the NameNode has finished processing the zone, and every file in the zone has been re-encrypted.
    Sample data: Completed
EZKey Version Name
    Description: The encryption zone key version name, which is used for re-encryption comparison. After re-encryption is complete, all files in the encryption zone are guaranteed to have an EDEK whose encryption zone key version is at least equal to this version.
    Sample data: ZMHfRoGKeXXgf0QzCX8q16NczIw2sq0rWRTOHS3YjCz
Submission Time
    Description: The time at which the re-encryption operation commenced.
    Sample data: 2017-09-07 10:01:09,262-0700
Is Canceled?
    Description: True: the re-encryption operation has been canceled. False: the re-encryption operation has not been canceled.
    Sample data: False
Completion Time
    Description: The time at which the re-encryption operation completed.
    Sample data: 2017-09-07 10:01:10,441-0700
Number of files re-encrypted
    Description: Reflects only the files whose EDEKs have been updated. If a file is created after the key is rolled, it already has an EDEK encrypted with the new key version, so the re-encryption operation skips that file. In other words, a "Completed" re-encryption can report fewer re-encrypted files than the number of files actually in the encryption zone. Note: If you re-encrypt a zone whose EDEKs have already been re-encrypted and there are no new files, the number of files re-encrypted is 0.
    Sample data: 1
Number of failures
    Description: When 0, no errors occurred during the re-encryption operation. If larger than 0, investigate the NameNode log, and re-encrypt.
    Sample data: 0
Last file Checkpointed
    Description: Identifies the current position of the re-encryption process in the encryption zone, in other words, the file that was most recently re-encrypted.
    Sample data: 0
- (Optional) After the re-encryption completes, you can confirm that the EDEK and EZ Key Version Name values have changed:
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 373c0c2e919c27e58c1c343f54233cbd, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: ZMHfRoGKeXXgf0QzCX8q16NczIw2sq0rWRTOHS3YjCz}
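The key roll, submission, and status check in the steps above can be combined into one script that polls until the zone reports Completed. reencrypt_zone is a hypothetical helper. The polling interval and limit are illustrative defaults, and the status check assumes that the zone's row in -listReencryptionStatus output begins with the zone path and contains the Status value described in Table 1.

```shell
# reencrypt_zone is a hypothetical helper (not part of Hadoop).
# Usage: reencrypt_zone <key_admin> <key_name> <zone_path> [poll_seconds] [max_polls]
reencrypt_zone() {
  key_admin="$1"; key="$2"; zone="$3"
  poll="${4:-30}"; max="${5:-120}"

  # Roll the EZ key as the key administrator.
  sudo -u "$key_admin" hadoop key roll "$key" || return 1

  # Submit re-encryption of all EDEKs in the zone as the HDFS superuser.
  sudo -u hdfs hdfs crypto -reencryptZone -start -path "$zone" || return 1

  # Poll the status until the zone's row reports Completed.
  i=0
  while [ "$i" -lt "$max" ]; do
    row=$(sudo -u hdfs hdfs crypto -listReencryptionStatus | awk -v z="$zone" '$1 == z')
    case "$row" in
      *Completed*) echo "$row"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$poll"
  done
  echo "re-encryption of $zone not completed after $max polls" >&2
  return 1
}
```

A Completed status alone does not mean that every EDEK was updated; also check the Number of failures column as described in Table 1.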
Managing Re-encryption Operations
This section includes information that can help you manage various facets of the EDEK re-encryption process.
Cancelling Re-encryption
Only users with the HDFS Superuser privilege can cancel the EDEK re-encryption after the operation has started.
To cancel a re-encryption:
hdfs crypto -reencryptZone -cancel -path <zone>
Rolling Keys During a Re-encryption Operation
While it is not recommended, it is possible to roll the encryption zone key version on the KMS while a re-encryption of that encryption zone is already in progress in the NameNode. The re-encryption is guaranteed to complete with all EDEKs re-encrypted with a key version equal to or later than the encryption zone key version that was current when the re-encryption command was submitted. For example, suppose the key version is first rolled from v0 to v1, and then a re-encryption command is submitted. If the key version is later rolled again on the KMS to v2, all EDEKs are guaranteed to be re-encrypted to at least v1. To ensure that all EDEKs are re-encrypted to v2, submit another re-encryption command for the encryption zone.
Rolling keys during re-encryption is not recommended because of the potential negative impact on key management operations. Due to the asynchronous nature of re-encryption, there is no guarantee of exactly when the rolled encryption keys take effect. Re-encryption can only guarantee that all EDEKs are re-encrypted to at least the EZ key version that existed when the re-encryption command was issued.
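The rule above can be applied mechanically: record the EZ key version that is current when you submit the re-encryption, and once that re-encryption has completed, submit another one if the key has been rolled since. current_key_version and resubmit_if_rolled are hypothetical helpers. They reuse the KMS REST endpoint shown in Rolling Encryption Keys (the base URL, for example https://<KMS_HOSTNAME>:16000, is passed in) and assume that the JSON response contains the "versionName" pair on a single line.

```shell
# Hypothetical helpers (not part of Hadoop).
# current_key_version <kms_base_url> <key_name>
# Prints the versionName from the KMS _currentversion response.
current_key_version() {
  curl -sk --negotiate -u: "$1/kms/v1/key/$2/_currentversion" |
    sed -n 's/.*"versionName" *: *"\([^"]*\)".*/\1/p'
}

# resubmit_if_rolled <kms_base_url> <key_name> <zone> <version_at_submit>
# Run this after the earlier re-encryption of <zone> has completed.
resubmit_if_rolled() {
  now=$(current_key_version "$1" "$2")
  [ -n "$now" ] || { echo "could not read the current key version" >&2; return 1; }
  if [ "$now" != "$4" ]; then
    echo "key rolled from $4 to $now since submission; resubmitting"
    sudo -u hdfs hdfs crypto -reencryptZone -start -path "$3"
  else
    echo "no roll since submission; nothing to do"
  fi
}
```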
Throttling Re-encryption Operations
With the default operation settings, you typically do not need to throttle re-encryption operations. However, in cases of excessive performance impact due to the re-encryption of large numbers of files, advanced users can throttle the operation to minimize the impact on the HDFS NameNode and the Key Trustee KMS by tuning the following properties:
- The number of EDEKs that the NameNode sends to the KMS to re-encrypt in a batch (dfs.namenode.reencrypt.batch.size)
- The number of threads in the NameNode that can run concurrently to contact the KMS (dfs.namenode.reencrypt.edek.threads)
- The fraction of time that the re-encryption handler thread may hold the NameNode read lock (dfs.namenode.reencrypt.throttle.limit.handler.ratio)
- The fraction of time that the re-encryption updater thread may hold the NameNode write lock (dfs.namenode.reencrypt.throttle.limit.updater.ratio)
You can monitor the HDFS NameNode heap and CPU usage from Cloudera Manager.
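Before tuning, you can print the values of the properties listed above as seen by the local configuration. show_reencrypt_settings is a hypothetical, read-only helper built on hdfs getconf -confKey. It reads the configuration of the host where it runs, which may differ from the NameNode's effective settings, so treat the output as a starting point rather than the authoritative running values.

```shell
# show_reencrypt_settings is a hypothetical, read-only helper.
# Prints each re-encryption throttling property and its configured value,
# or "(not set)" if hdfs getconf cannot resolve it.
show_reencrypt_settings() {
  for key in \
    dfs.namenode.reencrypt.batch.size \
    dfs.namenode.reencrypt.edek.threads \
    dfs.namenode.reencrypt.throttle.limit.handler.ratio \
    dfs.namenode.reencrypt.throttle.limit.updater.ratio
  do
    value=$(hdfs getconf -confKey "$key" 2>/dev/null) || value="(not set)"
    printf '%s = %s\n' "$key" "$value"
  done
}
```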