Managing Encryption Keys and Zones
Interacting with the KMS and creating encryption zones require two new CLI commands: hadoop key and hdfs crypto. The following sections help you get started with creating encryption keys and setting up encryption zones.
Before continuing, make sure that your KMS ACLs have been set up according to best practices. For more information, see Configuring KMS Access Control Lists (ACLs).
Validating Hadoop Key Operations
sudo -u <key_admin> hadoop key create keytrustee_test
hadoop key list
Creating Encryption Zones
Once a KMS has been set up and the NameNode and HDFS clients have been correctly configured, use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
- Create an encryption key for your zone as keyadmin for the user/group (regardless of the application that will be using the encryption zone):
sudo -u hdfs hadoop key create <key_name>
- Create a new empty directory and make it an encryption zone using the key created above.
sudo -u hdfs hadoop fs -mkdir /encryption_zone
sudo -u hdfs hdfs crypto -createZone -keyName <key_name> -path /encryption_zone
You can verify creation of the new encryption zone by running the -listZones command. You should see the encryption zone along with its key listed as follows:
$ sudo -u hdfs hdfs crypto -listZones
/encryption_zone <key_name>
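For scripted setups, the same check can be automated. The following is a minimal sketch, not part of the product tooling: zone_has_key is a hypothetical helper name, and it assumes each line of the -listZones output has the form "<zone path> <key name>" separated by whitespace, as in the sample output above.

```shell
# zone_has_key ZONE KEY (hypothetical helper)
# Reads `hdfs crypto -listZones` output on stdin and succeeds only if
# ZONE is listed together with KEY.
# Assumption: each line looks like "<zone path>  <key name>".
zone_has_key() {
  awk -v zone="$1" -v key="$2" '
    $1 == zone && $2 == key { found = 1 }
    END { exit !found }
  '
}

# Example against a live cluster (requires HDFS superuser access):
#   sudo -u hdfs hdfs crypto -listZones | zone_has_key /encryption_zone <key_name> \
#     && echo "encryption zone is ready"
```

The helper only inspects text, so it can be tried against saved command output before it is used on a cluster.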
For more information and recommendations on creating encryption zones for each CDH component, see Configuring CDH Services for HDFS Encryption.
Adding Files to an Encryption Zone
sudo -u hdfs hadoop distcp /user/dir /encryption_zone
DistCp Considerations
A common use case for DistCp is to replicate data between clusters for backup and disaster recovery purposes. This is typically performed by the cluster administrator, who is an HDFS superuser. To retain this workflow when using HDFS encryption, a new virtual path prefix has been introduced, /.reserved/raw/, that gives superusers direct access to the underlying block data in the filesystem. This allows superusers to distcp data without requiring access to encryption keys, and avoids the overhead of decrypting and re-encrypting data. It also means the source and destination data will be byte-for-byte identical, which would not have been true if the data was being re-encrypted with a new EDEK.
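As a rough sketch of such a copy: the helper below (raw_distcp, a hypothetical name) prefixes both paths with /.reserved/raw and adds the DistCp -px option, which preserves extended attributes. HDFS exposes a file's encryption metadata, including the EDEK, through extended attributes, so they have to travel with the raw bytes for the copy to remain decryptable. The example paths are placeholders, and the helper only prints the command unless DRY_RUN=0 is set.

```shell
# raw_distcp SRC DST (hypothetical helper)
# Copies SRC to DST through the /.reserved/raw prefix so that the encrypted
# bytes are copied as-is. Must be run as an HDFS superuser.
# By default the command is only printed; set DRY_RUN=0 to execute it.
raw_distcp() {
  src="/.reserved/raw$1"
  dst="/.reserved/raw$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hadoop distcp -px $src $dst"
  else
    # -px preserves extended attributes, which carry the per-file EDEK.
    hadoop distcp -px "$src" "$dst"
  fi
}

# Placeholder paths; in dry-run mode this prints the command that would run.
raw_distcp /encryption_zone/data /backup/data
```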
Copying data from encrypted locations
By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination. When copying from an encrypted location, the filesystem checksums will not match because the underlying block data is different. This is true whether the destination location is encrypted or unencrypted.
In this case, you can specify the -skipcrccheck and -update flags to avoid verifying checksums. When you use -skipcrccheck, distcp checks the file integrity by performing a file size comparison, right after the copy completes for each file.
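A hedged sketch of such an invocation follows. The paths are placeholders, and distcp_skip_crc_cmd is a hypothetical helper that only prints the command line so that it can be reviewed before it is run.

```shell
# distcp_skip_crc_cmd SRC DST (hypothetical helper)
# Prints a DistCp command line for copying out of an encrypted location.
# -skipcrccheck is used together with -update, as described above.
distcp_skip_crc_cmd() {
  printf 'hadoop distcp -update -skipcrccheck %s %s\n' "$1" "$2"
}

# Placeholder paths; prints the command without running it.
distcp_skip_crc_cmd /encryption_zone/data /user/dir/data
```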
Deleting Encryption Zones
sudo -u hdfs hadoop fs -rm -r -skipTrash /encryption_zone
Backing Up Encryption Keys
If you are using the Java KeyStore KMS, make sure you regularly back up the Java KeyStore that stores the encryption keys. If you are using the Key Trustee KMS and Key Trustee Server, see Backing Up and Restoring Key Trustee Server and Clients for instructions on backing up Key Trustee Server and Key Trustee KMS.
Rolling Encryption Keys
Before attempting to roll an encryption key (also known as an encryption zone key, or EZ key), familiarize yourself with the concepts described in Cloudera Navigator Data Encryption Overview, and HDFS Transparent Encryption, as the material in these sections presumes you are familiar with the fundamentals of HDFS transparent encryption and Cloudera data at rest encryption.
When you roll an EZ key, you are essentially creating a new version of the key (ezKeyVersionName). Rolling EZ keys regularly helps enterprises minimize the risk of key exposure. If a malicious attacker were to obtain the EZ key and decrypt encrypted data encryption keys (EDEKs) into DEKs, they could gain the ability to decrypt HDFS files. Rolling an EZ key ensures that all DEKs for newly-created files will be encrypted with the new version of the EZ key. The older EZ key version that the attacker obtained cannot decrypt these EDEKs. You may want to roll the encryption key periodically, as part of your security policy or when an external security compromise is detected.
- (Optional) Before rolling any keys, log in as HDFS Superuser and verify/identify the encryption zones to which the current key applies. This operation also helps clarify the relationship between the EZ key and encryption zones, and, if necessary, makes it easier to identify more important, high priority zones:
$ hdfs crypto -listZones
/ez key1
/ez2 key2
/user key1
The first column identifies the encryption zone paths; the second column identifies the encryption key name.
- (Optional) You can verify that the files inside an encryption zone are encrypted using the hdfs crypto -getFileEncryptionInfo command. Note the EZ key version name and value, which you can use for comparison and verification after rolling the EZ key.
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 373c0c2e919c27e58c1c343f54233cbd, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: 7mbvopZ0Weuvs0XtTkpGw3G92KuWc4e4xcTXl0bXCpF}
Log off as HDFS Superuser.
- Log in as Key Administrator. Because keys can be rolled, a key can have multiple key versions, where each key version has its own key material (the actual secret bytes used during DEK encryption and EDEK decryption). You can fetch an encryption key either by its key name, which returns the latest version of the key, or by a specific key version.
Roll the encryption key (previously identified/confirmed by the HDFS Superuser in step 1). Here, the <key name> is key1:
hadoop key roll key1
This operation contacts the KMS and rolls the keys there. Note that this can take a considerable amount of time, depending on the number of key versions residing in the KMS.
Rolling key version from KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 for keyName: key1
key1 has been successfully rolled.
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 has been updated.
- (Optional) Log out as Key Administrator, and log in as HDFS Superuser. Verify that new files in the encryption zone have a new EZ key version.
$ hdfs crypto -getFileEncryptionInfo -path /ez/new_file
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 9aa13ea4a700f96287cfe1349f6ff4f2, iv: 465c878ad9325e42fa460d2a22d12a72, keyName: key1, ezKeyVersionName: 4tuvorJ6Feeqk8WiCfdDs9K32KuEj7g2ydCAv0gNQbY}
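To compare key versions without reading the full output by eye, the relevant field can be extracted with a small filter. This is a sketch under the assumption that the output keeps the "field: value" layout shown in the sample output above; enc_field is a hypothetical helper name.

```shell
# enc_field NAME (hypothetical helper)
# Reads `hdfs crypto -getFileEncryptionInfo` output on stdin and prints the
# value of one field, for example edek or ezKeyVersionName.
enc_field() {
  sed -n "s/.*[ {]$1: *\([^},[:space:]]*\).*/\1/p"
}

# Example against a live cluster: compare an older file with a newly created one.
#   old=$(hdfs crypto -getFileEncryptionInfo -path /ez/f | enc_field ezKeyVersionName)
#   new=$(hdfs crypto -getFileEncryptionInfo -path /ez/new_file | enc_field ezKeyVersionName)
#   [ "$old" != "$new" ] && echo "new files use the rolled key version"
```

The same filter works for the edek field, which is useful later when confirming that a re-encryption changed a file's EDEK.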
Alternatively, you can use the KMS REST API to view key metadata and key versions. Elements appearing in angle brackets should be replaced with your actual values. Before rolling a key, you can view the key metadata and versions as follows:
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_metadata"
{
  "name" : "<key-name>",
  "cipher" : "<cipher>",
  "length" : <length>,
  "description" : "<description>",
  "created" : <millis-epoch>,
  "versions" : <versions> (For example, 1)
}
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_currentversion"
{
  "material" : "<material>",
  "name" : "<key-name>",
  "versionName" : "<versionName>" (For example, version 1)
}
Then roll the key and view the current version and metadata again:
$ hadoop key roll key1
Rolling key version from KeyProvider: KMSClientProvider[https://<KMS_HOSTNAME>:16000/kms/v1/] for key name: <key-name>
key1 has been successfully rolled.
KMSClientProvider[https://<KMS_HOSTNAME>/kms/v1/] has been updated.
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_currentversion"
{
  "material" : "<material>", (New material)
  "name" : "<key-name>",
  "versionName" : "<versionName>" (New version name. For example, version 2)
}
$ curl -k --negotiate -u: "https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_metadata"
{
  "name" : "<key-name>",
  "cipher" : "<cipher>",
  "length" : <length>,
  "description" : "<description>",
  "created" : <millis-epoch>,
  "versions" : <versions> (For example, version 2)
}
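When scripting this comparison, the versionName can be pulled out of the _currentversion response and compared before and after the roll. The filter below is a minimal sketch that assumes the JSON field layout shown in the responses above and avoids a JSON parser for brevity; kms_version_name is a hypothetical helper name, and the actual version strings depend on your key provider.

```shell
# kms_version_name (hypothetical helper)
# Reads a KMS _currentversion JSON response on stdin and prints the value of
# its "versionName" field.
kms_version_name() {
  sed -n 's/.*"versionName"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example against a live KMS (replace the bracketed placeholders):
#   url="https://<KMS_HOSTNAME>:16000/kms/v1/key/<key-name>/_currentversion"
#   before=$(curl -k --negotiate -u: "$url" | kms_version_name)
#   hadoop key roll <key-name>
#   after=$(curl -k --negotiate -u: "$url" | kms_version_name)
#   [ "$before" != "$after" ] && echo "key was rolled"
```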
Re-encrypting Encrypted Data Encryption Keys (EDEKs)
Before attempting to re-encrypt an EDEK, familiarize yourself with the concepts described in Cloudera Navigator Data Encryption Overview, and HDFS Transparent Encryption, as the material in this section presumes you are familiar with the fundamentals of HDFS transparent encryption and Cloudera data at rest encryption.
When you re-encrypt an EDEK, you are essentially decrypting the original EDEK (the DEK encrypted with the older EZ key version) and then re-encrypting the resulting DEK using the new (rolled) version of the EZ key (see Rolling Encryption Keys). The file's metadata, which is stored in the NameNode, is then updated with this new EDEK. Re-encryption does not impact the data in the HDFS files or the DEK. The same DEK is still used to decrypt the file, so re-encryption is essentially transparent.
Benefits and Capabilities
In addition to minimizing security risks, re-encrypting the EDEK offers the following capabilities and benefits:
- Re-encrypting EDEKs does not require that the user explicitly re-encrypt HDFS files.
- In cases where there are several zones using the same key, the Key Administrator has the option of selecting which zone’s EDEKs are re-encrypted first.
- The HDFS Superuser can also monitor and cancel re-encryption operations.
- Re-encryption is restarted automatically in cases where you have a NameNode failure during the re-encryption operation.
Prerequisites and Assumptions
Before attempting to re-encrypt an EDEK, familiarize yourself with the concepts and rules described in Managing Encryption Keys and Zones.
- It is recommended that you perform EDEK re-encryption at the same time that you perform regular cluster maintenance because the operation can adversely impact CPU resources on the NameNode.
- In Cloudera Manager, review the cluster’s NameNode status, which must be in “Good Health”. If the cluster NameNode does not have a status of “Good Health”, then do not proceed with the re-encryption of the EDEK. In the Cloudera Manager WebUI menu, you can verify the status for the cluster NameNode, which must not be in Safe mode (in other words, the WebUI should indicate “Safemode is off”).
Running the re-encryption command without successfully verifying the preceding items will result in failures with errors.
Limitations
This section identifies limitations associated with the re-encryption of EDEKs.
EDEK re-encryption does not change EDEKs in snapshots, because HDFS snapshots are immutable. After an EZ key exposure, the Key Administrator must therefore delete the affected snapshots.
Re-encrypting an EDEK
This scenario operates on the assumption that an encryption zone has already been set up for this cluster. For more details about creating an encryption zone, see Creating Encryption Zones.
- Navigate to the cluster in which you will be rolling keys and re-encrypting the EDEK.
- Log in as HDFS Superuser.
- (Optional) To view all of the options for the hdfs crypto command:
$ hdfs crypto
  [-createZone -keyName <keyName> -path <path>]
  [-listZones]
  [-provisionTrash -path <path>]
  [-getFileEncryptionInfo -path <path>]
  [-reencryptZone <action> -path <zone>]
  [-listReencryptionStatus]
  [-help <command-name>]
- Before rolling any keys, verify/identify the encryption zones to which the current key applies. This operation also helps clarify the relationship between the EZ key and encryption zones, and, if necessary, makes it easier to identify more important, high priority zones:
$ hdfs crypto -listZones
/ez key1
The first column identifies the encryption zone path (/ez); the second column identifies the encryption key name (key1).
- Exit from the HDFS Superuser account and log in as Key Administrator.
- Roll the encryption key (previously identified/confirmed by the HDFS Superuser in step 4). Here, the <key name> is key1:
hadoop key roll key1
This operation contacts the KMS and rolls the keys. Note that this can take a considerable amount of time, depending on the number of key versions.
Rolling key version from KeyProvider: org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 for keyName: key1
key1 has been successfully rolled.
org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider@5ea434c8 has been updated.
- Log out as Key Administrator, and log in as HDFS Superuser.
- (Optional) Before performing the re-encryption, you can verify the status of the current key version being used (keyName). Then, after re-encrypting, you can confirm that the EZ key version (ezKeyVersionName) and EDEK have changed:
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 9aa13ea4a700f96287cfe1349f6ff4f2, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: 7mbvopZ0Weuvs0XtTkpGw3G92KuWc4e4xcTXl0bXCpF}
- After the EZ key has been rolled successfully, re-encrypt the EDEK by running the re-encryption command on the encryption zone:
hdfs crypto -reencryptZone -start -path /ez
The following information appears when the submission is complete. At this point, the NameNode is processing and re-encrypting all of the EDEKs under the /ez directory.
re-encrypt command successfully submitted for zone: /ez action: START:
Depending on the number of files, the re-encryption operation can take a long time. Re-encrypting 1 million EDEKs typically takes between 2 and 6 minutes, depending on the NameNode hardware. To check the status of the re-encryption for the zone:
hdfs crypto -listReencryptionStatus
Re-encryption Status Column Data
- ZoneName: The encryption zone name. Sample data: /ez
- Status: The state of the re-encryption operation. Sample data: Completed
  - Submitted: the command is received, but not yet being processed by the NameNode.
  - Processing: the zone is being processed by the NameNode.
  - Completed: the NameNode has finished processing the zone, and every file in the zone has been re-encrypted.
- EZKey Version Name: The encryption zone key version name, which is used for re-encryption comparison. After re-encryption is complete, all files in the encryption zone are guaranteed to have an EDEK whose encryption zone key version is at least equal to this version. Sample data: ZMHfRoGKeXXgf0QzCX8q16NczIw2sq0rWRTOHS3YjCz
- Submission Time: The time at which the re-encryption operation commenced. Sample data: 2017-09-07 10:01:09,262-0700
- Is Canceled?: True if the re-encryption operation has been canceled; False if it has not been canceled. Sample data: False
- Completion Time: The time at which the re-encryption operation completed. Sample data: 2017-09-07 10:01:10,441-0700
- Number of files re-encrypted: The number that appears in this column reflects only the files whose EDEKs have been updated. If a file is created after the key is rolled, then it will already have an EDEK that has been encrypted by the new key version, so the re-encryption operation will skip that file. In other words, it is possible for a "Completed" re-encryption to reflect a number of re-encrypted files that is less than the number of files actually in the encryption zone. Note: When you re-encrypt an encryption zone that has already been re-encrypted and there are no new files, the number of files re-encrypted will be 0. Sample data: 1
- Number of failures: When 0, no errors occurred during the re-encryption operation. If larger than 0, investigate the NameNode log, and re-encrypt. Sample data: 0
- Last file Checkpointed: Identifies the current position of the re-encryption process in the encryption zone; in other words, the file that was most recently re-encrypted. Sample data: 0

- (Optional) After the re-encryption completes, you can confirm that the EDEK and EZ Key Version Name values have changed:
$ hdfs crypto -getFileEncryptionInfo -path /ez/f
{cipherSuite: {name: AES/CTR/NoPadding, algorithmBlockSize: 16}, cryptoProtocolVersion: CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}, edek: 373c0c2e919c27e58c1c343f54233cbd, iv: d129c913c8a34cde6371ec95edfb7337, keyName: key1, ezKeyVersionName: ZMHfRoGKeXXgf0QzCX8q16NczIw2sq0rWRTOHS3YjCz}
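For unattended runs, the status check can be wrapped in a small polling loop. The sketch below assumes that each data row of the -listReencryptionStatus output starts with the zone name followed by the Status value, in the column order described above; verify this against the output on your cluster before relying on it. reencrypt_status is a hypothetical helper name.

```shell
# reencrypt_status ZONE (hypothetical helper)
# Reads `hdfs crypto -listReencryptionStatus` output on stdin and prints the
# Status column (Submitted, Processing, or Completed) for ZONE.
# Assumption: rows are whitespace-separated and begin "<zone> <status> ...".
reencrypt_status() {
  awk -v zone="$1" '$1 == zone { print $2; exit }'
}

# Example polling loop against a live cluster (run as HDFS Superuser):
#   until [ "$(hdfs crypto -listReencryptionStatus | reencrypt_status /ez)" = "Completed" ]; do
#     sleep 30
#   done
```

A "Completed" status does not by itself mean that no errors occurred, so the Number of failures column is still worth checking afterward.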
Managing Re-encryption Operations
This section includes information that can help you manage various facets of the EDEK re-encryption process.
Cancelling Re-encryption
Only users with the HDFS Superuser privilege can cancel the EDEK re-encryption after the operation has started.
To cancel a re-encryption:
hdfs crypto -reencryptZone -cancel -path <zone>
Rolling Keys During a Re-encryption Operation
While it is not recommended, it is possible to roll the encryption zone key version on the KMS while a re-encryption of that encryption zone is already in progress in the NameNode. The re-encryption is guaranteed to complete with all EDEKs re-encrypted with a key version equal to or later than the encryption zone key version that existed when the re-encryption command was submitted. For example, suppose the key version is rolled from v0 to v1 and a re-encryption command is then submitted. If the key version is later rolled again on the KMS to v2 while that re-encryption is still in progress, all EDEKs are guaranteed to be re-encrypted to at least v1, but not necessarily to v2. To ensure that all EDEKs are re-encrypted to v2, submit another re-encryption command for the encryption zone.
Rolling keys during re-encryption is not recommended because of the potential negative impact on key management operations. Due to the asynchronous nature of re-encryption, there is no guarantee of when, exactly, the rolled encryption keys will take effect. Re-encryption can only guarantee that all EDEKs are re-encrypted to at least the EZ key version that existed when the re-encryption command was issued.
Throttling Re-encryption Operations
With the default operation settings, you will not typically need to throttle re-encryption operations. However, in cases of excessive performance impact due to the re-encryption of large numbers of files, advanced users have the option of throttling the operation so that the impact on the HDFS NameNode and Key Trustee KMS is minimized. The following NameNode properties control throttling:
- The number of EDEKs that the NameNode should send to the KMS to re-encrypt in a batch (dfs.namenode.reencrypt.batch.size)
- The number of threads in the NameNode that can run concurrently to contact the KMS (dfs.namenode.reencrypt.edek.threads)
- Percentage of time the NameNode read-lock should be held by the re-encryption thread (dfs.namenode.reencrypt.throttle.limit.handler.ratio)
- Percentage of time the NameNode write-lock should be held by the re-encryption thread (dfs.namenode.reencrypt.throttle.limit.updater.ratio)
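Before changing any of these properties, it can help to record their current values. The sketch below uses hdfs getconf -confKey, which reads the configuration visible on the host where it runs; treat that as an assumption to check, since it may differ from the NameNode's effective configuration when settings are managed centrally. reencrypt_throttle_keys is a hypothetical helper name.

```shell
# reencrypt_throttle_keys (hypothetical helper)
# Prints the four re-encryption throttling property names, one per line.
reencrypt_throttle_keys() {
  cat <<'EOF'
dfs.namenode.reencrypt.batch.size
dfs.namenode.reencrypt.edek.threads
dfs.namenode.reencrypt.throttle.limit.handler.ratio
dfs.namenode.reencrypt.throttle.limit.updater.ratio
EOF
}

# Print each property with the value seen by the local client configuration.
# Skipped when the hdfs CLI is not on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  reencrypt_throttle_keys | while read -r key; do
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done
fi
```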
You can monitor the HDFS NameNode heap and CPU usage from Cloudera Manager.