Configuring and Using HDFS Data at Rest Encryption
After the Ranger KMS has been set up and the NameNode and HDFS clients have been configured, an HDFS administrator can use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
The overall workflow is as follows:
Create an HDFS encryption zone key that will be used to encrypt the file-level data encryption key for every file in the encryption zone. This key is stored and managed by Ranger KMS.
Create a new HDFS folder. Specify required permissions, owner, and group for the folder.
Using the new encryption zone key, designate the folder as an encryption zone.
Configure client access. The user associated with the client application needs sufficient permission to access encrypted data. In an encryption zone, the user needs file/directory access (through POSIX permissions or Ranger access control), as well as access for certain key operations. To set up ACLs for key-related operations, see the Ranger KMS Administration Guide.
After permissions are set, Java API clients and HDFS applications with sufficient HDFS and Ranger KMS access privileges can read from and write to files in the encryption zone.
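The workflow above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not a definitive procedure: the key name key1, the path /zone_encr, the owner hive, and the 750 mode are illustrative, and the run helper is hypothetical. By default the script only prints the commands. In practice, the key step and the HDFS steps would typically be run by different accounts.

```shell
# Sketch of the encryption-zone workflow. The names used in the usage
# note (key1, /zone_encr, hive) are illustrative assumptions.

# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_encryption_zone() {
  key="$1"; zone="$2"; owner="$3"
  # 1. Create the encryption zone key (as the key administrator).
  run hadoop key create "$key"
  # 2. Create the directory and set owner, group, and permissions.
  run hdfs dfs -mkdir "$zone"
  run hdfs dfs -chown "$owner:$owner" "$zone"
  run hdfs dfs -chmod 750 "$zone"
  # 3. Designate the still-empty directory as an encryption zone.
  run hdfs crypto -createZone -keyName "$key" -path "$zone"
}
```

Calling setup_encryption_zone key1 /zone_encr hive prints the five commands in order; setting DRY_RUN=0 would execute them instead.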
Prepare the Environment
HDP supports hardware acceleration with Advanced Encryption Standard New Instructions (AES-NI). Compared with the software implementation of AES, hardware acceleration offers an order of magnitude faster encryption/decryption.
To use AES-NI optimization you need CPU and library support, described in the following subsections.
CPU Support for AES-NI optimization
AES-NI optimization requires an extended CPU instruction set for AES hardware acceleration.
There are several ways to check for this; for example:
$ cat /proc/cpuinfo | grep aes
Look for output with flags and 'aes'.
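For scripted checks, the same test can be wrapped in a function. The sketch below assumes an x86 Linux host, where /proc/cpuinfo has a flags line; the function name has_aes_flag is hypothetical. It matches aes as a whole word, so that related flags such as vaes do not produce a false positive.

```shell
# Return 0 if the given cpuinfo file lists the "aes" CPU flag.
# The file argument defaults to /proc/cpuinfo (x86 Linux assumed).
has_aes_flag() {
  cpuinfo="${1:-/proc/cpuinfo}"
  # Keep only the "flags" lines, then look for "aes" as a whole word.
  grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw 'aes'
}

# Example use:
#   has_aes_flag && echo "CPU advertises AES-NI"
```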
Library Support for AES-NI optimization
You will need a version of the libcrypto.so library that supports hardware acceleration, such as OpenSSL 1.0.1e. (Many OS versions have an older version of the library that does not support AES-NI.)
A version of the libcrypto.so library with AES-NI support must be installed on HDFS cluster nodes and MapReduce client hosts (that is, any host from which you issue HDFS or MapReduce requests). The following instructions describe how to install and configure the libcrypto.so library.
RHEL/CentOS 6.5 or later
On HDP cluster nodes, the installed version of libcrypto.so supports AES-NI, but you will need to make sure that the symbolic link exists:
$ sudo ln -s /usr/lib64/libcrypto.so.1.0.1e /usr/lib64/libcrypto.so
On MapReduce client hosts, install the openssl-devel package:
$ sudo yum install openssl-devel
Verifying AES-NI Support
To verify that a client host is ready to use the AES-NI instruction set optimization for HDFS encryption, use the following command:
hadoop checknative
You should see a response similar to the following:
15/08/12 13:48:39 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/12 13:48:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
If you see true in the openssl row, Hadoop has detected the right version of libcrypto.so and optimization will work.
If you see false in this row, you do not have the correct version.
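The check of the openssl row can be automated. The following is a minimal sketch that assumes the row format shown in the sample output above; the function name openssl_native_ok is hypothetical.

```shell
# Read `hadoop checknative` output on stdin and succeed only if the
# openssl row reports "true". Assumes rows look like "openssl: true <path>".
openssl_native_ok() {
  awk '$1 == "openssl:" { found = ($2 == "true") }
       END { exit(found ? 0 : 1) }'
}

# Example use (stderr is merged in case the rows are logged there):
#   hadoop checknative 2>&1 | openssl_native_ok && echo "openssl row is true"
```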
Create an Encryption Key
Create a "master" encryption key for the new encryption zone. Each key will be specific to an encryption zone.
Ranger supports AES/CTR/NoPadding as the cipher suite. (The associated property is listed under HDFS -> Configs in the Advanced hdfs-site list.)
Key size can be 128 or 256 bits.
Recommendation: create a new superuser for key management. In the following examples, superuser encr creates the key. This separates the data access role from the encryption role, strengthening security.
Create an Encryption Key using Ranger KMS (Recommended)
In the Ranger Web UI screen:
Choose the Encryption tab at the top of the screen.
Select the KMS service from the drop-down list.
To create a new key:
Click on "Add New Key":
Add a valid key name.
Select the cipher name. Ranger supports AES/CTR/NoPadding as the cipher suite.
Specify the key length, 128 or 256 bits.
Add other attributes as needed, and then save the key.
For information about rolling over and deleting keys, see Using the Ranger Key Management Service in the Ranger KMS Administration Guide.
Warning: Do not delete an encryption key while it is in use for an encryption zone. This will result in loss of access to data in that zone.
Create an Encryption Key using the CLI
The full syntax of the hadoop key create command is as follows:
[create <keyname> [-cipher <cipher>] [-size <size>] [-description <description>] [-attr <attribute=value>] [-provider <provider>] [-help]]
Example:
# su - encr
# hadoop key create <key_name> [-size <number-of-bits>]
The default key size is 128 bits. The optional -size parameter supports 256-bit keys, and requires the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File on all hosts in the cluster. For installation information, see the Ambari Security Guide.
Example:
# su - encr
# hadoop key create key1
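When key creation is scripted, it can help to reject unsupported sizes before calling the CLI. The sketch below mirrors the two sizes listed in this section (128 and 256 bits); the helper name key_create_cmd is hypothetical, and it only prints the command rather than running it.

```shell
# Print a `hadoop key create` command for the given key name and size.
# Size defaults to 128; only the sizes named in this section are accepted.
key_create_cmd() {
  name="$1"; size="${2:-128}"
  case "$size" in
    128|256) ;;
    *) echo "unsupported key size: $size (expected 128 or 256)" >&2
       return 1 ;;
  esac
  echo "hadoop key create $name -size $size"
}
```

For example, key_create_cmd key1 256 prints the command for a 256-bit key, which still requires the JCE policy file described above.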
To verify creation of the key, list the metadata associated with the current user:
# hadoop key list -metadata
For information about rolling over and deleting keys, see Using the Ranger Key Management Service in the Ranger KMS Administration Guide.
Warning: Do not delete an encryption key while it is in use for an encryption zone. This will result in loss of access to data in that zone.
Create an Encryption Zone
Each encryption zone must be defined using an empty directory and an existing encryption key. An encryption zone cannot be created on top of a directory that already contains data.
Recommendation: use one unique key for each encryption zone.
Use the crypto createZone command to create a new encryption zone. The syntax is:
-createZone -keyName <keyName> -path <path>
where:
-keyName: specifies the name of the key to use for the encryption zone.
-path: specifies the path of the encryption zone to be created. It must be an empty directory.
Recommendation: define a separate user account for the HDFS administrator, and do not provide access to keys for this user in Ranger KMS.
Steps:
As HDFS administrator, create a new empty directory. For example:
# hdfs dfs -mkdir /zone_encr
Using the encryption key, make the directory an encryption zone. For example:
# hdfs crypto -createZone -keyName key1 -path /zone_encr
When finished, the NameNode will recognize the folder as an HDFS encryption zone.
To verify creation of the new encryption zone, run the crypto -listZones command as an HDFS administrator:
-listZones
You should see the encryption zone and its key. For example:
$ hdfs crypto -listZones
/zone_encr key1
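To check for a specific zone in a script, the listing can be filtered by path. This is a minimal sketch that assumes each output line has the form "path key", as in the example above; the function name zone_key is hypothetical.

```shell
# Read `hdfs crypto -listZones` output on stdin and print the key name
# for the given zone path. Fails if the path is not listed.
zone_key() {
  awk -v zone="$1" '$1 == zone { print $2; found = 1 }
                    END { exit(found ? 0 : 1) }'
}

# Example use:
#   hdfs crypto -listZones | zone_key /zone_encr
```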
Note: The following property (in the hdfs-default.xml file) causes listZone requests to be batched. This improves NameNode performance. The property specifies the maximum number of zones that will be returned in a batch.
dfs.namenode.list.encryption.zones.num.responses
The default is 100.
To remove an encryption zone, delete the root directory of the zone. For example:
hdfs dfs -rm -R /zone_encr
Copy Files from/to an Encryption Zone
To copy existing files into an encryption zone, use a tool like distcp.
Note: for separation of administrative roles, do not use the hdfs user to create encryption zones. Instead, designate another administrative account for creating encryption keys and zones. See Creating an HDFS Admin User for more information.
The files will be encrypted using a file-level key generated by the Ranger Key Management Service.
DistCp Considerations
DistCp is commonly used to replicate data between clusters for backup and disaster recovery purposes. This operation is typically performed by the cluster administrator, via an HDFS superuser account.
To retain this workflow when using HDFS encryption, a new virtual path prefix has been introduced, /.reserved/raw/. This virtual path gives superusers direct access to the underlying encrypted block data in the file system, allowing superusers to distcp data without requiring access to encryption keys. This also avoids the overhead of decrypting and re-encrypting data. The source and destination data will be byte-for-byte identical, which would not be true if the data were re-encrypted with a new EDEK.
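Warning: When using /.reserved/raw/ to distcp encrypted data, preserve extended attributes with the -px flag. Encryption attributes such as the EDEK are exposed through extended attributes within /.reserved/raw/ and must be preserved for the file to be decryptable. This means that if the distcp is initiated at or above the encryption zone root, it automatically creates a new encryption zone at the destination if one does not already exist. Recommendation: To avoid potential mishaps, first create identical encryption zones on the destination cluster.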
Copying between encrypted and unencrypted locations
By default, distcp compares file system checksums to verify that data was successfully copied to the destination.
When copying between an unencrypted and encrypted location, file system checksums will not match because the underlying block data is different. In this case, specify the -skipcrccheck and -update flags to avoid verifying checksums.
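The two distcp cases described in this section can be summarized as command templates. The sketch below only prints the commands. The helper names are hypothetical, the paths and the destination filesystem URI are placeholders to be supplied by the caller, and the -px flag (preserve extended attributes) is included on the assumption that the raw copy must carry the encryption metadata along with the data.

```shell
# Print a distcp command that copies raw encrypted bytes between zones.
# $1: source path, $2: destination path,
# $3: optional destination filesystem URI (placeholder supplied by caller).
raw_distcp_cmd() {
  echo "hadoop distcp -px /.reserved/raw$1 ${3:-}/.reserved/raw$2"
}

# Print a distcp command for copying between an unencrypted and an
# encrypted location, where file system checksums cannot match.
mixed_distcp_cmd() {
  echo "hadoop distcp -update -skipcrccheck $1 $2"
}
```

For example, mixed_distcp_cmd /plain/logs /zone_encr/logs prints a command suitable for loading unencrypted data into a zone, assuming both paths are on the default file system.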
Read and Write Files from/to an Encryption Zone
Clients and HDFS applications with sufficient HDFS and Ranger KMS permissions can read and write files from/to an encryption zone.
Overview of the client write process:
The client writes to the encryption zone.
The NameNode checks to make sure that the client has sufficient write access permissions. If so, the NameNode asks Ranger KMS to create a file-level key, encrypted with the encryption zone master key.
The NameNode stores the file-level encrypted data encryption key (EDEK) generated by Ranger KMS as part of the file's metadata, and returns the EDEK to the client.
The client asks Ranger KMS to decrypt the EDEK into a data encryption key (DEK), and uses the DEK to write encrypted data. Ranger KMS checks the user's permissions before decrypting the EDEK and returning the DEK to the client.
Overview of the client read process:
The client issues a read request for a file in an encryption zone.
The NameNode checks to make sure that the client has sufficient read access permissions. If so, the NameNode returns the file's EDEK and the encryption zone key version that was used to encrypt the EDEK.
The client asks Ranger KMS to decrypt the EDEK. Ranger KMS checks for permissions to decrypt EDEK for the end user.
Ranger KMS decrypts and returns the (unencrypted) data encryption key (DEK).
The client uses the DEK to decrypt and read the file.
The preceding steps take place through internal interactions between the DFSClient, the NameNode, and Ranger KMS.
In the following example, the /zone_encr directory is an encrypted zone in HDFS.
To verify this, use the crypto -listZones command (as an HDFS administrator). This command lists the root path and the zone key for the encryption zone. For example:
# hdfs crypto -listZones
/zone_encr key1
Additionally, the /zone_encr directory has been set up for read/write access by the hive user:
# hdfs dfs -ls /
…
drwxr-x--- - hive hive 0 2015-01-11 23:12 /zone_encr
The hive user can, therefore, write data to the directory.
The following example uses the copyFromLocal command to copy a local file into HDFS.
[hive@blue ~]# hdfs dfs -copyFromLocal web.log /zone_encr
[hive@blue ~]# hdfs dfs -ls /zone_encr
Found 1 items
-rw-r--r-- 1 hive hive 1310 2015-01-11 23:28 /zone_encr/web.log
The hive user can read data from the directory, and can verify that the file loaded into HDFS is readable in its unencrypted form.
[hive@blue ~]# hdfs dfs -copyToLocal /zone_encr/web.log read.log
[hive@blue ~]# diff web.log read.log
Note: For more information about accessing encrypted files from Hive and other components, see Configuring HDP Services for HDFS Encryption.
Users without access to KMS keys will be able to see file names (via the -ls command), but they will not be able to write data or read from the encrypted zone. For example, the hdfs user lacks sufficient permissions, and cannot access the data in /zone_encr:
[hdfs@blue ~]# hdfs dfs -copyFromLocal install.log /zone_encr
copyFromLocal: Permission denied: user=hdfs, access=EXECUTE, inode="/zone_encr":hive:hive:drwxr-x---
[hdfs@blue ~]# hdfs dfs -copyToLocal /zone_encr/web.log read.log
copyToLocal: Permission denied: user=hdfs, access=EXECUTE, inode="/zone_encr":hive:hive:drwxr-x---
Delete Files from an Encryption Zone
You cannot move data from an Encryption Zone to a global Trash bin outside of the encryption zone.
To delete files from an encryption zone, use one of the following approaches:
When deleting the file via CLI, use the -skipTrash option, placed before the path. For example:
hdfs dfs -rm -skipTrash /zone_name/file1
(Hive only) Use PURGE, as in DROP TABLE ... PURGE. This skips the Trash bin even if the trash feature is enabled.
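A cleanup script can decide whether -skipTrash is needed by checking a path against the zone listing. The following is a minimal sketch that assumes zone listing lines of the form "path key"; the helper names in_encryption_zone and rm_cmd are hypothetical, and rm_cmd only prints the command.

```shell
# Read `hdfs crypto -listZones` output on stdin; succeed if the given
# path is an encryption zone root or lies beneath one.
in_encryption_zone() {
  awk -v p="$1" '
    NF >= 2 && (p == $1 || index(p, $1 "/") == 1) { hit = 1 }
    END { exit(hit ? 0 : 1) }'
}

# Print the delete command appropriate for the path.
# $1: HDFS path, $2: captured zone listing text.
rm_cmd() {
  path="$1"; zones="$2"
  if printf '%s\n' "$zones" | in_encryption_zone "$path"; then
    echo "hdfs dfs -rm -skipTrash $path"
  else
    echo "hdfs dfs -rm $path"
  fi
}
```

The prefix test appends a slash to the zone root, so a sibling directory whose name merely starts with the zone name is not treated as part of the zone.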