HDFS Encryption Overview
HDFS data-at-rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS does not have access to unencrypted data or keys.
HDFS encryption involves several elements:
Encryption key: A new level of permission-based access protection, in addition to standard HDFS permissions.
HDFS encryption zone: A special HDFS directory within which all data is encrypted upon write, and decrypted upon read.
Each encryption zone is associated with an encryption key that is specified when the zone is created.
Each file within an encryption zone has a unique encryption key, called the "data encryption key" (DEK).
HDFS does not have access to DEKs. HDFS DataNodes only see a stream of encrypted bytes. HDFS stores "encrypted data encryption keys" (EDEKs) as part of the file's metadata on the NameNode.
Clients decrypt an EDEK and use the associated DEK to encrypt and decrypt data during write and read operations.
Ranger Key Management Service (Ranger KMS): An open source key management service based on Hadoop’s KeyProvider API.
For HDFS encryption, the Ranger KMS has three basic responsibilities:
Provide access to stored encryption zone keys.
Generate and manage encryption zone keys, and create encrypted data keys to be stored in Hadoop.
Audit all access events in Ranger KMS.
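The relationship between the encryption zone key, the DEK, and the EDEK described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how HDFS or Ranger KMS is implemented: `ToyKMS`, `keystream_xor`, and the `file_metadata` dictionary are hypothetical names, and the cipher is a toy HMAC-based keystream used only so that the example runs with the standard library. The sketch also assumes that the client obtains the DEK by asking the key service to decrypt the EDEK.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with an HMAC-SHA256 keystream.

    A stand-in for a real cipher so the sketch needs only the standard
    library. Applying it twice with the same key and nonce restores the
    input. Not for production use.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in for the key management service."""

    def __init__(self):
        # Zone key name -> key bytes. Zone keys never leave this object.
        self._zone_keys = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_edek(self, zone_key_name: str):
        """Create a fresh DEK and return only its encrypted form (EDEK)."""
        dek = os.urandom(32)
        nonce = os.urandom(16)
        edek = keystream_xor(self._zone_keys[zone_key_name], nonce, dek)
        return nonce, edek

    def decrypt_edek(self, zone_key_name: str, nonce: bytes, edek: bytes) -> bytes:
        """Recover the DEK from an EDEK using the zone key."""
        return keystream_xor(self._zone_keys[zone_key_name], nonce, edek)


kms = ToyKMS()
kms.create_zone_key("zone_key")

# File creation: only the EDEK is kept in the file's metadata.
key_nonce, edek = kms.generate_edek("zone_key")
file_metadata = {"zone_key": "zone_key", "key_nonce": key_nonce, "edek": edek}

# Client write: recover the DEK, then encrypt before any bytes are stored.
dek = kms.decrypt_edek(file_metadata["zone_key"], file_metadata["key_nonce"],
                       file_metadata["edek"])
data_nonce = os.urandom(16)
ciphertext = keystream_xor(dek, data_nonce, b"sensitive record")

# Storage only ever holds `ciphertext` and `file_metadata`, never `dek`.

# Client read: recover the DEK again and decrypt.
dek_again = kms.decrypt_edek(file_metadata["zone_key"],
                             file_metadata["key_nonce"],
                             file_metadata["edek"])
plaintext = keystream_xor(dek_again, data_nonce, ciphertext)
```

The point of the design is that the storage layer holds only `ciphertext` and the EDEK. Without the zone key, which stays inside the key service, neither can be turned back into the original data.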
Note: This chapter is intended for security administrators who are interested in configuring and using HDFS encryption. For more information about Ranger KMS, see the Ranger KMS Administration Guide.
Role Separation
Access to the key encryption/decryption process is typically restricted to end users. This means that encrypted keys can be safely stored and handled by HDFS, because the HDFS admin user does not have access to them.
This role separation requires two types of HDFS administrator accounts:
HDFS service user: the system-level account associated with HDFS (hdfs by default).
HDFS admin user: an account in the hdfs supergroup, which is used by HDFS administrators to configure and manage HDFS.
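The effect of this role separation can be illustrated with a small sketch. It assumes a hypothetical key policy that lists which users may decrypt keys. The names `ToyKeyPolicy`, `KeyAccessDenied`, `hdfs_admin`, and `alice` are invented for illustration and do not correspond to Ranger KMS APIs. In a real cluster, this kind of check would be enforced by the key service's access policies rather than by application code.

```python
from dataclasses import dataclass, field


class KeyAccessDenied(Exception):
    """Raised when a user is not permitted to decrypt keys."""


@dataclass
class ToyKeyPolicy:
    """Hypothetical policy: only listed users may decrypt encrypted keys."""

    decrypt_allowed: set = field(default_factory=set)

    def check_decrypt(self, user: str) -> None:
        if user not in self.decrypt_allowed:
            raise KeyAccessDenied(f"user {user!r} may not decrypt keys")


def can_decrypt(policy: ToyKeyPolicy, user: str) -> bool:
    """Return True if the policy lets this user decrypt keys."""
    try:
        policy.check_decrypt(user)
        return True
    except KeyAccessDenied:
        return False


# Only the end user is granted decrypt access. Neither the HDFS service
# user nor the (hypothetical) HDFS admin account is on the list.
policy = ToyKeyPolicy(decrypt_allowed={"alice"})

results = {user: can_decrypt(policy, user)
           for user in ("hdfs", "hdfs_admin", "alice")}
```

Under this policy, the administrative accounts can still store and move encrypted keys, but only `alice` can turn one back into a usable key.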
Important: For clear segregation of duties, we recommend that you restrict use of the
Other services may require a separate admin account for clusters with HDFS encryption zones. For service-specific information, see Configuring HDP Services for HDFS Encryption.