HDFS Encryption Overview
HDFS data-at-rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS never has access to unencrypted data or to unencrypted data encryption keys.
HDFS encryption involves several elements:
- Encryption key: A new level of permission-based access protection, in addition to standard HDFS permissions.
- HDFS encryption zone: A special HDFS directory within which all data is encrypted upon write, and decrypted upon read.
  - Each encryption zone is associated with an encryption key that is specified when the zone is created.
  - Each file within an encryption zone has a unique encryption key, called the "data encryption key" (DEK).
  - HDFS does not have access to DEKs. HDFS DataNodes only see a stream of encrypted bytes. HDFS stores "encrypted data encryption keys" (EDEKs) as part of the file's metadata on the NameNode.
  - Clients decrypt an EDEK and use the associated DEK to encrypt and decrypt data during write and read operations.
- Ranger Key Management Service (Ranger KMS): An open source key management service based on Hadoop’s KeyProvider API. For HDFS encryption, the Ranger KMS has three basic responsibilities:
  - Provide access to stored encryption zone keys.
  - Generate and manage encryption zone keys, and create encrypted data keys to be stored in Hadoop.
  - Audit all access events in Ranger KMS.
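The relationship between encryption zone keys, DEKs, and EDEKs can be illustrated with a minimal sketch. It is a toy model, not the Hadoop or Ranger KMS implementation: `ToyKMS`, `toy_cipher`, and the key name `zone_key` are hypothetical names invented for this example, and the SHA-256 keystream is an insecure stand-in for a real cipher, used only so the example runs with the Python standard library.

```python
import hashlib
import os


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream.

    Toy stand-in for a real stream cipher; NOT secure. Applying it twice
    with the same key and nonce returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Hypothetical key service: holds zone keys, wraps DEKs, audits access."""

    def __init__(self):
        self._zone_keys = {}   # zone key name -> key material (never leaves the KMS)
        self.audit_log = []    # (event, key name) tuples

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)
        self.audit_log.append(("create_key", name))

    def generate_edek(self, name: str):
        """Create a fresh DEK and return only its encrypted form."""
        dek, nonce = os.urandom(32), os.urandom(16)
        self.audit_log.append(("generate_edek", name))
        return nonce, toy_cipher(self._zone_keys[name], nonce, dek)

    def decrypt_edek(self, name: str, nonce: bytes, edek: bytes) -> bytes:
        """Unwrap an EDEK for a client and record the access."""
        self.audit_log.append(("decrypt_edek", name))
        return toy_cipher(self._zone_keys[name], nonce, edek)


kms = ToyKMS()
kms.create_zone_key("zone_key")

# File creation: the metadata keeps only the EDEK, never the DEK itself.
edek_nonce, edek = kms.generate_edek("zone_key")
file_metadata = {
    "key_name": "zone_key",
    "edek_nonce": edek_nonce,
    "edek": edek,
    "data_nonce": os.urandom(16),
}

# Client write: unwrap the EDEK through the KMS, then encrypt locally.
plaintext = b"sensitive record"
dek = kms.decrypt_edek(file_metadata["key_name"], file_metadata["edek_nonce"], file_metadata["edek"])
ciphertext = toy_cipher(dek, file_metadata["data_nonce"], plaintext)  # all that storage sees

# Client read: unwrap the same EDEK again and decrypt locally.
dek_again = kms.decrypt_edek(file_metadata["key_name"], file_metadata["edek_nonce"], file_metadata["edek"])
recovered = toy_cipher(dek_again, file_metadata["data_nonce"], ciphertext)
```

In this model, the storage side holds only `ciphertext` and the EDEK, the zone key stays inside the key service, and every key operation leaves an entry in `audit_log`.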
Role Separation
Access to the key encryption/decryption process is typically restricted to end users. This means that encrypted keys can be safely stored and handled by HDFS, because the HDFS admin user cannot decrypt them.
This role separation requires two types of HDFS administrator accounts:
- HDFS service user: the system-level account associated with HDFS (hdfs by default).
- HDFS admin user: an account in the hdfs supergroup, which is used by HDFS administrators to configure and manage HDFS.
Note: For clear segregation of duties, we recommend that you restrict use of the hdfs account to system/interprocess use. Do not provide its password to physical users. A (human) user who administers HDFS should only access HDFS through an admin user account created specifically for that purpose. For more information about creating an HDFS admin user, see “Create an HDFS Admin User”.
Other services may require a separate admin account for clusters with HDFS encryption zones. For service-specific information, see “Configuring HDP Services for HDFS Encryption”.
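The effect of role separation can be sketched as two independent checks: HDFS permissions decide who can fetch a file's stored (encrypted) bytes, and a key policy decides who can have an EDEK decrypted. The sketch below is an assumption-laden illustration, not Ranger KMS policy syntax: `ToyKeyPolicy`, `access_summary`, and the user names `alice`, `bob`, and `hdfs_admin` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKeyPolicy:
    """Hypothetical key policy: who may ask the key service to decrypt an EDEK."""
    decrypt_allowed: set = field(default_factory=set)
    decrypt_denied: set = field(default_factory=set)

    def may_decrypt_edek(self, user: str) -> bool:
        # An explicit deny wins over any allow.
        return user in self.decrypt_allowed and user not in self.decrypt_denied


# Assumed accounts: the service user and an admin user are HDFS superusers;
# alice is an end user with HDFS read permission on the file.
SUPERUSERS = {"hdfs", "hdfs_admin"}
FILE_READERS = {"alice"}


def access_summary(user: str, policy: ToyKeyPolicy) -> dict:
    """Report what a user can obtain from a file in an encryption zone."""
    can_read_raw = user in SUPERUSERS or user in FILE_READERS
    can_decrypt = policy.may_decrypt_edek(user)
    return {"raw_bytes": can_read_raw, "plaintext": can_read_raw and can_decrypt}


policy = ToyKeyPolicy(decrypt_allowed={"alice"}, decrypt_denied={"hdfs", "hdfs_admin"})

for user in ("alice", "hdfs", "hdfs_admin", "bob"):
    print(user, access_summary(user, policy))
```

Under these assumptions, the superuser accounts can still move, back up, or delete the encrypted bytes, but only an end user who passes both checks can recover plaintext.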