Security

HDFS Encryption Overview

HDFS data-at-rest encryption implements end-to-end encryption of data read from and written to HDFS. End-to-end encryption means that data is encrypted and decrypted only by the client. HDFS never has access to unencrypted data or to unencrypted encryption keys.

HDFS encryption involves several elements:

  • Encryption key: A new level of permission-based access protection, in addition to standard HDFS permissions.

  • HDFS encryption zone: A special HDFS directory within which all data is encrypted upon write, and decrypted upon read.

    • Each encryption zone is associated with an encryption key that is specified when the zone is created.

    • Each file within an encryption zone has a unique encryption key, called the "data encryption key" (DEK).

    • HDFS does not have access to DEKs. HDFS DataNodes only see a stream of encrypted bytes. HDFS stores "encrypted data encryption keys" (EDEKs) as part of the file's metadata on the NameNode.

    • Clients have an EDEK decrypted through the key management service, then use the resulting DEK to encrypt data during write operations and decrypt it during read operations.

  • Ranger Key Management Service (Ranger KMS): An open source key management service based on Hadoop’s KeyProvider API.

    For HDFS encryption, the Ranger KMS has three basic responsibilities:

    • Provide access to stored encryption zone keys.

    • Generate and manage encryption zone keys, and create encrypted data keys to be stored in Hadoop.

    • Audit all access events in Ranger KMS.

    Note: This chapter is intended for security administrators who are interested in configuring and using HDFS encryption. For more information about Ranger KMS, see the Ranger KMS Administration Guide.

Figure 6.1. HDFS Encryption Components
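
The relationship between the encryption zone key, the DEK, and the EDEK can be sketched in a few lines of Java. This is a minimal illustration using only the JDK's javax.crypto classes, not the Hadoop or Ranger KMS implementation. The class name, method names, key sizes, and sample data are assumptions made for the example, and AES in CTR mode was chosen for illustration because this section does not name a cipher.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the key hierarchy; not Hadoop or Ranger KMS code.
class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns n random bytes, used here for keys and initialization vectors.
    static byte[] randomBytes(int n) {
        byte[] bytes = new byte[n];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    // Runs AES-CTR in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CTR operation failed", e);
        }
    }

    public static void main(String[] args) {
        // Key management side: the encryption zone key stays with the key service.
        byte[] zoneKey = randomBytes(16);

        // A fresh DEK is generated per file and wrapped with the zone key.
        byte[] dek = randomBytes(16);
        byte[] edekIv = randomBytes(16);
        byte[] edek = aesCtr(Cipher.ENCRYPT_MODE, zoneKey, edekIv, dek);
        // Only edek and edekIv would be kept in file metadata; the DEK is not stored.

        // Client side: obtain the DEK from the EDEK, then encrypt the file data.
        byte[] clientDek = aesCtr(Cipher.DECRYPT_MODE, zoneKey, edekIv, edek);
        byte[] fileIv = randomBytes(16);
        byte[] plaintext = "sample record".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = aesCtr(Cipher.ENCRYPT_MODE, clientDek, fileIv, plaintext);
        // The storage layer would see only ciphertext.

        // Reading reverses the last step with the same DEK and IV.
        byte[] recovered = aesCtr(Cipher.DECRYPT_MODE, clientDek, fileIv, ciphertext);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```

In this sketch the zone key is used only to wrap and unwrap the DEK, and the DEK is the only key applied to file data. That is why storing the EDEK alongside the file does not expose the data.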


Role Separation

Permission to encrypt and decrypt keys is typically granted only to end users. Encrypted keys can therefore be stored and handled safely by HDFS, because the HDFS admin user has no means of decrypting them.

This role separation requires two types of HDFS administrator accounts:

  • HDFS service user: the system-level account associated with HDFS (hdfs by default).

  • HDFS admin user: an account in the hdfs supergroup, which is used by HDFS administrators to configure and manage HDFS.
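
The role separation described above can be pictured as a per-user check on key operations. The following toy model is a sketch only: the class, the user alice, and the permission table are hypothetical, and the operation names GENERATE_EEK and DECRYPT_EEK are borrowed from Hadoop KMS access-control terminology as an assumption, not taken from this section. It does not reflect how Ranger KMS evaluates its policies.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of role separation; not Ranger KMS code.
class KeyOperationAclSketch {
    // Two key operations: creating encrypted DEKs and decrypting them.
    enum KeyOp { GENERATE_EEK, DECRYPT_EEK }

    // Assumed permission table: the HDFS service user may request new EDEKs
    // but may not decrypt them; the end user "alice" may decrypt them.
    static final Map<String, Set<KeyOp>> ALLOWED = Map.of(
        "hdfs", EnumSet.of(KeyOp.GENERATE_EEK),
        "alice", EnumSet.of(KeyOp.DECRYPT_EEK));

    // Returns true only if the user is listed and holds the requested operation.
    static boolean isAllowed(String user, KeyOp op) {
        return ALLOWED.getOrDefault(user, EnumSet.noneOf(KeyOp.class)).contains(op);
    }

    public static void main(String[] args) {
        System.out.println("hdfs may generate EDEKs: " + isAllowed("hdfs", KeyOp.GENERATE_EEK));
        System.out.println("hdfs may decrypt EDEKs: " + isAllowed("hdfs", KeyOp.DECRYPT_EEK));
        System.out.println("alice may decrypt EDEKs: " + isAllowed("alice", KeyOp.DECRYPT_EEK));
    }
}
```

Under these assumptions, an account that can handle encrypted keys is never the same account that can decrypt them, which is the property the role separation is meant to preserve.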

Important: For clear segregation of duties, we recommend that you restrict the hdfs account to system and interprocess use. Do not give its password to human users. A person who administers HDFS should access HDFS only through an admin user account created specifically for that purpose. For more information about creating an HDFS admin user, see Creating an HDFS Admin User.

Other services may require a separate admin account for clusters with HDFS encryption zones. For service-specific information, see Configuring HDP Services for HDFS Encryption.