Encryption

An overview of encryption in Cloudera on premises.

Encryption in transit is enabled using TLS security with two modes of deployment: manual or Auto TLS. For manual TLS, customers use their own scripts to generate and deploy their own certificates to the cluster hosts and the locations used are then configured in Cloudera Manager to enable their use by the cluster services.

Typically, security artifacts such as certificates will be stored on the local filesystem /opt/cloudera/security. The cloudera-deploy Ansible automation (see, Cloudera Labs: Cloudera Deploy) uses this method.

Auto TLS enables Cloudera Manager to act as a certificate authority, standalone or delegated by an existing corporate authority. Cloudera Manager can then generate, sign and then deploy the signed certificates around the cluster along with any associated truststores. Once in place, the certificates are then used to encrypt network traffic between the cluster services. There are three primary communication channels, HDFS Transparent Encryption, Data Transfer and Remote Procedure Calls, and communications with the various user Interfaces and APIs.

See, Encryption Overview.

Encryption at Rest

In order to store sensitive data securely it is vital to ensure that data is encrypted at rest whilst also being available for processing by appropriately privileged users and services that are granted the ability to decrypt. A secure Cloudera cluster will feature full transparent HDFS Encryption, often with separate encryption zones for the various storage tenants and use cases. We caution against encrypting the entire HDFS, rather those directory hierarchies that require encryption.

As well as HDFS other key local storage locations such as YARN and Impala scratch directories, log files can be similarly encrypted using block encryption. Both HDFS and the local filesystem can be integrated to a secure Key Trustee Service, typically deployed into a separate cluster that assumes responsibility for key management. This ensures separation of duties from the cluster administrators and the security administrators that are responsible for the encryption keys. Furthermore Key Trustee client side encryption provides defence in depth for vulnerabilities such as Apache Log4j2 (see, CVE-2021-44228).

Logical Architecture

Ranger KMS Service Overview

The Ranger KMS service consolidates the Ranger KMS service and Key Trustee KMS into a single service that can both support the Ranger KMS DB or the Key Trustee Service as the backing key store for the service. Both implementations support HMS integration with KTS backing support providing full separation of duties and a broader range of HSMs, albeit at the cost of additional complexity and hardware. It is preferable to support RKMS + KTS in order to simplify operations and run at scale. Ranger KMS supports:

Key Management provides the ability to create, update or delete keys using the Web UI or REST APIs
Access control provides the ability to manage access control policies within Ranger KMS. The access policies control permissions to generate or manage keys, adding another layer of security for data encrypted in HDFS.
AuditRanger provides a full audit trace of all actions performed by Ranger KMS.
All policies are maintained by the Ranger service.