Encryption logical architecture

Integrity of the data is a crucial pillar to meet the requirements of discerning investors that are looking to hold organizations to the highest ethical, social and governance (ESG) standards. Cloudera Data Platform delivers robust encryption at rest, using KeyTrustee Server to manage key material with the option of integrating with an HSM (Hardware Security Module) to act as your Key Backing store.

This is advantageous as it facilitates integration with pre-existing security processes for maintaining cryptographic material and separation of duties with the security department responsible for maintaining the HSM. In the context of our climate accounting application this ensures that the organization's existing regulatory framework is applied and the potential to achieve equivalent standards or reporting as financing. The diagram below illustrates how this is implemented logically.



The architecture supports the encryption of both HDFS and the local filesystem. Local filesystem directories might include Impala and Yarn scratch directories which are used as working directories when a job might spill to disk due to the task exceeding the memory capacity of worker node, audit directories containing user IDs, and in some cases log directories that contain clauses with PII data depending upon existing enterprise security policies.

We generally recommend that customers avoid encrypting the entire HDFS namespace as this will lead to operational challenges and create encryption zones within the hdfs directory structure. Customers will often create separate cryptographic zones for cluster tenants, however there will often be cases where multiple tenants may wish to share the same datasets and it is more practical to create a single cryptographic zone and utilize Apache Ranger authorisation policies to provide the fine grained access controls. Consideration should be given more broadly to the underlying regulatory environment in arriving at a zoning policy. So for example all data that is classified as Commercially Sensitive might be encrypted under a single zone. Data classified as Secret in another. An illustration of this might be any additional columns that contain commercially sensitive financial information.

Encryption at rest installation

For full installation details using the Cloudera Manager wizard, see Setting Up Data at Rest Encryption for HDFS.

Identity Mapping

Much of this blog post focuses upon authorizing groups of users to perform administrative and analytical tasks against different datasets. In order to provide effective security the cluster must first authenticate both users and applications typically using kerberos or in some cases LDAP or SAML. Once an identity has been established that user in question will then be authorized in order to perform some action. In order to do this, the cluster will use and identify mapping service such as SSSD, Centrify or Powerbroker which will integrate with the corporate directory such as Microsoft Active Directory or Red Identity Manager which provides established tools and processes for mapping users to groups which are then used for authorisations.

A new feature introduced in CDP are Ranger roles, previously Ranger only had the concept of groups. Administrators can create roles and then allocate permissions to those roles to perform an action such as to read a table in a database. These roles can then be linked to groups that are maintained in the corporate directory. The advantage is that typically these groups pre-exist and can be easily linked to the new ranger roles which in turn can be amended whilst the linked group remains unchanged. The role itself is a collection of policies whereas the group is the collection of users.