Executive summary

This document provides a detailed security reference architecture for Cloudera Base on premises and is accompanied by a series of blog posts. The architecture reflects the four pillars of security engineering best practice: Perimeter, Data, Access and Visibility.

The release of Cloudera Base on premises brings a number of significant enhancements to the security architecture, including:

Apache Ranger for security policy management
Updated Ranger Key Management service

This document is intended as a reference guide only and is not a runbook or configuration manual. Cloudera recommends that you undertake your own security design activity in conjunction with Cloudera Professional Services and, where appropriate, obtain third-party assurance.

Note also that effective security depends on more than physical controls and system architecture. It also requires organizational best practices for password and credential management, joiners-movers-leavers processes, group and privilege management, audit and compliance, separation of duties, and physical access controls.

Before diving into the technologies it is worth becoming familiar with the key security principle of a layered approach that facilitates defense in depth. Each layer is defined as follows:

These multiple layers of security are applied to ensure the confidentiality, integrity and availability of data and to meet the most stringent regulatory requirements. Cloudera Base on premises offers three levels of security that implement these features, in addition to a non-secure baseline (level 0):


Level	Security	Characteristics
0	Non-secure	No security configured. Non-secure clusters should never be used in production environments because they are vulnerable to any and all attacks and exploits.
1	Minimal	Configured for authentication, authorization, and auditing. Authentication is first configured to ensure that users and services can access the cluster only after proving their identities. Next, authorization mechanisms are applied to assign privileges to users and user groups. Auditing procedures keep track of who accesses the cluster (and how).
2	More	Sensitive data is encrypted. Key management systems handle encryption keys. Auditing has been set up for data in the metastore. System metadata is reviewed and updated regularly. Ideally, the cluster has been set up so that lineage for any data object can be traced (data governance).
3	Most	The secure cluster is one in which all data, both data-at-rest and data-in-transit, is encrypted and the key management system is fault-tolerant. Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example), and extend from the cluster to the other systems that integrate with it. Cluster administrators are well-trained, security procedures have been certified by an expert, and the cluster can pass technical review.

For the purposes of this document we focus on level 3, the most secure level.
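
The cumulative nature of the levels in the table can be illustrated with a short sketch. This is not a Cloudera tool or API: the class name, the flag names and the function below are hypothetical, chosen only for illustration, and the flags are a simplified subset of the characteristics listed in the table (for example, metadata review, lineage and administrator training are left out). The point it shows is that each level requires everything from the level below it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterControls:
    """Hypothetical checklist of controls; names are assumptions, not Cloudera settings."""
    authentication: bool = False
    authorization: bool = False
    auditing: bool = False
    sensitive_data_encrypted: bool = False
    key_management: bool = False
    all_data_at_rest_encrypted: bool = False
    all_data_in_transit_encrypted: bool = False
    fault_tolerant_key_management: bool = False
    regulatory_compliant_auditing: bool = False


def security_level(c: ClusterControls) -> int:
    """Return the highest level (0 to 3) whose requirements are all met.

    Levels are cumulative: a cluster cannot reach level 2 without
    first satisfying level 1, and so on.
    """
    level1 = c.authentication and c.authorization and c.auditing
    level2 = level1 and c.sensitive_data_encrypted and c.key_management
    level3 = (
        level2
        and c.all_data_at_rest_encrypted
        and c.all_data_in_transit_encrypted
        and c.fault_tolerant_key_management
        and c.regulatory_compliant_auditing
    )
    if level3:
        return 3
    if level2:
        return 2
    if level1:
        return 1
    return 0


if __name__ == "__main__":
    # Authentication, authorization and auditing only: level 1 in this sketch.
    example = ClusterControls(authentication=True, authorization=True, auditing=True)
    print(security_level(example))
```

In this sketch, enabling encryption without authentication still yields level 0, which mirrors the ordering in the table: encryption is described as building on an authenticated, authorized and audited cluster.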