Data Lake Security

A Data Lake refers to the shared security and governance services in a CDP Private Cloud Base cluster linked to a CDP Private Cloud environment, and managed by Cloudera Manager. This set of shared services is also referred to as SDX (Shared Data eXperience).

CDP Private Cloud security and governance are managed by Apache Ranger and Apache Atlas.

Data Lake services

Data Lake services are managed by Cloudera Manager, and can include the following services:

  • Hive MetaStore (HMS) -- table metadata

  • Apache Ranger -- fine-grained authorization policies, auditing

  • Apache Atlas -- metadata management and governance: lineage, analytics, attributes

Security in all workload clusters created in an environment is managed by these shared security and governance services.

Links to the Atlas and Ranger web UIs are provided on each environment home page. A link to the Cloudera Manager instance provides access to configuration settings.

Apache Ranger

Apache Ranger manages access control through a user interface that ensures consistent policy administration in CDP clusters.

Security administrators can define security policies at the database, table, column, and file levels, and can administer permissions for groups or individual users. Rules based on dynamic conditions such as time or geolocation can also be added to an existing policy rule. Ranger security zones enable you to organize service resources into multiple security zones.

Ranger also provides a centralized framework for collecting access audit history and reporting data, including filtering on various parameters.

Apache Atlas

Apache Atlas provides a set of metadata management and governance services that enable you to manage CDP cluster assets.

  • Search and Proscriptive Lineage – facilitates pre-defined and ad hoc exploration of data and metadata, while maintaining a history of data sources and how specific data was generated.
  • Ranger plugin for metadata-driven data access control.
  • Flexible modeling of both business and operational data.
  • Data Classification – helps you understand the nature of the data within Hadoop and classify it based on external and internal sources.