Overview of Cloudera Data Platform Data Center

Overview of CDP Data Center

Cloudera Data Platform (CDP) Data Center is the on-premises version of Cloudera Data Platform. This new product combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise, along with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.

CDP Data Center supports a variety of hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization, and governance.

CDP Data Center consists of a variety of components such as Apache HDFS, Apache Hive 3, Apache HBase, and Apache Impala, along with many other components for specialized workloads. You can select any combination of these services to create clusters that address your business requirements and workloads. Several pre-configured packages of services are also available for common workloads. These include:
  • Data Engineering

    Ingest, transform, and analyze data.

    Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive Metastore, Hive on Tez, Spark, Oozie, Hue, and Data Analytics Studio

  • Data Mart

    Browse, query, and explore your data in an interactive way.

    Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive Metastore, Impala, and Hue

  • Operational Database

    Low-latency writes, reads, and persistent access to data for Online Transaction Processing (OLTP) use cases.

    Services: HDFS, Ranger, Atlas, and HBase
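
The package definitions above can be treated as plain sets of services, which makes it easy to see what they share and which package fits a given requirement. The following is a minimal illustrative sketch, not a Cloudera tool: the names PACKAGES, common_services, and packages_covering are hypothetical, and only the service lists come from this page.

```python
# Service lists copied from the three pre-configured packages described above.
# The dictionary and helper names are hypothetical, chosen for illustration.
PACKAGES = {
    "Data Engineering": {
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive Metastore", "Hive on Tez", "Spark", "Oozie", "Hue",
        "Data Analytics Studio",
    },
    "Data Mart": {
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive Metastore", "Impala", "Hue",
    },
    "Operational Database": {"HDFS", "Ranger", "Atlas", "HBase"},
}


def common_services(packages):
    """Return the services that appear in every package."""
    return set.intersection(*packages.values())


def packages_covering(required, packages):
    """Return the sorted names of packages that include all required services."""
    required = set(required)
    return sorted(
        name for name, services in packages.items() if required <= services
    )
```

For example, common_services(PACKAGES) shows that HDFS, Ranger, and Atlas are part of every package, and packages_covering({"Impala", "Hue"}, PACKAGES) points to Data Mart. If no package covers the services you need, you can still select an arbitrary combination of services, as described above.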

When installing a CDP Data Center cluster, you install a single parcel called Cloudera Runtime that contains all of the components. For a complete list of the included components, see Cloudera Runtime Component Versions.

In addition to the Cloudera Runtime components, CDP Data Center includes powerful tools to help manage, govern, and secure your cluster.

CDP Data Center Tools

Cloudera Manager

CDP Data Center uses Cloudera Manager to manage one or more clusters and their configurations and to monitor cluster performance. You also use Cloudera Manager to manage installations, upgrades, maintenance workflows, encryption, access controls, and data replication. In future releases, you will also be able to manage Cloudera Enterprise CDH clusters. In addition, you can use Cloudera Manager to create a Virtual Private Cluster, which separates compute resources from data storage and lets compute resources share data storage. See Cloudera Manager Overview.

Apache Atlas

CDP Data Center also includes Apache Atlas, which provides governance for your data. Apache Atlas serves as a common metadata store that is designed to exchange metadata both inside and outside of the Hadoop stack. Close integration of Atlas with Apache Ranger enables you to define, administer, and manage security and compliance policies consistently across all components of the Hadoop stack. For customers familiar with Cloudera Enterprise, Apache Atlas replaces Cloudera Navigator Metadata Server. Atlas provides the following capabilities:
  • Flexible metadata models

  • Entity search using model attributes, classifications (tags), and free text

  • Lineage across entities based on processes applied to the entities
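
As a rough illustration of the entity search capability, the sketch below builds a URL for the Atlas v2 basic search REST endpoint, combining an entity type, a classification (tag), and a free-text query. The helper name, host, port, type name, and tag are hypothetical. The parameter names follow the upstream Apache Atlas REST API as commonly documented and should be checked against the Atlas version shipped with your cluster.

```python
from urllib.parse import urlencode


def atlas_basic_search_url(base_url, type_name=None, classification=None,
                           query=None, limit=25):
    """Build a GET URL for the Atlas v2 basic search endpoint.

    Assumption: the endpoint path and the parameter names (typeName,
    classification, query, limit) match the Atlas release in use.
    """
    params = {}
    if type_name:
        params["typeName"] = type_name            # entity type from the metadata model
    if classification:
        params["classification"] = classification  # tag attached to the entity
    if query:
        params["query"] = query                    # free-text search term
    params["limit"] = limit
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{urlencode(params)}"


# Hypothetical host, entity type, and tag, for illustration only.
url = atlas_basic_search_url(
    "http://atlas.example.com:21000",
    type_name="hive_table",
    classification="PII",
    query="customer",
)
```

The resulting URL can be requested with any HTTP client using credentials that are valid for your Atlas server. Building the URL separately keeps the sketch runnable without a live cluster.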

For more information, see Governance Overview.

Apache Ranger

Apache Ranger provides auditing, authentication, and authorization functionality for your CDP Data Center clusters.

Apache Ranger provides a centralized framework for collecting access audit history and reporting data, including filtering on various parameters. Ranger enhances audit information obtained from Hadoop components and provides insights through this centralized reporting capability.

Apache Ranger also manages access control through a user interface that ensures consistent policy administration across CDP Data Center components. Security administrators can define security policies at the database, table, column, and file levels, and can administer permissions for specific LDAP-based groups or individual users. Rules based on dynamic conditions such as time or geolocation can also be added to an existing policy rule. The Ranger authorization model is pluggable and can be easily extended to any data source using a service-based definition.
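
To make the database, table, and column levels concrete, the sketch below assembles a policy document of the kind Ranger accepts as JSON, granting a group access to specific columns. It is an illustrative sketch under stated assumptions: the helper names, the service name "cm_hive", and the database, table, column, and group names are all hypothetical, and the field names follow the upstream Apache Ranger policy model, so verify them against your Ranger version before use.

```python
import json


def column_policy(service, name, database, table, columns, groups,
                  accesses=("select",)):
    """Build a column-level access policy as a plain dictionary.

    Assumption: field names (service, resources, policyItems, accesses)
    follow the upstream Apache Ranger policy model.
    """
    return {
        "service": service,        # Ranger service (repository) the policy belongs to
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,    # record access events for this policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "groups": list(groups),   # for example, LDAP-based groups
                "users": [],
                "accesses": [
                    {"type": access, "isAllowed": True} for access in accesses
                ],
            }
        ],
    }


def policy_to_json(policy):
    """Serialize a policy dictionary to indented JSON text."""
    return json.dumps(policy, indent=2, sort_keys=True)


# Hypothetical names throughout, for illustration only.
policy = column_policy(
    service="cm_hive",
    name="analysts-read-orders",
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    groups=["analysts"],
)
```

Calling policy_to_json(policy) yields text that can be reviewed or kept under version control. The same restrictions can also be defined interactively through the Ranger user interface described above.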

For customers familiar with Cloudera Enterprise, Apache Ranger replaces Sentry and Navigator Audit Server and also provides the following capabilities:
  • Better fine-grained access controls:

    • Dynamic Row Filtering

    • Dynamic Column Masking

    • Attribute-based Access Control

    • SparkSQL fine-grained access control

  • Rich policy features:

    • Allow/Deny constructs, custom policy conditions and context enrichers, time-bound policies, and Atlas integration (for tag-based policies)

  • Extensive Access Auditing with rich event metadata

For more information, see Security.