Overview of Cloudera Data Platform Data Center
Cloudera Data Platform (CDP) Data Center is the on-premises version of Cloudera Data Platform. It combines the best of Cloudera Enterprise Data Hub and Hortonworks Data Platform Enterprise with new features and enhancements across the stack. This unified distribution is a scalable and customizable platform where you can securely run many types of workloads.
CDP Data Center supports a variety of hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization and governance.
- Data Engineering
Ingest, transform, and analyze data.
Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive Metastore, Hive on Tez, Spark, Oozie, Hue, and Data Analytics Studio
- Data Mart
Browse, query, and explore your data in an interactive way.
Services: HDFS, YARN, YARN Queue Manager, Ranger, Atlas, Hive Metastore, Impala, and Hue
- Operational Database
Low-latency writes, reads, and persistent access to data for Online Transactional Processing (OLTP) use cases.
Services: HDFS, Ranger, Atlas, and HBase
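The three service lists above can be written down as a small lookup table, which makes it easy to see which services every grouping shares. This is only an illustrative sketch: the names `SERVICE_GROUPS` and `shared_services` are hypothetical and not part of any Cloudera API. The service names are copied from the lists above.

```python
# Hypothetical lookup table built from the service lists above.
# It is not a Cloudera data structure or API.
SERVICE_GROUPS = {
    "Data Engineering": {
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive Metastore", "Hive on Tez", "Spark", "Oozie", "Hue",
        "Data Analytics Studio",
    },
    "Data Mart": {
        "HDFS", "YARN", "YARN Queue Manager", "Ranger", "Atlas",
        "Hive Metastore", "Impala", "Hue",
    },
    "Operational Database": {"HDFS", "Ranger", "Atlas", "HBase"},
}


def shared_services(service_groups):
    """Return the services present in every grouping, sorted by name."""
    service_sets = list(service_groups.values())
    if not service_sets:
        return []
    return sorted(set.intersection(*service_sets))


if __name__ == "__main__":
    print(shared_services(SERVICE_GROUPS))
```

Under these assumptions, the storage, security, and governance services (HDFS, Ranger, and Atlas) are the ones common to all three groupings.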
When installing a CDP Data Center cluster, you install a single parcel called Cloudera Runtime that contains all of the components. For a complete list of the included components, see Cloudera Runtime Component Versions.
In addition to the Cloudera Runtime components, CDP Data Center includes powerful tools to help manage, govern, and secure your cluster.
CDP Data Center Tools
CDP Data Center uses Cloudera Manager to manage one or more clusters and their configurations and to monitor cluster performance. You also use Cloudera Manager to manage installations, upgrades, maintenance workflows, encryption, access controls, and data replication. In future releases, you will be able to manage Cloudera Enterprise CDH clusters as well. You can also use Cloudera Manager to create a Virtual Private Cluster, which lets you separate compute resources from data storage and share data storage among compute resources. See Cloudera Manager Overview.
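Apache Atlas provides metadata management and governance capabilities for your CDP Data Center clusters, including:
- Flexible metadata models
- Entity search using model attributes, classifications (tags), and free text
- Lineage across entities based on processes applied to the entities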
For more information, see Governance Overview.
Apache Ranger provides auditing, authentication, and authorization functionality for your CDP Data Center clusters.
Apache Ranger provides a centralized framework for collecting access audit history and reporting data, including filtering on various parameters. Ranger enhances audit information obtained from Hadoop components and provides insights through this centralized reporting capability.
Apache Ranger also manages access control through a user interface that ensures consistent policy administration across CDP Data Center components. Security administrators can define security policies at the database, table, column, and file levels, and can administer permissions for specific LDAP-based groups or individual users. Rules based on dynamic conditions such as time or geolocation can also be added to an existing policy rule. The Ranger authorization model is pluggable and can be extended to any data source using a service-based definition.
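To make the policy model described above more concrete, the following sketch shows how a policy scoped to a database, table, and column, granted to groups or users, and restricted by a time condition might be evaluated. It is a conceptual illustration only: the `Policy` class and the `is_allowed` function are hypothetical and do not reflect Ranger's actual policy schema, API, or evaluation engine.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set, Tuple


@dataclass
class Policy:
    """Illustrative policy record; NOT Ranger's real policy schema."""
    database: str
    table: str = "*"    # "*" matches any table
    column: str = "*"   # "*" matches any column
    groups: Set[str] = field(default_factory=set)
    users: Set[str] = field(default_factory=set)
    # Optional dynamic condition: access is allowed only inside this
    # window. This simple check does not handle windows that cross midnight.
    allowed_hours: Optional[Tuple[time, time]] = None


def _matches(pattern, value):
    """Return True if the pattern is a wildcard or equals the value."""
    return pattern == "*" or pattern == value


def is_allowed(policy, user, user_groups, database, table, column, at):
    """Check one access request against one policy (hypothetical logic)."""
    resource_ok = (
        _matches(policy.database, database)
        and _matches(policy.table, table)
        and _matches(policy.column, column)
    )
    principal_ok = user in policy.users or bool(policy.groups & set(user_groups))
    if policy.allowed_hours is None:
        time_ok = True
    else:
        start, end = policy.allowed_hours
        time_ok = start <= at <= end
    return resource_ok and principal_ok and time_ok


if __name__ == "__main__":
    policy = Policy(
        database="sales",
        table="orders",
        groups={"analysts"},
        allowed_hours=(time(8, 0), time(18, 0)),
    )
    # Member of "analysts" during the allowed window: permitted.
    print(is_allowed(policy, "alice", ["analysts"], "sales", "orders", "amount", time(9, 30)))
    # Same request outside the window: denied.
    print(is_allowed(policy, "alice", ["analysts"], "sales", "orders", "amount", time(22, 0)))
```

The sketch keeps resource matching, principal matching, and the dynamic condition as three separate checks, mirroring the way the text describes them as separate parts of a policy.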
Ranger also provides:
- Fine-grained access controls: dynamic row filtering, dynamic column masking, attribute-based access control, and SparkSQL fine-grained access control
- Rich policy features: allow/deny constructs, custom policy conditions and context enrichers, time-bound policies, and Atlas integration for tag-based policies
- Extensive access auditing with rich event metadata
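As a rough illustration of what dynamic row filtering and dynamic column masking mean for the data a user sees, the sketch below applies a row predicate and per-column mask functions to an in-memory result set. It is a conceptual model only: Ranger applies such policies inside the query engine, and the function names `apply_policies` and `mask_show_last_4` are hypothetical.

```python
def mask_show_last_4(value):
    """Replace all but the last four characters with 'x' (illustrative mask)."""
    text = str(value)
    return "x" * max(len(text) - 4, 0) + text[-4:]


def apply_policies(rows, row_filter, column_masks):
    """Return only the rows the filter admits, with masked columns rewritten.

    rows         -- list of dicts, one per result row
    row_filter   -- function(row) -> bool, standing in for a row filter policy
    column_masks -- dict mapping column name to a mask function
    """
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row filtering: the user never sees this row
        masked = dict(row)  # copy so the source data is left untouched
        for column, mask in column_masks.items():
            if column in masked:
                masked[column] = mask(masked[column])
        result.append(masked)
    return result


if __name__ == "__main__":
    rows = [
        {"region": "EU", "card": "1234567890123456"},
        {"region": "US", "card": "9999888877776666"},
    ]
    visible = apply_policies(
        rows,
        row_filter=lambda row: row["region"] == "EU",
        column_masks={"card": mask_show_last_4},
    )
    print(visible)
```

In this sketch, a user restricted to the EU region sees a single row, and the card number in that row is reduced to its last four digits.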
For more information, see Security.