What’s new in CDP Private Cloud Base

CDP Private Cloud Base combines runtime services that provide governance, access control, compliance management, troubleshooting, and scheduling for both CDH and HDP users.

For CDH users

Apache Atlas
Apache Atlas provides data governance capabilities for Hadoop. Apache Atlas serves as a common metadata store that is designed to exchange metadata both within and outside of the Hadoop stack. The close integration of Atlas with Apache Ranger enables you to define, administer, and manage security and compliance policies consistently across all components of the Hadoop stack.
CDP security components enable you to control access to CDP services and data sets, and also provide access to auditing and reporting.
Data Analytics Studio
Data Analytics Studio (DAS) is an application that provides diagnostic tools and intelligent recommendations to make business analysts self-sufficient and productive with Hive. DAS helps you perform operations on Hive tables and provides recommendations for optimizing the performance of your queries.
Apache Phoenix
Apache Phoenix is an add-on for Apache HBase that provides a programmatic ANSI SQL interface. Apache Phoenix implements best-practice optimizations to enable software engineers to develop HBase-based next-generation applications that operationalize big data. Using Phoenix, you can create and interact with tables through typical DDL and DML statements issued over the standard Phoenix JDBC API.
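As a rough illustration of the DDL/DML workflow described above, the following sketch issues Phoenix-style SQL through a generic Python DB-API connection instead of JDBC. The table name `web_stat`, its columns, and the helper `load_and_read` are hypothetical, and the `?` placeholders assume a driver that uses the qmark parameter style.

```python
from typing import Any, Iterable, List, Tuple

# Phoenix writes rows with UPSERT rather than INSERT.
# The table and column names below are hypothetical.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS web_stat ("
    "host VARCHAR NOT NULL PRIMARY KEY, "
    "hits BIGINT)"
)
UPSERT_ROW = "UPSERT INTO web_stat (host, hits) VALUES (?, ?)"
SELECT_ROWS = "SELECT host, hits FROM web_stat ORDER BY host"


def load_and_read(conn: Any, rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Create the table, upsert the rows, and read them back.

    `conn` is any DB-API style connection whose cursor accepts
    qmark ("?") parameters (an assumption of this sketch).
    """
    cursor = conn.cursor()
    cursor.execute(CREATE_TABLE)                   # DDL
    for host, hits in rows:
        cursor.execute(UPSERT_ROW, (host, hits))   # DML
    cursor.execute(SELECT_ROWS)                    # query
    return [tuple(row) for row in cursor.fetchall()]
```

One way to obtain such a connection is the `phoenixdb` package, which talks to a Phoenix Query Server: `phoenixdb.connect(url, autocommit=True)`, where `url` is the endpoint of your own deployment. A Java application would send the same statements through the Phoenix JDBC driver.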
YARN Capacity Scheduler
CDH offered the Fair Scheduler and HDP offered the Capacity Scheduler. After a thorough analysis of the YARN schedulers available in the legacy platforms, Cloudera chose the Capacity Scheduler as the supported YARN scheduler for CDP. The CDP Capacity Scheduler merges functionality from both schedulers to minimize the impact on CDH users going through this transition.

For HDP users

Cloudera Manager
Cloudera Manager replaces Apache Ambari. It provides the operational interface to the cluster, allowing administrators to perform common installation, configuration, and break/fix activities, such as changing service properties, adding or restarting services, or evaluating overall cluster performance. The performance monitoring is similar to what Ambari Metrics Service and Grafana provided in HDP.
Apache Impala
Apache Impala provides high-performance, low-latency SQL queries on data that is stored in popular Apache Hadoop file formats. The fast response for queries enables interactive exploration and fine-tuning of analytic queries, rather than the long batch jobs traditionally associated with SQL-on-Hadoop technologies. Impala integrates with the Apache Hive Metastore (HMS) database to share databases and tables between both components. The high level of integration with Hive and compatibility with the HiveQL syntax let you use either Impala or Hive to create tables, run queries, load data, and so on.
Apache Kudu
Apache Kudu is a columnar storage manager developed for the Hadoop platform. Kudu shares the common technical properties of Hadoop ecosystem applications: Kudu runs on commodity hardware, is horizontally scalable, and supports highly available operation. Kudu's benefits include:
  • Fast processing of OLAP workloads
  • Integration with MapReduce, Spark, Flume, and other Hadoop ecosystem components
  • Tight integration with Apache Impala, making it a good, mutable alternative to using HDFS with Apache Parquet
  • Strong but flexible consistency model, allowing you to choose consistency requirements on a per-request basis, including the option for strict serialized consistency
  • Strong performance for running sequential and random workloads simultaneously
  • High availability
  • Structured data model
Hue
Hue is an open source analytics workbench designed for fast data discovery, intelligent query assistance, and seamless collaboration. In Hue, SQL developers can run analytics on Hive and Impala, browse and load data into HDFS, and build Oozie workflows.

For all users

Ranger RMS
Ranger Resource Mapping Service (RMS) enables automatic translation of access policies for Hadoop SQL (Hive and Impala) to HDFS. With Ranger RMS, any user with access permissions on a Hive table automatically receives similar HDFS file-level access permissions on the table’s data files. Ranger RMS therefore lets you authorize access to HDFS directories and files using policies defined for Hive tables. This is similar to the Sentry HDFS-ACL Sync feature that was present in CDH, but it is implemented differently. Read more about Ranger RMS in this blog.
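The idea of translating table-level policies into file-level access can be pictured with a small toy model. This is only a conceptual sketch and not how Ranger RMS is implemented: the `TablePolicy` class, the `hdfs_permissions` function, the warehouse paths, and the privilege-to-permission mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Assumed, simplified mapping from Hive privileges to HDFS permissions.
HIVE_TO_HDFS: Dict[str, Set[str]] = {
    "select": {"read", "execute"},
    "update": {"write", "execute"},
}


@dataclass
class TablePolicy:
    table: str                   # Hive table name (hypothetical)
    location: str                # HDFS directory holding the table's data files
    grants: Dict[str, Set[str]]  # user -> Hive privileges on the table


def hdfs_permissions(policies: List[TablePolicy], user: str, path: str) -> Set[str]:
    """Derive the HDFS permissions a user gets on `path` from table policies."""
    allowed: Set[str] = set()
    for policy in policies:
        root = policy.location.rstrip("/")
        # The path must be the table directory itself or a file beneath it.
        if path == root or path.startswith(root + "/"):
            for privilege in policy.grants.get(user, set()):
                allowed |= HIVE_TO_HDFS.get(privilege, set())
    return allowed
```

In this model, a user granted `select` on a table whose data lives under `/warehouse/sales.db/orders` would receive `read` and `execute` on files in that directory, while a user with no grant on the table receives nothing.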
Ozone
Ozone is a scalable, redundant, and distributed object store, optimized for big data workloads. Apart from scaling to billions of objects of varying sizes, Ozone can function effectively in containerized environments such as Kubernetes and YARN.