CDP Private Cloud Base new features

If you are a CDH or an HDP user, review the new features available in CDP Private Cloud Base, in addition to the features carried over to CDP from the CDH and HDP releases.

CDH to CDP

  • Ranger 2.0
    • Dynamic row filtering and column masking
    • Attribute-based access control and SparkSQL fine-grained access control

    • Sentry to Ranger migration tools

    • New RMS provides HDFS ACL Sync

  • Atlas 2.0
    • Business Metadata support by providing entity model extensions

    • Bulk import of Business Metadata attribute associations and glossary terms

    • Enhanced basic search and filtered search

    • Multi-tenancy support and easier administration with enhanced UI

    • Data lineage and chain of custody

    • Advanced data discovery and business glossary

    • Navigator to Atlas migration

    • Improved performance and scalability

    • Integrating Ozone with Apache Atlas

  • Hive 3
    • Hive-on-Tez for better ETL performance
    • Support for Atomicity, Consistency, Isolation, and Durability (ACID) transactions

    • Comprehensive ANSI 2016 SQL coverage

    • Major performance improvements

    • Query result cache

    • Surrogate keys

    • Materialized views

    • Scheduled queries, including automatic rebuilding of materialized views using SQL

    • Auto-translation for Spark-Hive reads, no HWC session needed

    • Hive Warehouse Connector Spark direct reads

    • Authorization of external file writes from Spark

    • Improved CBO and vectorization coverage

  • Ozone
    • 10x the scalability of HDFS
    • Support for billions of objects and native S3 support

    • Support dense data nodes

    • Fast restarts and easy maintenance

  • HBase
    • HBase-Spark connector
    • Re-architected Medium Size Objects (MOB) for better compaction and performance

  • Hue

    • Gateway-based SSO with Knox
    • Support for Ranger KMS-Key Trustee Integration

  • Kudu
    • Fine-grained authorization with Ranger
    • Support for Knox

    • Operation enhancements with rolling restarts and automatic rebalancing

    • Numerous usability improvements

    • New data types such as DATE and VARCHAR, and support for HybridClock timestamps

  • YARN
    • New YARN Queue Manager
    • Placement Rules enable you to submit jobs without specifying the queue name

    • Capacity Scheduler leverages Delay Scheduling to honor task locality constraints

    • Preemption allows applications of higher priority to preempt applications of lower priority

    • Same queue name under different hierarchies

    • Moving applications between queues

    • YARN absolute mode support
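
The placement-rule idea in the YARN list above (submitting a job without naming a queue) can be illustrated with a toy sketch. This is not YARN's implementation or its configuration format: the rule functions, the queue paths, and the first-match evaluation order are all hypothetical assumptions made for illustration. Queues are identified by full path, which is also what lets the same leaf name exist under different hierarchies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    user: str
    group: str
    queue: Optional[str] = None  # None: the submitter did not name a queue


# A rule maps a job to a queue path, or returns None when it does not apply.
Rule = Callable[[Job], Optional[str]]

# Hypothetical queue hierarchy, identified by full paths.
QUEUES = {"root.default", "root.users.alice", "root.groups.etl"}


def specified_queue(job: Job) -> Optional[str]:
    """Honor an explicitly requested queue if it exists."""
    return job.queue if job.queue in QUEUES else None


def user_queue(job: Job) -> Optional[str]:
    """Place the job in a queue named after the submitting user, if present."""
    path = f"root.users.{job.user}"
    return path if path in QUEUES else None


def group_queue(job: Job) -> Optional[str]:
    """Place the job in a queue named after the user's group, if present."""
    path = f"root.groups.{job.group}"
    return path if path in QUEUES else None


def place(job: Job, rules: List[Rule], default: str = "root.default") -> str:
    """Return the queue chosen by the first rule that applies, else the default."""
    for rule in rules:
        queue = rule(job)
        if queue is not None:
            return queue
    return default


# Rules are tried in order; the first one that yields a queue wins.
RULES: List[Rule] = [specified_queue, user_queue, group_queue]
```

In this sketch, `place(Job("bob", "etl"), RULES)` falls through the user rule and lands in the group queue, while a job whose user and group match no queue ends up in the default queue.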

This is a generic service-level architecture of the components in the CDH stack. Components in Cloudera Applications, Operations and Management, and Encryption boxes run outside of the cluster envelope defined in the CDH Cluster Services perimeter.
Components marked with a red X are either deprecated and removed or replaced with an alternative component in CDP. These changes are noted in the CDP Cluster Architecture slide.
The service changes from CDH to CDP are:
  • Flume to Cloudera Data Flow
  • Navigator to Ranger/Atlas
  • Sentry to Ranger
  • KeytrusteeKMS to RangerKMS
  • HSM KMS to Key HSM
  • Hive-on-Spark/MR to Hive-on-Tez
  • YARN Fairshare to YARN Capacity
  • Spark 1.6 to Spark 2.4
  • NavOpt to WorkloadXM
  • Pig to Hive or Spark
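
For migration planning, the service changes listed above can be captured as a simple lookup table. The following is a minimal sketch: the dictionary and function names are hypothetical, and the mapping only restates the list, using the service names exactly as they appear there.

```python
# Hypothetical lookup table restating the CDH-to-CDP service changes listed above.
CDH_TO_CDP = {
    "Flume": "Cloudera Data Flow",
    "Navigator": "Ranger/Atlas",
    "Sentry": "Ranger",
    "KeytrusteeKMS": "RangerKMS",
    "HSM KMS": "Key HSM",
    "Hive-on-Spark/MR": "Hive-on-Tez",
    "YARN Fairshare": "YARN Capacity",
    "Spark 1.6": "Spark 2.4",
    "NavOpt": "WorkloadXM",
    "Pig": "Hive or Spark",
}


def replacements_for(installed):
    """Return {service: replacement} for installed services that change in CDP.

    Services that are not in the table are omitted from the result.
    """
    return {name: CDH_TO_CDP[name] for name in installed if name in CDH_TO_CDP}


if __name__ == "__main__":
    # Example inventory of services on a CDH cluster (illustrative only).
    for old, new in replacements_for(["Sentry", "HDFS", "Pig"]).items():
        print(f"{old} -> {new}")
```

Passing a cluster's service inventory to `replacements_for` gives a quick checklist of the components that need a migration path.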

HDP to CDP

  • Cloudera Manager
    • Virtual private clusters
    • Automated wire encryption setup

    • Fine-grained role-based access control (RBAC) for administrators

    • Streamlined maintenance workflows

  • Solr 8.4
    • Relevance-based text search over unstructured data (text, pdf, .jpg, ...)
  • Impala
    • Better fit for Data Mart migration use cases (interactive, BI style queries)
    • Ability to query high volumes of data ("big data") in large clusters
    • Distributed queries in a cluster environment, for convenient scaling
    • Integration with Kudu for fast data and Ranger for authorization policies
    • Fast BI queries let customers use a single system for big data processing and analytics, avoiding costly modeling and ETL to add analytics to the data lake
  • Hue
    • Built-in SQL editor with intelligent query autocomplete
    • Share queries and chart results, and download results for any database

    • Easily search, glance, and import datasets or jobs

  • Kudu
    • Better ingest and query performance for fast-changing or updatable data, with reporting that supports updates through Kudu and Impala
    • Real-time and streaming applications with Kudu + Spark
    • Time series analytics, event analytics, and real-time data warehousing, with a querying experience supported by intelligent autocomplete
  • YARN
    • Tools to transition to Capacity Scheduler
    • New YARN Queue Manager

    • Capacity Scheduler leverages Delay Scheduling to honor task locality constraints

    • Preemption allows applications of higher priority to preempt applications of lower priority

    • Same queue name under different hierarchies

    • Moving applications between queues

    • YARN absolute mode support

  • Encryption
    • Auto-TLS feature automates all the steps required to enable TLS encryption
    • Ranger KMS integration with Key Trustee Server to provide additional key provider storage

    • Encryption at rest with Navigator Encrypt