CDP Data Center 7.1 Release Summary

June 2020

Cloudera is pleased to announce the release of Cloudera Data Platform Data Center (CDP DC) version 7.1. This release provides the most powerful and comprehensive data management and analytics platform for on-premises environments. CDP DC 7.1 delivers a number of enhancements focusing on ease of administration, improved stability, and performance at increased scale. Most importantly, this release provides an in-place upgrade path from HDP 2.6.5, CDH 5.13 - CDH 5.16 and CDP DC 7.0 to CDP DC 7.1.

Here are the highlights of this release:

Upgrades & Migrations

  • Support for in-place upgrades from CDH to CDP Data Center from CDH 5.13 - 5.16 clusters to Cloudera Runtime 7.1 and upgrade from Cloudera Manager 5.13 - 5.16 to Cloudera Manager 7.1 in CDP Data Center 7.1.
  • Support for migration from CDH to CDP DC is provided by Cloudera Manager Replication Manager which enables data copy from a CDH 5 or CDH 6 cluster to a Cloudera Runtime 7.1 cluster.
  • Support for in-place upgrades from HDP to CDP Data Center consists of upgrading from HDP 2.6.5.x to Cloudera Runtime 7.1 using Ambari and then migrating the management platform from Ambari to Cloudera Manager. Newly created AM2CM tool captures HDP Cluster’s blueprint from Ambari and converts it to a Cloudera Manager deployment template which is then deployed to Cloudera Manager 7.1 and activates the Cloudera Runtime 7.1 parcels.
    • Upgrading Ambari and HDP to CDP Data Center
  • Support for migration from Sentry to Ranger converts Sentry privileges into their equivalents within Ranger service policies. Tool also supports migration of Kafka policies from Sentry to Ranger and is automatically used to convert Sentry policies to Ranger policies when using Replication Manager to migrate to CDP for HFDS, Impala and Hive.
  • Support for migration from Navigator to Atlas converts Navigator content from business metadata like Tags, Custom properties (definitions and entity assignments), Managed metadata properties (definitions and entity assignments), Original and updated entity names and descriptions and technical metadata from Hive, Impala, Spark, Referenced HDFS / S3 to Atlas content.

Streams Messaging

  • Gain a complete and comprehensive Kafka streaming experience with the addition of several components in CDP Data Center such as Schema Registry, Kafka Streams, Streams Messaging Manager, Streams Replication Manager, Kafka Connect and Cruise Control. This release of CDP Data Center 7.1 will also support Apache Kafka 2.4.
  • Boost Kafka connectivity with HDFS, S3 and Kafka Streams for simple streaming applications against Kafka sources with the new Kafka Connect.
  • Improve operational efficiency with the new Cruise Control that offers API based tools to monitor and assist with rebalancing and scaling of Kafka clusters and topics.
  • Monitor and manage your Kafka clusters end-to-end across producers, consumers, topics and brokers with Streams Messaging Manager. This release also addresses a number of security issues, bug fixes and enhancements.
  • Enable business continuity with Streams Replication Manager that supports replication of your Kafka clusters for disaster recovery and high availability needs. This is tightly integrated into SMM for replication monitoring.
  • Scale up your messaging by leveraging Schema Registry to store and access your schemas across your Kafka cluster. Schema Registry now integrates with Apache Ranger for providing role based access control.

Data Engineering

  • Apache Spark 3 preview - This release is designed to enhance performance and speed for creating and maintaining robust data pipelines across the enterprise. New features include adaptive execution of Spark SQL, better data source v2 performance, binary data source and additional language support.
  • Improved performance and interoperability for Apache Spark with Apache Hive and dynamic partitioning pruning.
  • New Hue integrations with Apache Atlas, the Data Catalog API, Service discovery and Apache Hive language references enables faster and more holistic management of data engineering workflows and pipeline creation.

Data Warehousing

  • Faster SQL analytics on larger data sets, including improved ad-hoc and instant interactive insights over petabytes of data, fast updates and drill down into time series data, and time-based aggregations, in Apache Hive and Impala.
  • Deeper understanding from unstructured data sources in Apache Solr with improvements to techniques used to evaluate, recognize, and discover data of any structure, enriched with any data, leading to more accurate results.
  • Easier visualizations of business insights from improvements to user experience and additional user tools that help to streamline and ease SQL development Hue and Data Analytics Studio (DAS), which results in less time to create dashboards and data visualization, and makes it more accessible for non-technical users.
  • Optimize and cost-control your hybrid cloud data warehouse deployments with intelligent workload optimization from Cloudera Workload Experience Manager (Workload XM).

Machine Learning

  • Data Science Workbench now available on CDP Data Center - CDSW on CDP-DC enables an improved best of breed data science platform experience for your data science teams on our latest enterprise data platform.

    • Future-proof modernization - CDSW on CDP-DC enables continuity in your data science experience at every stage of your cloud journey, enabling a more frictionless path to Cloudera Machine Learning (CML) — Our enterprise cloud-native data science and machine learning platform for CDP Private Cloud and Public Cloud.
  • Advanced control over experiments and model deployment - Ability to select environment variables in model and experiment builds, giving data science teams more control when testing and deploying models into production. Data science teams can now manage workflows with greater precision and expedite model deployments regardless of environmental requirements.

Operational Database

  • Distributed Compaction of Medium-Sized Objects (MOBs) - HBase has long supported medium-sized objects which enable using upto 10MB sized objects in the database (allowing storage of large elements such as photos and videos). This release has implemented distributed compactions which improves performance and reduces application impact when they occur.
  • RBAC policies in Apache Ranger for Phoenix tables which reduces administrative time by making Ranger the single place for setting policies.
  • Read Replica support in HBase 2 enables access to data in a pre-defined millisecond window which ensures 100% availability in the event of a node failure. This is achieved by providing timeline consistency and partition consistency during the outage, instead of strong consistency.

SDX

  • Navigator to Atlas migration is supported through new tools and utilities, making migration from CDH easier. Navigator data can be extracted using the new cnav utility and transformed to Atlas data format, using the nav2atlas utility. The result can be subsequently imported using Atlas’ migration import.
  • Improved support for metadata includes the ability to search on system attributes (technical metadata), and a new business metadata feature (migration from Navigator’s “managed metadata” features). New features include the ability to bulk import business glossary terms, bulk assign business metadata to entities, the ability to purge entities and new authorization privileges for labels as well as business metadata.
  • Impala Column Masking support. Enhanced security, compliance and consistency across CDP, letting organizations bring many more users and use cases to existing Impala databases in a safe and secure manner.
  • Enhanced security with enterprise grade cryptography support with Ranger KMS and Key Trustee Server KMS integration. The integration feature allows either the use of a hardened database or the Key Trustee Server as the backing keystore providing choice and ease of configuration.
  • Ranger KMS now supports new features like Cloudera Manager integration, KTS KMS ACLs to Ranger KMS policies migration and Key Trustee KMS to Ranger KMS KTS migration.
  • Transparent encryption and secure data at rest is now available without requiring changes to applications

Cluster Management

  • In-place upgrades to CDP from CDH 5.13-5.16 and HDP 2.6.5 are provided by Cloudera Manager 7.1. Upgrade tooling includes:

    • Ambari Blueprint to Cloudera Manager template conversion tools

    • FairScheduler to CapacityScheduler migration

    • Pre-upgrade validation checks to assess upgrade readiness

  • SSO and cluster proxying capabilities are provided by Apache Knox across a wide range of web endpoints and APIs, including Cloudera Manager itself.

  • HDFS Upgrade Domains now supports bulk assignment via Cloudera Manager UI

  • Tagging Cloudera Manager adds support to tag hosts, clusters and services via API

  • Custom Linux cgroups can be utilized to launch cluster services

  • FPGA devices can now be explicitly configured for use by YARN applications

  • End-to-end encryption for Zookeeper inter-process communication protects all metadata for all of Cloudera’s services.

Apache Ozone Beta

  • This CDP DC release includes the beta1 release of Ozone, the platform’s new S3-compatible object store, which provides

    • Scale to billion objects of all sizes

    • Native Hadoop API as well as S3 API

    • Provides better handling of small files due

  • As compared to the Ozone Tech Preview, this release includes

    • Support for billion objects

    • Tuning, increased scalability, stability, performance improvements and high availability.

    • In addition features include native support for Hive and Spark.

Platform Certifications

  • Operating Systems: RHEL / CENTOS / OEL 7.6, 7.7
  • Java Runtime: OpenJDK 1.8, OpenJDK 11, Oracle JDK 11
  • Backend Databases: This release adds support for MySQL 5.7, Oracle 12 (Fresh install only), MariaDB 10.2, and Postgres 10.