CDP Public Cloud: June 2020 Release Summary

Cloudera is pleased to announce the latest updates to CDP Public Cloud.

Highlights

  • Cloudera Operational Database (COD) Service is now in Tech Preview for both Azure and AWS
  • Cloudera Data Engineering (CDE) Service is now in Tech Preview for AWS
  • Streams Messaging DataHub Cluster is now GA
  • Runtime 7.2.0 has been released, which unlocks a few key features, but much more importantly fixes over 250 issues

New Or Updated Capabilities

Cloudera Data Warehouse

  • Isolation of Scan-Heavy Queries

    • Queries that scan and process more than a threshold volume of data get their own one-time, isolated executor group that is sized to meet the needs of the query. This allows DBAs to automate the processing of one-time queries that can affect the SLAs of all other queries running on the cluster.
  • Overlay network support for AWS environments is GA

    • Overlay network is a software-defined layer of network abstraction that is used to run multiple separate, discrete virtualized network layers over the AWS VPC network. This allows admins to build Virtual Warehouses using fewer IP addresses.
  • Virtual WH Access Control (Tech Preview):

    • Ability to restrict use of Virtual Warehouses (logging in and executing queries) based on LDAP group membership.
  • Query Transparent Retry (Tech Preview):

    • If an Impala query submitted via Hue or another client fails at query startup time, or one of its fragments fails, the query is retried a configured number of times. This is turned on via a session flag.
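
The retry behavior described for Query Transparent Retry can be pictured with a minimal Python sketch. This is not Impala's implementation: `RetryableQueryError`, `run_with_retries`, and `max_retries` are hypothetical names chosen for illustration, and the sketch only shows the general idea of re-running a failed query up to a configured number of times before surfacing the error.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableQueryError(Exception):
    """Hypothetical error standing in for a startup or fragment failure."""


def run_with_retries(run_query: Callable[[], T], max_retries: int) -> T:
    """Run a query, retrying up to max_retries times on a retryable failure."""
    attempt = 0
    while True:
        try:
            return run_query()
        except RetryableQueryError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last failure
            attempt += 1


# Usage: a fake query that fails twice and then succeeds.
calls = {"count": 0}


def flaky_query() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RetryableQueryError("fragment failed")
    return "42 rows"


result = run_with_retries(flaky_query, max_retries=3)
```

With `max_retries=3`, the fake query succeeds on its third attempt and the caller never sees the two intermediate failures, which is the sense in which the retry is transparent.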

Cloudera Machine Learning

  • CML Trial Experience for New Users

    • To help new users to get their bearings within the product, we now have an end-to-end in-product walkthrough that takes users through project creation, running sessions, deploying models, and more.
  • Applied ML Prototype - Churn Demo

  • New Updates with Engine:12

    • To keep up with new functionality and key fixes from the community, R has been upgraded to version 3.6.3 and Python3 has been upgraded to 3.6.10.

Cloudera Operational Database

  • Cloudera Operational Database Service is now Tech Preview on both Azure and AWS

    • It is an autonomous, multimodal, scale-out database supporting NoSQL and relational SQL.

    • Customers can now create databases without worrying about the infrastructure requirements

    • Secondary indexes, star schemas, and views are available when using the relational SQL modality

    • Authorization is integrated with Cloudera’s SDX offering providing a single place for all authorization

    • Operational database template includes Hue for both NoSQL and SQL modes, with support for DDL, data exploration and query

    • It supports auto-scaling, pausing and resuming of the database

    • Integration with Data Catalog for 360 views

    • Integrated with SDX - Ranger for authorization, Atlas for Data Lineage, Data Analytics Studio for Data Stewards

Cloudera DataFlow

  • Streams Messaging for Data Hub Template - based on Kafka 2.4 - is now GA

    • Streams Messaging for Data Hub is now GA after being in technical preview mode for a few months. This enables you to launch Kafka clusters to the public cloud in a matter of minutes.

    • Support for Schema Registry in this template allows for centralized schema management. As a shared repository, Schema Registry enables Kafka messages to be more nimble by allowing them to just carry a reference to the actual schema.

    • Support for Streams Messaging Manager (SMM) in this template allows for monitoring and managing Kafka clusters. With every enterprise struggling with Kafka blindness, SMM provides end-to-end visibility of your Kafka clusters as well as of the data flowing from the producers to the consumers through the various topics.

    • Key Features -

      • Support for Kafka 2.4.1
      • Schema Registry now highly available as well as integrated with Apache Ranger
      • Streams Messaging Manager integrated tightly with Schema Registry
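
The point above about Kafka messages carrying only a reference to their schema can be illustrated with a small Python sketch. It is an assumption-laden toy, not the Schema Registry API or its wire format: `InMemorySchemaRegistry`, `encode`, and `decode` are hypothetical names, and JSON is used only to keep the example self-contained.

```python
import json


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a shared, centralized schema repository."""

    def __init__(self) -> None:
        self._schemas = {}  # schema ID -> canonical JSON text of the schema

    def register(self, schema: dict) -> int:
        # Reuse the existing ID when an identical schema is already stored.
        key = json.dumps(schema, sort_keys=True)
        for schema_id, existing in self._schemas.items():
            if existing == key:
                return schema_id
        schema_id = len(self._schemas) + 1
        self._schemas[schema_id] = key
        return schema_id

    def lookup(self, schema_id: int) -> dict:
        return json.loads(self._schemas[schema_id])


def encode(registry: InMemorySchemaRegistry, schema: dict, record: dict) -> bytes:
    # The message carries the schema ID and the values, not the field names.
    schema_id = registry.register(schema)
    values = [record[name] for name in schema["fields"]]
    return json.dumps({"schema_id": schema_id, "values": values}).encode("utf-8")


def decode(registry: InMemorySchemaRegistry, message: bytes) -> dict:
    # The consumer resolves the schema ID to recover the field names.
    envelope = json.loads(message.decode("utf-8"))
    schema = registry.lookup(envelope["schema_id"])
    return dict(zip(schema["fields"], envelope["values"]))


# Usage: producer and consumer share the registry, not the schema text.
registry = InMemorySchemaRegistry()
schema = {"name": "click", "fields": ["user_id", "url", "timestamp"]}
record = {"user_id": 7, "url": "/home", "timestamp": 1593561600}

message = encode(registry, schema, record)
decoded = decode(registry, message)
```

Because the field names live in the registry, they never appear in the message bytes; the consumer reconstructs the full record from the schema ID alone.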

Cloudera Data Engineering

  • The Cloudera Data Engineering (CDE) service is now in Tech Preview on AWS

    • Cloudera Data Engineering (CDE) is a new curated experience for data engineers focused on deploying and orchestrating data transformation jobs using Spark at scale.

    • The key to this experience is a central interface that simplifies the job management life cycle, from scheduling and deploying through monitoring, debugging, and promotion, which alleviates many of the challenges of running Spark jobs in production. Similar to CML and CDW, CDE is cloud native and leverages Kubernetes, so platform admins can quickly provision virtual compute clusters with strong isolation, capacity auto-scaling, and quotas for cost management.

    • Key Features:

      • Simplified job management life cycle through a centralized interface for scheduling, deploying, monitoring and debugging, and promotion.
      • Apache Airflow-based scheduling service - for orchestration of complex data pipelines with job dependencies.
      • Visual troubleshooting and performance tuning of Spark jobs and integration with WXM.
      • Rich API support for CI/CD use cases and 3rd-party integrations
      • Containerized Spark service for mixed-version Spark deployments.
      • Automatic lineage capture
  • The “7.2 - Data Engineering” DataHub Template Includes a YARN Queue Manager

    • Data Engineering cluster administrators can now leverage the visual YARN Queue Manager via Cloudera Manager to isolate and prioritize workloads within a cluster, simplifying administration and management of workloads. (This feature was announced in the April Release Summary, but can only be enabled in clusters created with Runtime 7.2.0.)

Replication Manager

  • Support for CDH/HDP to Cloud Migration - AWS & Azure

    • Support for replication from HDP (Hive, HDFS) and CDH (Hive, Impala and HDFS) to CDP Public Cloud, making real-time migrations a reality.

Data Catalog

  • Asset 360 views for HBase tables, Hive columns, S3 buckets, MLOps models and MLOps projects.

    • Revamped Asset 360 views and navigation simplify building trust in and understanding of data and its lineage. Views now cover more asset types and provide in-depth governance insight (overview information, schemas, access policies and audits).
  • User Generated Social metadata toggle

    • The Data Catalog’s social features can drive understanding, trust and utilization through collaboration by capturing the collective knowledge of data users and their user-generated content. However, some organizations with strict data control requirements may not want to use this capability, as the insight is stored in the Cloudera Control Plane. This feature allows social metadata to be enabled or disabled.
  • On-demand profiler execution and last run info

    • Build confidence in data set statistics by seeing when the data was last profiled; directly refresh outdated information by triggering new profiler runs.
  • UX for creating classification tags

    • Simplify the creation and lifecycle management of classification tags. Previously, defining new tags required interacting with both Data Catalog and Apache Atlas. Now this can be done interactively, in fewer steps, using the Data Catalog alone.

Management Console & Control Plane

  • CDP will create AWS VPC Endpoints when the auto-create VPC option is chosen

    • This will make a private subnet deployment more secure as traffic to these endpoints will stay within the AWS network

    • Gateway endpoints are created for S3 and DynamoDB, and Interface endpoints are created for STS, ECR, EFS, ELB, EC2, Autoscaling and CloudFormation

  • Groups Sync Disabled By Default

    • When you create a Group in CDP, the Sync Membership flag is now off by default. This eliminates the situation in which users were removed from a CDP group because they weren’t part of the same group in the identity provider.
  • Support for HDP 2.6.x Classic Clusters Using CCM (Tech Preview)

    • Support for registering Classic clusters using the reverse SSH mechanism is extended to legacy HDP 2.6.x clusters. This will be behind the CLASSIC_CLUSTERS entitlement.

    • Currently, this feature is accessible through the UI only; CLI support will be added in an upcoming release.

Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).