CDP Public Cloud: December 2020 Release Summary

Cloudera is pleased to announce the latest updates to CDP Public Cloud.

Highlights

  • We have completed the AICPA Service Organization Control 2 (SOC 2) Type 2 audit for Cloudera Data Platform and are compliant
  • SDX enhancements including Backup/Restore, HA for CDP Identity Manager and Fine-Grained Access Control for S3 (Tech Preview)
  • CDW enhancements including support for UDFs, improved scalability, robustness and cost management
  • CDE improvements including scaling, tuning, analytics and debugging; in addition, custom Airflow DAGs are in Tech Preview
  • CML adds usability enhancements; additional instance types are supported as a Tech Preview
  • Streams Replication Manager (SRM) is now available with Streams Messaging clusters in Data Hub

New Or Updated Capabilities

Documentation

Cloudera Data Warehouse

  • Better Data Warehouse Analytics for SQL Developers with SQL Improvements

    • Hive User Defined Functions (UDF) now enabled on Azure (as they have been on AWS)

    • More expressions possible in SQL for Hive, making more sophisticated reports and dashboards possible in a single statement

  • Better Data Warehouse Scalability for BI Analysts and DBAs with Increased Performance and Robustness

    • Improved robustness for Hive bucketing and Hive Virtual Warehouse LDAP authentication

    • Faster, more deterministic query planning for large Hive queries

    • Faster results with more accurate query plans for Impala (via host count fix)

  • Improved Cost Management and Control for System Administrators on AWS and Azure

    • Reduced cost overhead by hosting multiple CDW environments on a single SQL database instead of several

    • Improved cost control with Impala manual start and stop, to override automation if appropriate

    • Reduced idle costs with fully shut-down coordinator instances

  • Improved Robustness for Key Administrator Workflows

    • Running multiple CDW instances in a single resource group

    • Upgrading Virtual Warehouses (VW) and Database Catalogs

Cloudera Data Engineering

  • Better scaling

    • Larger compute instance types to accommodate heavier ETL workloads, and tuned infrastructure services for higher job submission API throughput

    • Better Spark defaults tuned for Kubernetes, improving scale-up and stability

  • Run profiling analysis on demand for additional tuning metrics

    • Users can trigger additional profiling analysis for any job. This provides memory and CPU utilization, and stage-level CPU flame graphs.

  • WXM Integration

    • CDE services can now share Spark application metadata and metrics with WXM to give better visibility into aggregate workloads across the entire CDP environment, help manage SLAs, and identify additional tuning opportunities.

  • Easy log configuration & access

    • Full logs can now be downloaded through a new download option, and a quick bookmark links to the S3 location of the Spark application logs.

    • The UI now supports setting the log level from OFF to DEBUG and TRACE.

  • Lightweight backup and restore

    • Use the CDE CLI & API to back up and restore jobs from one virtual cluster into another virtual cluster, either within the same CDE service or in a completely new one. This helps support upgrades, which require enabling a new service.

  • Create/clone job from existing run for easier debugging

    • The CLI now supports cloning job runs, carrying over configs and parameters, making it easier to troubleshoot failed runs.

  • [Tech Preview] Self-authored Airflow DAGs

    • DE and ML practitioners can now define their own pipelines packaged as an Airflow Python DAG. Currently supported CDP operators include running Spark jobs on CDE and Hive jobs on CDW.

Cloudera Machine Learning

  • Ability to update ML Workspace settings

    • Customers can now update several settings of an already provisioned ML Workspace, including the autoscaling settings, authorized IP ranges for the k8s API server, and load balancer source ranges. In particular, being able to update the autoscaling settings will help customers to better handle additional load on the ML Workspace.

  • [Tech Preview] Additional Instance Types

    • Customers can now provision ML Workspaces and choose from a larger list of AWS and Azure instances, thereby enabling the usage of larger instances than were previously available.

SDX

  • Fine-grained access control for S3 storage on Data Hub (aka RAZ for S3) is available as a Tech Preview

    • Reduce security risk, improve compliance and eliminate unintended data deletion with the addition of the Ranger Authorization Service (RAZ) for AWS S3. CDP on AWS users gain fine-grained access control policy and auditing capabilities on their file system accesses.

    • Customers who are migrating from on-premises or IaaS clusters using EBS, and who have relied on Ranger HDFS policies or HDFS ACLs, get the same level of access control and audit on S3. This is especially useful for Data Hub Spark and Hive external table workloads that require direct access to files. RAZ for S3 environments can be created using the CDP environment wizard as well as via the CLI.

  • Cloudera Data Catalog now shows Metadata Audit information.

    • Data Catalog users benefit from ever richer Asset 360 views, delivering more complete insight on their data.

    • The new “Metadata Audit” tab on the Asset 360 views lets users directly track all changes to their data assets’ metadata.

  • Improved lineage information for NiFi flows interfacing with S3 and ADLS

    • Data Lineage is now correctly captured and displayed in Atlas and the Data Catalog for NiFi flows which read/write data from/to S3 or ADLS.

  • Data Lake Backup/Restore is now GA

    • You can back up and restore the metadata maintained in the Data Lake services. Data Lake backup and restore is supported from Cloudera Runtime 7.2.1+ on AWS and Cloudera Runtime 7.2.2+ on Azure.

  • High Availability for CDP Identity Manager (FreeIPA) is now GA

    • Up to three instances of CDP Identity Manager (FreeIPA) can be deployed via the Management Console GUI or CLI when creating an environment.

    • Note: Repair of failed FreeIPA instances is still a manual process.

Data Hub

  • Data Discovery and Exploration is now GA

    • The pre-templated search service that makes building search applications in the cloud easier and faster has graduated from Tech Preview to production-ready.

  • Streams Replication Manager (SRM) is now available with Streams Messaging clusters

    • Customers can now use SRM to migrate data from on-premises Kafka clusters to cloud based Streams Messaging clusters. SRM also enables continuous replication between multiple Streams Messaging clusters across different cloud regions or even cloud providers.

    • Every new Streams Messaging: Light Duty cluster comes with the SRM services installed by default. Streams Messaging: Heavy Duty clusters now allow optional SRM nodes to be provisioned.

    • Requires version 7.2.6 of the Cloudera Runtime

Management Console & Control Plane

  • AICPA Service Organization Control 2 (SOC 2) Compliance

    • Cloudera is pleased to announce that we have completed the AICPA Service Organization Control 2 (SOC 2) Type 2 audit for Cloudera Data Platform and are compliant.

  • Control Plane Audit Improvements

    • Audit events can now be archived to customer-managed cloud storage

    • Audit coverage has been expanded to include CDW and CML events

  • Global Settings menu

    • Account-wide settings (audit, tags, and telemetry) have been moved to the Global Settings menu in the Management Console

  • cdpcli-beta is now available

    • Features under tech preview will now be available in the beta cdpcli

    • To install cdpcli-beta, create a Python virtual environment and run: pip3 install cdpcli-beta
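
The install step above can also be scripted. The following is a minimal sketch using only the Python standard library, not an official installer: the helper names (venv_python, pip_install_command, install_beta_cli) and the environment directory are hypothetical, and only the package name cdpcli-beta comes from this release summary. It creates an isolated virtual environment and then runs that environment's own pip, which is one way to keep the beta CLI separate from any other cdpcli installation.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List

PACKAGE = "cdpcli-beta"  # package name as given in the release summary


def venv_python(env_dir: Path) -> Path:
    """Return the path of the Python interpreter inside the virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, package: str = PACKAGE) -> List[str]:
    """Build the command that installs `package` using the environment's own pip."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", package]


def install_beta_cli(env_dir: Path) -> None:
    """Create the virtual environment (with pip) and install the beta CLI into it."""
    venv.EnvBuilder(with_pip=True).create(str(env_dir))
    # Running pip through the environment's interpreter has the same effect as
    # activating the environment and running "pip3 install cdpcli-beta".
    subprocess.run(pip_install_command(env_dir), check=True)
```

Calling install_beta_cli(Path("cdpcli-beta-env")) would create the environment in a directory of that (arbitrary) name and install the package; it requires network access to the Python package index.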

Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).