CDP Public Cloud: November 2020 Release Summary

Cloudera is pleased to announce the latest updates to CDP Public Cloud.

Highlights

  • A list of specific outbound destinations that must be available for CDP Public Cloud has been published
  • Data Warehouse has a number of performance and security improvements as well as better integration with Azure services
  • Machine Learning has a number of usability improvements, new lightweight runtimes and improved scaledown in Azure
  • Data Engineering has a number of management improvements including CLI support and the ability to run multiple instances in the same environment
  • SDX has a number of availability improvements, including backup/restore and enabling high availability
  • New training modules are available in the CDP Public Cloud Administration course

New Or Updated Capabilities

Documentation

Cloudera Data Warehouse

  • Improvements for Azure based workloads

    • Save Costs: Executor nodes for virtual warehouses on Azure can now be scaled down to zero nodes, therefore using zero Azure resources for idle workloads saving costs.

    • Standardize and streamline CDW analytics: Pushing CDW metrics and logs into Azure’s log analytics gives Azure administrators a consistent and standard view of CDW together with all other platforms they manage, in one place

    • Simplify debugging of virtual warehouses on Azure with Grafana Loki LogQL to query kubernetes logs

    • Tightening security model with a Unified Resource Group making the Owner role no longer required, only the contributor role is now required.

  • More flexibility in SQL Engine choice

    • Impala can now match Hive full support for reading tables based on ORC file format, improving collaboration between different classes of SQL users across the data lifecycle. Now, users can select the SQL Engine that best matches their workload, and share common data, without restrictions tied to the file format.
  • Greater robustness and stability for virtual warehouses

    • High availability, fault tolerance and multi-threading improvements to Impala strengthen enterprise-grade resilience for Impala workloads

    • Increased transaction integrity with greater stability for high velocity data with Hive ACID and Hive ACID compaction

  • Greater performance for virtual warehouses using Hive LLAP

    • Better management for Ad-hoc and short-running queries with reduced startup latency, and other query performance improvements, increasing the use case flexibility for Hive LLAP users. Users can run an even greater variety of query types with Hive while meeting or exceeding performance needs.
  • Improved security with machine user implementation

    • Securing your virtual data warehouse requires an automated account that can assist in communication between components and the underlying data lake. To that end, a new “machine user” is now created and is synchronized with the FreeIPA Management System (FMS) directly, so that Hue, Impala, Data Visualization, and more, can communicate with the services within the Data Lake associated with the virtual warehouse, without having to use the FreeIPA LDAP admin account, as before. As a security measure, Cloudera recommends upgrading the CDW environment, Database Catalogs, and the Virtual Warehouses to the latest CDW release.

Cloudera Data Engineering

  • CDP CLI Integration

    • Administrators can now automate the enabling of CDE services and creation of Virtual Clusters through CDP CLI. Jobs will continue to be managed through the CDE CLI shipped with the service.
  • Multiple CDE Services

    • It’s now easier to enable CDE service multiple times within the same environment (datalake/SDX). This allows admins to set up multiple CDE services with differing instance profiles and allows for easier consumption tracking through AWS tags at the service level.
  • Python virtual environments

    • Users can now specify a list of python libraries as dependencies for Pyspark jobs. This can be specified through a requirements.txt file that is uploaded and managed through CLI/API.
  • CDP Trial Tours

    • The first trial tour for Data Engineering admins is now available.

Cloudera Machine Learning

  • Updated Projects Dashboard

    • Upon logging in to CML, users have a new streamlined page to be able to access their Projects. Projects can be displayed in a Summary card-based view for direct access, or in a Detail table-based view to be able to jump directly to different workloads (Sessions, Experiments, Models, etc) within a particular Project. In addition, the Resource Usage Details can be displayed or hidden as needed.
  • Scaling Workers on Azure

    • On Azure, the number of worker nodes can now be scaled to zero when the workspace is idle.
  • ML Runtimes

    • As an alternative to the existing Engines in CML, ML Runtimes are more lightweight than the current monolithic Engines. By specifying the desired Editor, Kernel, Edition, and Version, a streamlined Runtime will be used to run the user’s code in Sessions, Jobs, Experiments, Models, or Applications.
  • Asynchronous Project Creation

    • New Project creation, in particular via Git or forking from an existing project, now executes in the background so that the user does not have to wait.
  • Sessions List Export

    • The date format on the Sessions List Export has been updated to include the complete date plus hours, minutes, seconds and a decimal fraction of a second

SDX

  • [Tech Preview] SDX Backup

    • Reduced risk and simplified management to CDP continuity and recovery through convenient and straightforward backup as well as restore for SDX configuration, now also on Azure (was already in Tech Preview on AWS).
  • [Tech Preview] SSL Enforcement of SDX Database Connections

    • Improved security through strict enforcement of HTTPS rather than HTTP for all network connections, ensuring your data remains secure.
  • [Tech Preview] Medium Duty SDX/Data Lake Clusters

    • Experience more robust operation in case of failure with the first release of the Medium-Duty Data Lake. Resilience of master and IDBroker nodes ensure failure of either will not affect compute-engine clients.

    • Currently, only available for the Data Engineering and Data Mart templates for Data Hub

  • [Tech Preview] FreeIPA HA

    • Improved resilience for CDP’s FreeIPA identity management, ensuring service continuation in case of instance failure(s).

Management Console & Control Plane

  • Billing and Metering in MyCloudera Portal

    • Plan Summary is now available for all customers. Prepay subscriptions see a consumption burn-down, whereas pay-as-you go and trial subscriptions display total consumption.

Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).