CDP Public Cloud: May 2020 Release Summary

Cloudera is pleased to announce the latest updates to CDP Public Cloud.

Highlights

A major theme for the platform was adding stability and refinement across the platform and experiences
The Operational Database Cluster Definition in DataHub is now GA

New or Updated Capabilities

Cloudera Data Warehouse

Set up faster and easier in an AWS private cloud with simplified deployments, as the private load balancer, private worker nodes option will no longer require specialized configuration.
Faster SQL analytics in the cloud with more granular control and performance optimizations on multi-threaded query execution, improved read performance for ORC tables with nested type columns, and improvements for the automatic updates of metadata in Impala.
Faster visualization of business insight with Hue now supporting SQL applications based on Hive, and Hue metrics on Grafana dashboard for easy to read, rich visuals for any business use case.
Manage and administer your cloud data warehouse easier with improvements to Data Analytics Studio (DAS) user management features, improvements to Impala troubleshooting with diagnostic bundles of log files, and Ranger audit log generation to improve Impala security.

Cloudera Machine Learning

Production Machine Learning Enabled By Default
- As a follow up to April 2020’s GA, the ML Ops feature set is now enabled by default for all accounts. This will enable all ML Engineers and Data Scientists to take advantage of CML’s advanced model monitoring and full lifecycle lineage tracking.
Streamlined New Session Startup
- The user experience for starting a new Session has been simplified. Instead of opening the Workbench by default, Data Scientists see a pop-up appear that asks for the Session details (Engine Editor, Engine Kernel, and Engine Profile). This is particularly useful for Data Scientists who are using editors such as Jupyter and RStudio, as they will be able to start their Session in their editor of choice more quickly.
Firefox Support
- CML now officially supports Firefox as a web browser. Previously, Data Scientists were required to set a specific browser property in order to use CML in Firefox, but this is no longer necessary and Data Scientists can continue using the tools they are most familiar with.
User IDs
- CML Administrators can now view User IDs for each user under the Admin / Users page. This makes it easier for Administrators to cross reference user level logs and monitoring in Grafana.
Updated Landing Page (Technical Preview)
- The initial Projects landing page has a refreshed new design that is available for internal testing only. This update is focused on ensuring that data scientists can see how their current resource reservations compare against their Quota, and updates the list of Projects to be better aligned with design standards.

Cloudera Data Hub

Operational Database
- The Operational Database cluster definition in DataHub is now GA. It is a multi-modal scale-out database offering powered by Apache HBase & Phoenix. We have added support for relational SQL (key-value, wide column) in addition to NOSQL (key-value and wide column) as well as a number of optimizations that make OpDB accessible to app developers that are familiar with traditional RDBMs.
- Two types of secondary indexes (covered & functional) are now supported with the SQL modality. Secondary indexes provide an orthogonal way to access data which improves efficiency and can help avoid full table scans and in many cases reduces access times to single digit milliseconds.
- Authorization is integrated with SDX providing a single place for all authorization enabling enhanced security with deny policies.

Workload Manager

Enhanced Spark Analytics
- Gain deeper insight into Spark workload performance for troubleshooting, increased efficiency and optimization. Advanced spark analysis capabilities include:
  - DAG views
  - Resource consumption
  - CPU Flame graph

SDX

IPA Backups
- The IPA node of an environment now automatically writes regular backups to object storage, improving SDX resilience. Any tenant specific user group and configuration can thus be restored if the node is destroyed.

Data Catalog

Navigation Support for Table Entities within Lineage
- Users can now be more efficient in exploring and understanding their data by navigating from one Asset-360 view of a table (Hive/Impala) to another in its lineage with a single click. This simplifies and streamlines the user experience by eliminating the need to perform new table searches in order to see the details of a related table.

Management Console & Control Plane

Improvements in Data Hub Cluster Scaling
- A number of significant improvements were made to properly clean up cloud resources for scaling up and down Data Hub clusters. This will reduce cloud costs that would have been incurred by any orphaned instances
Edit Subnets
- You can now add new subnets to the configuration of a running environment. This will make it easier to add experiences (CML, CDW) to an existing environment instead of requiring all networks to be pre-defined during environment creation (or having to create a new environment entirely)

Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).