CDP Public Cloud: March 2021 Release Summary
Highlights
- Medium Duty SDX is Generally Available on all clouds; on AWS, Endpoint Access Gateway can be used to provide secure public access to private deployments
- SDX and Data Hub are now available on Google Cloud
- Runtime Version 7.2.8 is now enabled, which fixes over 400 issues as well as enabling new functionality across the product
- Cloudera Machine Learning now supports a significantly reduced cross account role on AWS and also honors RAZ policies on both AWS and Azure
- [Tech Preview] Cloudera Data Engineering is available on Azure as a Tech Preview
New Or Updated Capabilities
Documentation
-
Management Console
-
Documentation for registering a Google Cloud environment and creating Data Hubs is now available:
-
Medium duty SDX for AWS was added to Data Lake scale.
-
Documentation was updated to include the option to use your existing Azure resource group. See the updated Azure Permissions, Resource group, and Register an Azure environment.
-
The new option to create private endpoints instead of service endpoints for Azure Database for PostgreSQL is now documented in Private endpoint for PostgreSQL.
-
The option to enable Public Endpoint Access Gateway for AWS is documented in Public Endpoint Access Gateway.
-
The new set of options to obtain CDP CLI commands from the CDP web interface is now documented at:
-
The Enabling environment telemetry documentation was updated to include the new option to disable cloud storage logging for an existing environment.
-
The eu-south-1 (Milan) and af-south-1 (Cape Town) AWS regions were added to Supported AWS regions.
-
New documentation on how to Obtain your CDP tenant ID was added.
-
-
Data Warehouse
-
Data Engineering
-
Virtual cluster Back-up & restore of job artifacts
-
Machine Learning
-
Reduced Cloud Permissions on AWS
- CML now supports configuring AWS environment cross-account roles with fine-grained cloud permissions and policies for provisioning and running workspaces.
-
CML Model DNS Resolution
-
DataHub and CDE connectivity to CML services such as workspaces and model endpoints is now possible.
-
This is supported by enabling internal DNS resolution for CML workspace endpoints and models on private IPs; any existing workspaces must be upgraded to the current release first
-
-
RAZ integration
- CML now integrates with RAZ for S3.
-
Cdswctl support
- ML Runtimes is now supported in cdswctl
-
External engine repositories
- Admins can now add basic Docker credentials for external engine repositories.
-
ML workspace provisioning
- In case of validation errors, it is possible to skip preflight validation for ML workspace provisioning. See Provisioning ML Workspaces for more information.
-
[Tech Preview] CML + RAZ for ADLSg2 and RAZ for S3 integration
- CML users will be able to access data lake storage and run spark jobs in CML while also honoring RAZ for ADLSg2 and RAZ for S3 cloud storage access control policies.
Cloudera Data Engineering
-
API keys
- DE users can now use API keys managed through CDP UMS to interact with jobs API via CLI.
-
Raw Scala code
- Users that want to run raw scala code without compiling, can now submit jobs with raw scala code that will run a spark-shell to process the application file.
-
Diagnostic bundles
-
Admins can access summary diagnostic logs directly on their local machine.
-
New “Diagnostic” page has been added to the CDE Service details to generate and download the bundle.
-
-
Force renewal of TLS certs
- A CDE Service older than 90 days will have expired TLS certs. A new action has been added to the CDE Service hamburger menu to renew the certificate and avoid access issues for DE users.
-
[Tech Preview] Azure Support
- CDE can now be deployed on Azure as a Tech Preview
-
[Tech Preview] GANG Scheduling
-
GANG is a new resource scheduling policy that will overcome scale up challenges in situations where there is a high rate of job submission, leading to queuing. The new scheduling policy removes jobs off the queue in batches. This will clear up the queue, force scale up of nodes to process the high burst of incoming jobs, and reduce wait / starting time of jobs.
-
By default this is disabled. It can be turned on for specific jobs by adding a new job-level config option.
-
Data Catalog
-
Searchable user-editable comments and annotations on assets
- Business users and data stewards can now add annotations to hive tables and other data assets in the asset 360 view. These can be used to add domain specific knowledge that are searchable by other users.
-
Updated UX for Assets 360 views
- The Tags section in the data catalog has been given more prominence, making it easier to identify tags sources and their sources (managed tags, system tags, or lineage propagated tags)
SDX
-
Medium Duty SDX is now Generally Available on AWS, Azure, and Google Cloud
- Medium Duty SDX adds more scale, supporting 20 Data Hub clusters, and resiliency
Management Console
-
Environment Role - DataSteward
- A new environment resource role which permits the user to perform data steward tasks without granting full environment administrator rights. This includes managing user access to an environment, ID Broker mapping, and Ranger and Atlas administration.
-
Generate CLI command from the UI
- You can quickly obtain CDP CLI commands from the CDP web interface; currently supported operations are creating environments, data lake, credentials and data hub clusters
-
[Tech Preview] Two new AWS regions
- The eu-south-1 (Milan) and af-south-1 (Cape Town) regions are supported as a Tech Preview
Data Hub
-
Support for Google Cloud
-
Data Hub (and SDX) are now Generally Available on Google Cloud
-
Current cluster definitions are: Data Flow (NiFi), Data Engineering (Spark/Hive) and Streams Management (Kafka)
-
-
Public Endpoint Access Gateway for AWS
- Provides secure external connectivity to UIs and APIs in Data Lake and Data Hub clusters deployed using private networking
-
Azure Single Resource Group Support is now Generally Available
- This will allow deployment of CDP environments with significantly reduced privileges
-
Private endpoints for Azure database for PostgreSQL are now Generally Available
- This option will allow customers to keep their database completely private to their Azure virtual network
-
Runtime 7.2.8 is available
- This release fixes over 400 issues, for details see the Release Notes
-
Kafka topic metadata in Data Catalog
-
Data Stewards can now track Kafka topic metadata in the Data Catalog.
-
Turn on *Enable Auditing to Atlas *for the Kafka service in Cloudera Manager after provisioning a new Streams Messaging Cluster
-
-
[Tech Preview] Fair-share Intra-queue preemption in YARN
- Preemption of workloads to maintain fair share allocations even when the workloads are submitted by the same user and in the same queue.
Education
-
Content is available to all customers with a Cloudera OnDemand library subscription
-
CDP Public Cloud Administration Training
- We have expanded our CDP Public Cloud administration course to include coverage of CDP Public Cloud deployments on Microsoft Azure
-
CDP Data Visualization Training
- First three in a series of modules on CDP Data Visualization: “Accessing DataViz,” “Terminology and Workflow,” and “Datasets”
-
Data Warehousing in Cloudera Data Platform
- First two in a series of modules about data warehousing on the Cloudera Data Platform. The first module - Overview of Data Warehousing in CDP, Analytic engines in CDP
Full release documentation including release notes for each service is available here (Service Name → Release Notes → What’s New).