CDP Public Cloud: April 2021 Release Summary

Data Catalog

Business Glossary

Data Catalog introduces the following addition: Support for Business Glossary and usage of “terms” to associate datasets.

Data Engineering

API keys

CDE users can now use API keys, managed using the CDP user management service (UMS), to interact with the CDE Jobs API using the command line.

Raw Scala code

Users can now submit jobs with raw Scala code, without compiling. These jobs run spark-shell to process the application file.

Diagnostic bundles

Admins can now access summary diagnostic logs directly on their local machine. A new Diagnostic page has been added to the CDE Service details to generate and download the bundle.

Force TLS certificate renewal

CDE services older than 90 days will have expired TLS certificates. A new action has been added to the CDE service hamburger menu to renew the certificates and avoid access issues for DE users.

[Preview] GANG scheduling

GANG is a new resource scheduling policy that overcomes scale-up challenges in situations where high rates of job submission lead to queuing. The new scheduling policy moves jobs off the queue in batches. This clears up the queue, forces scale-up of nodes to process the burst of incoming jobs, and reduces wait and startup time of jobs. By default, GANG scheduling is disabled. It can be turned on for specific jobs by adding a new job-level configuration option.

Data Visualization

New CML engine

docker.repository.cloudera.com/cloudera/cdv/cmldataviz:6.2.5-b25

New features

[Preview] Use natural language search to find relevant information in your datasets through natural language statements, and render it in graphical format.

VIZ-468 – Account lockout

You can enable lockout after too many failed login attempts by adding AXES_ENABLED to the advanced site settings in CML or CDSW. This feature is disabled by default.

Improvements

VIZ-145

Improved Solr aggregation and faceting.

Fixed issues

VIZ-362, VIZ-420, VIZ-533

Fixed bugs for CSV and Excel downloads.

VIZ-387

Error message popovers now close when clicking elsewhere on the page as expected.

VIZ-408

Expansion on crosstab visuals now works as expected.

VIZ-415

Changing values on a custom picklist filter in dashboard mode now works as expected.

VIZ-419

Fixed a bug where filters on an Arcengine connection did not reset after unselecting.

VIZ-437

Enable “Download as Image/PDF” switch in dashboard settings now works as expected.

VIZ-486

Dashboards created from Arcengine connections can now be accelerated as expected.

VIZ-498

Changing colors in a scatter visual now redraws the visual as expected.

Data Warehouse

Virtual Warehouse Improvement

Ability to launch and use the Hue app while the Virtual Warehouse is starting up

When you restart a Virtual Warehouse, it takes a few minutes to start and change to the “Running’ state. In the older CDW version, you had to wait to launch the Hue app until the Virtual Warehouse is up and running. Now you can open Hue and run queries even when it is in the “Starting” state.

DataFlow

This is the first release of Cloudera DataFlow.

Machine Learning

New features

  • Business User Experience - A new user role, ML BusinessUser, provides restricted access to view Applications created in CML.

  • ML Runtimes - Runtimes provide a lightweight alternative to Engines.

  • View All Applications - Admins can view all applications on the application list page.

  • Jobs improvements - Job Scheduling UI now supports cron expressions, job notification email subject line now includes the project name, and the Job History page now shows detailed start and end timestamps.

  • New AWS region - CML now supports the AWS region eu-south-1.

Fixed issues

DSE-15059 - Fixed an issue where project creators may not be able to authenticate to models created by project collaborators using model API key.

Management Console

Runtime 7.2.9

Runtime 7.2.9 is now available and can be used for registering an environment with a 7.2.9 Data Lake and creating Data Hub clusters. See Cloudera Runtime.

GCP quick start

GCP quick start is now available, allowing you to quickly set up a CDP environment. See GCP quick start.

See What’s New Post.

Cluster Definitions page was moved to environment details

The Cluster Definitions page that used to be available in the Shared Resources section was removed. Instead, you can access all cluster definitions related to a specific environment from the Cluster Definitions tab available in the environment’s details. You can save new cluster definitions using the Save As New Definition option available from the Create Data Hub wizard or from CDP CLI using the cdp datahub create-cluster-definition command.

Ranger Audit environment parameter was moved to Data Access section

The option to specify the Ranger Audit role (AWS) managed identity (Azure) or service account (GCP) during environment registration was moved from the Logs - Storage and Audit section to the Data Access section. Consequently, these sections were renamed to Logs and Data Access and Audit.

You can select specific nodes to repair within a Data Lake host group

From the Hardware tab of the Data Lake details, you can click the Repair icon to select specific nodes within a host group to repair.

See What’s New Post.

Updated IAM policy for the provisioning credential for AWS

The IAM policy for the provisioning credential has been updated to include new permissions related to load balancers. The following permissions are now required:

cloudformation:UpdateStack
cloudformation:ListStackResources
elasticloadbalancing:DescribeLoadBalancers
elasticloadbalancing:DescribeTargetHealth
elasticloadbalancing:RegisterTargets
elasticloadbalancing:DeregisterTargets

If you are using a restricted IAM policy for your provisioning credential, you must add these additional permissions.

CDP Public Cloud onboarding documentation was moved

The following publications were moved to the CDP Public Cloud library:

  • Getting Started in CDP Public Cloud
  • AWS/Azure/GCP Requirements
  • AWS/Azure/GCP Quick Starts
  • CDP Public Cloud Security Overview

This CDP Public Cloud library is accessible via the Get Started link on the docs homepage or via the CDP landing page.

The documentation that was moved is available from the following links:

URL redirects were added temporarily; They will eventually be removed, so make sure to update your bookmarks.

This change was made in effort to make CDP Public Cloud onboarding documentation easier to find. The previous location of this content (in the Management Console library) was unintuitive to many users.

AWS/Azure/GCP planning documentation was consolidated

The AWS/Azure/GCP requirements content was consolidated in one place in the CDP Public Cloud library mentioned above. If the content that you have bookmarked throws a 404 error, it is most likely in one of the following publications:

To fix the error, you have three options:

  • Update the URL by replacing /management-console/cloud/environments-/ with /cdp/latest/requirements-/, replacing with "aws", "azure", or "gcp". This works for the content that was moved, but not for topics that were consolidated into other documentation and removed.

  • On the docs homepage, search the website for the content that moved. Search results will direct you to the correct location.

  • Navigate to one of the libraries linked above and find the content that you are looking for.

This change was made in effort to consolidate all documentation related to cloud provider requirements in one place. Previously, the documentation was scattered and users had to click on many links in order to find content.

Registering CDP Private Cloud Base clusters in CDP Public Cloud

You can now register CDP Private Cloud Base clusters as classic clusters in CDP.

  • The CDP Private Cloud Base clusters can be registered via Cloudera Manager for use in Replication Manager.

  • Additionally, you can register CDP Private Cloud Base clusters via Knox for use in Data Catalog. This is a technical preview feature that should not be used in a production environment.

For documentation, see Adding a CDP Private Cloud Base cluster.

See What’s New Post.

Operational Database

No new features

Replication Manager

Registering CDP Private Cloud Base clusters in CDP Public Cloud

You can register the CDP Private Cloud Base as a classic cluster using the Cloudera Manager option in CDP. After registration, you can replicate the data in HDFS and Hive external tables in the classic cluster to CDP Public Cloud.

Workload Manager

No new features