CDP Public Cloud: April 2021 Release Summary
Data Catalog
Business Glossary
Data Catalog introduces the following addition: Support for Business Glossary and usage of “terms” to associate datasets.
Data Engineering
API keys
CDE users can now use API keys, managed using the CDP user management service (UMS), to interact with the CDE Jobs API using the command line.
Raw Scala code
Users can now submit jobs with raw Scala code, without compiling. These jobs run spark-shell to process the application file.
Diagnostic bundles
Admins can now access summary diagnostic logs directly on their local machine. A new Diagnostic page has been added to the CDE Service details to generate and download the bundle.
Force TLS certificate renewal
CDE services older than 90 days will have expired TLS certificates. A new action has been added to the CDE service hamburger menu to renew the certificates and avoid access issues for DE users.
[Preview] GANG scheduling
GANG is a new resource scheduling policy that overcomes scale-up challenges in situations where high rates of job submission lead to queuing. The new scheduling policy moves jobs off the queue in batches. This clears up the queue, forces scale-up of nodes to process the burst of incoming jobs, and reduces wait and startup time of jobs. By default, GANG scheduling is disabled. It can be turned on for specific jobs by adding a new job-level configuration option.
Data Visualization
New CML engine
docker.repository.cloudera.com/cloudera/cdv/cmldataviz:6.2.5-b25
New features
VIZ-388 – Natural language search
[Preview] Use natural language search to find relevant information in your datasets through natural language statements, and render it in graphical format.
VIZ-468 – Account lockout
You can enable lockout after too many failed login attempts by adding
AXES_ENABLED
to the advanced site settings in CML or CDSW. This
feature is disabled by default.
Improvements
VIZ-145
Improved Solr aggregation and faceting.
Fixed issues
VIZ-362, VIZ-420, VIZ-533
Fixed bugs for CSV and Excel downloads.
VIZ-387
Error message popovers now close when clicking elsewhere on the page as expected.
VIZ-408
Expansion on crosstab visuals now works as expected.
VIZ-415
Changing values on a custom picklist filter in dashboard mode now works as expected.
VIZ-419
Fixed a bug where filters on an Arcengine connection did not reset after unselecting.
VIZ-437
Enable “Download as Image/PDF” switch in dashboard settings now works as expected.
VIZ-486
Dashboards created from Arcengine connections can now be accelerated as expected.
VIZ-498
Changing colors in a scatter visual now redraws the visual as expected.
Data Warehouse
Virtual Warehouse Improvement
Ability to launch and use the Hue app while the Virtual Warehouse is starting up
When you restart a Virtual Warehouse, it takes a few minutes to start and change to the “Running’ state. In the older CDW version, you had to wait to launch the Hue app until the Virtual Warehouse is up and running. Now you can open Hue and run queries even when it is in the “Starting” state.
DataFlow
This is the first release of Cloudera DataFlow.
Machine Learning
New features
-
Business User Experience - A new user role, ML BusinessUser, provides restricted access to view Applications created in CML.
-
ML Runtimes - Runtimes provide a lightweight alternative to Engines.
-
View All Applications - Admins can view all applications on the application list page.
-
Jobs improvements - Job Scheduling UI now supports cron expressions, job notification email subject line now includes the project name, and the Job History page now shows detailed start and end timestamps.
-
New AWS region - CML now supports the AWS region eu-south-1.
Fixed issues
DSE-15059 - Fixed an issue where project creators may not be able to authenticate to models created by project collaborators using model API key.
Management Console
Runtime 7.2.9
Runtime 7.2.9 is now available and can be used for registering an environment with a 7.2.9 Data Lake and creating Data Hub clusters. See Cloudera Runtime.
GCP quick start
GCP quick start is now available, allowing you to quickly set up a CDP environment. See GCP quick start.
See What’s New Post.
Cluster Definitions page was moved to environment details
The Cluster Definitions page that used to be available in the Shared Resources section was removed. Instead, you can access all cluster definitions related to a specific environment from the Cluster Definitions tab available in the environment’s details. You can save new cluster definitions using the Save As New Definition option available from the Create Data Hub wizard or from CDP CLI using the cdp datahub create-cluster-definition command.
Ranger Audit environment parameter was moved to Data Access section
The option to specify the Ranger Audit role (AWS) managed identity (Azure) or service account (GCP) during environment registration was moved from the Logs - Storage and Audit section to the Data Access section. Consequently, these sections were renamed to Logs and Data Access and Audit.
You can select specific nodes to repair within a Data Lake host group
From the Hardware tab of the Data Lake details, you can click the Repair icon to select specific nodes within a host group to repair.
See What’s New Post.
Updated IAM policy for the provisioning credential for AWS
The IAM policy for the provisioning credential has been updated to include new permissions related to load balancers. The following permissions are now required:
cloudformation:UpdateStack
cloudformation:ListStackResources
elasticloadbalancing:DescribeLoadBalancers
elasticloadbalancing:DescribeTargetHealth
elasticloadbalancing:RegisterTargets
elasticloadbalancing:DeregisterTargets
If you are using a restricted IAM policy for your provisioning credential, you must add these additional permissions.
CDP Public Cloud onboarding documentation was moved
The following publications were moved to the CDP Public Cloud library:
- Getting Started in CDP Public Cloud
- AWS/Azure/GCP Requirements
- AWS/Azure/GCP Quick Starts
- CDP Public Cloud Security Overview
This CDP Public Cloud library is accessible via the Get Started link on the docs homepage or via the CDP landing page.
The documentation that was moved is available from the following links:
- getting-started
- aws-quickstart
- azure-quickstart
- gcp-quickstart
- requirements-aws
- requirements-azure
- requirements-gcp
- security-overview
URL redirects were added temporarily; They will eventually be removed, so make sure to update your bookmarks.
This change was made in effort to make CDP Public Cloud onboarding documentation easier to find. The previous location of this content (in the Management Console library) was unintuitive to many users.
AWS/Azure/GCP planning documentation was consolidated
The AWS/Azure/GCP requirements content was consolidated in one place in the CDP Public Cloud library mentioned above. If the content that you have bookmarked throws a 404 error, it is most likely in one of the following publications:
To fix the error, you have three options:
-
Update the URL by replacing /management-console/cloud/environments-
/ with /cdp-public-cloud/cloud/requirements- /, replacing with "aws", "azure", or "gcp". This works for the content that was moved, but not for topics that were consolidated into other documentation and removed. -
On the docs homepage, search the website for the content that moved. Search results will direct you to the correct location.
-
Navigate to one of the libraries linked above and find the content that you are looking for.
This change was made in effort to consolidate all documentation related to cloud provider requirements in one place. Previously, the documentation was scattered and users had to click on many links in order to find content.
Registering CDP Private Cloud Base clusters in CDP Public Cloud
You can now register CDP Private Cloud Base clusters as classic clusters in CDP.
-
The CDP Private Cloud Base clusters can be registered via Cloudera Manager for use in Replication Manager.
-
Additionally, you can register CDP Private Cloud Base clusters via Knox for use in Data Catalog. This is a technical preview feature that should not be used in a production environment.
For documentation, see Adding a CDP Private Cloud Base cluster.
See What’s New Post.
Operational Database
No new features
Replication Manager
Registering CDP Private Cloud Base clusters in CDP Public Cloud
You can register the CDP Private Cloud Base as a classic cluster using the Cloudera Manager option in CDP. After registration, you can replicate the data in HDFS and Hive external tables in the classic cluster to CDP Public Cloud.
Workload Manager
No new features