CDP Public Cloud: December 2021 Release Summary
Data Catalog
Data Catalog introduces deleting profiler clusters.
Data Engineering
Fixed Log4j2 security vulnerability CVE-2021-44228
Removed JndiLookup.class from affected JAR files. For instructions on upgrading your existing CDE services and virtual clusters, see Cloudera Data Engineering fix for CVE-2021-44228.
DataFlow
CVE-2021-44228, CVE-2021-45046, LOGBACK-1591
On December 17, 2021, Cloudera released Cloudera DataFlow (CDF) for Public Cloud version 1.0.3-h1-b6. It addresses 2 CVEs and other vulnerability concerns. Cloudera urges all customers to upgrade their DataFlow services to the latest version.
[Technical Preview] CDF can now be deployed on Azure
Deploying CDF on Azure requires a specific entitlement that is granted by request only. Reach out to your Cloudera Account Representative or open a support ticket to request CDF on Azure.
Custom Processors and Controller Services can be used in NiFi flow deployments
CDF supports deploying NiFi flows that require custom components such as processors or controller services. Users can store their custom NiFi Archive (NAR) files in an object store location in S3 or ADLS and use the Deployment Wizard or the CLI to create their flow deployments accordingly. Once configured, CDF ensures that the custom NARs are available to all NiFi pods as the deployment scales up and down.
Non-transparent proxy support in AWS environments
If you have a non-transparent proxy registered for your CDP environment, CDF reuses the existing configuration and routes its outbound traffic through the proxy.
Note: Depending on the NiFi flows you are using, you might have to configure additional proxy settings if you require the NiFi data to go through the proxy as well.
Private EKS/AKS Kubernetes API Endpoints
CDF now supports private EKS/AKS Kubernetes API Endpoints for CDP environments which have been setup using Cloudera Connectivity Manager (CCM v2). When enabling CDF, you need to select a checkbox to create a private endpoint.
Enabling CDF for an environment with a minimum of one EKS/AKS instance
CDF can now be enabled with a minimum of one EKS/AKS instance and will scale up as required until the specified maximum number of Kubernetes nodes is reached. Previously the minimum number of nodes was set to three.
CDF instance sizing changes
CDF now uses c5.4xlarge
instances on AWS and Standard_D16s_v4
on Azure to provide more efficient infrastructure usage.
NiFi node sizing changes
NiFi Node Sizes in the Deployment Wizard have also been updated to use available resources more efficiently. The new options are:
-
Extra Small – 2 vCores, 4GB memory
-
Small – 3 vCores, 6GB memory
-
Medium – 6 vCores, 12GB memory
-
Large – 12 vCores, 24GB memory
Flow deployment security certificates
Security certificates for deployments are now automatically renewed one month before their expiration. A manual restart is required to pick up the new certificates.
CDF UI guidance on using CDP CLI
The CDF UI now provides guidance on how to use the CDP CLI to enable and disable CDF, upload and delete flow definitions as well as editing existing flow deployments.
CDP Beta CLI support for editing flow deployments
The CDP Beta CLI now includes functionality to edit existing deployments allowing configuration changes such as parameter and KPI updates, scaling configuration, or NiFi version upgrades.
New events and alerts for pod scaling
Introduced deployment pod scaling alerts and events to reflect when a deployment auto scales and when scaling limits are approached.
Data Visualization
New CML engine: docker.repository.cloudera.com/cloudera/cdv/cmldataviz:6.3.5-b5
Cloudera released a new version of CDP Data Visualization (CDV). This new version addresses CVE-2021-44228 which affects products using Apache Log4j2 versions 2.0 through 2.14.1.
CDV is not written in Java and therefore is not vulnerable. However, as the custom CDP Data Visualization engine is built upon a CML engine, it has been updated to the newly released version of the CML engine that removes the vulnerability reported in CVE-2021-44228. For more information about the new CML engine, see CVE-2021-44228 Remediation for CML Data Service.
If you use CDV with Cloudera Machine Learning, update your CML engine to the latest version, which includes fixes for the log4j2 security vulnerability in CML. After updating the engine to the latest version, running CDV instances (applications) should be restarted. For instructions, see Adding Data Visualization engine to CML.
If you use CDV with Cloudera Data Science Workbench, no action is required.
Other changes
-
New CML engine: docker.repository.cloudera.com/cloudera/cdv/cmldataviz:6.3.4-b13 New features and improvements
-
VIZ-810 - When adding a filter shelf to a visual, it is now possible to add and edit a custom filter expression directly from the filter modal.
-
VIZ-896 - Cross Tabulation, Row Listing, and Table visuals now use a default number format with comma separators if a display format is not specified.
-
VIZ-917 - Dashboard snapshot timeouts are now configurable by using an advanced site setting flag.
-
VIZ-925 - The Dataset ID is now shown for each dataset listed on the Dataset overview page.
-
VIZ-917 - Dashboard snapshot timeouts are now configurable by using an advanced site setting flag.
-
VIZ-925 - The Dataset ID is now shown for each dataset listed on the Dataset overview page.
Data Warehouse
CDW upgrade of embedded Log4j version
Cloudera has released a new version of CDW which upgrades the embedded Log4j version to 2.16. This provides a permanent fix for CVE-2021-44228.
To use this version you must upgrade your Database Catalog(s) and Virtual Warehouse(s) to the latest version, which is 2021.0.4.1-3. Additionally, you can create new Virtual Warehouses using the latest version, and those will also have the fix for this vulnerability.
If you had previously applied the configurations shared in CVE-2021-44228 remediation for CDW, then those are no longer required. Please reach out to Cloudera Support if you have questions.
Machine Learning
CML introduces the following new features:
-
CML Users can now discover and read about new features, blog posts, FFL research reports from within CML without leaving the product.
-
Non-transparent proxy support is now available for AWS.
-
API v2 now supports Spark 2 and 3 via ML Runtime Addons.
-
Spark Dynamic Allocation Lite is now supported in CML.
-
Job lineage/dependencies are now maintained when forking a project.
-
CML Admins can now create CML Teams that have their membership synchronized with CDP groups.
-
Users who no longer have access to a Workspace are deactivated within CML after a user sync.
-
Unified diagnostics are now integrated in CML support bundles.
Management Console
Fine-grained Access Control for ADLS Gen2 and S3
The fine-grained access control for ADLS Gen2 and S3 cloud storage via the Ranger Authorization Service (RAZ) enables Amazon S3 and ADLS Gen2 users to control access per user and per directory in cloud storage. By specifying Apache Ranger policies for cloud storage, admins can provide home directories and audit capabilities similar to those used with HDFS files in an on-premises or IaaS deployment.
For more information and setup steps refer to:
Operational Database
Cloudera Operational Database (COD) supports ephemeral storage on Azure
COD now supports the configuration of instance storage to cache HBase data stored in block storage. This is only available on AMD instance types. COD supports 1.6TB NVMe (Non-volatile Memory Express) based cache that significantly improves the performance when you deploy COD with object storage.