CDP Public Cloud: February 2023 Release Summary

Data Hub

This release of the Data Hub service introduces new features and behaviors:

Data Hub Autoscaling

Autoscaling is a feature that adjusts the capacity of cluster nodes running YARN by automatically increasing or decreasing, or suspending and resuming, the nodes in a host group. You can enable autoscaling based either on a schedule that you define or on the real-time demands of your workloads. For more information, see Autoscaling clusters.
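
The load-based mode can be pictured as a simple sizing rule. The following is a minimal sketch, not the algorithm that Data Hub uses: the function name, the container counts, the containers_per_node value, and the minimum and maximum bounds are all hypothetical and only illustrate how a host group could grow or shrink with demand.

```python
import math


def target_node_count(running_containers, pending_containers,
                      containers_per_node, min_nodes, max_nodes):
    """Return how many nodes a host group would need for the current demand.

    Hypothetical rule: size the group to fit all running and pending
    containers, then clamp the result to the configured bounds.
    """
    demand = running_containers + pending_containers
    needed = math.ceil(demand / containers_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Scale up: 40 running + 25 pending containers at 8 per node need 9 nodes.
print(target_node_count(40, 25, 8, min_nodes=3, max_nodes=20))
# Scale down: low demand falls back to the minimum of 3 nodes.
print(target_node_count(4, 0, 8, min_nodes=3, max_nodes=20))
```

Clamping to a minimum and maximum keeps the group from scaling to zero or growing without limit, whatever the demand signal is.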

DataFlow

This release (2.3.0-h3-b4) of Cloudera DataFlow (CDF) on CDP Public Cloud delivers bug fixes and improvements in Flow Designer, private cluster support, upgrades, and general stability.

Fixed issues

Stability Improvements

  • CDPDFX-6591 - Added liveness probe to dfx-local to influence a restart if the container becomes unhealthy.
  • CDPDFX-6701 - Fixed a resource leak that would eventually degrade the application, for example by interrupting workload communication (through heartbeats) to the control plane, or by causing the Flow Designer to stop displaying feedback for running test sessions and to prevent processors from being started.

Flow Designer

  • CDPDFX-6641 - Updated the Redis client in the workload, which previously allocated new threads aggressively, to use a connection pool.
  • CDPDFX-6635 - Fixed an issue with the Flow Designer UI that would continually attempt to reconnect/resubscribe even after the user's authentication expired.
  • CDPDFX-6636 - Fixed an issue with Flow Designer test session event blocking due to infinite event retry.
  • CDPDFX-6632 - Ensured that additional access rights are enforced when using Flow Designer.
  • CDPDFX-6695 - Fixed issues in the Flow Designer that prevented applying Parameter changes when an asset was removed.
  • CDPDFX-6695 - Fixed an issue in the Flow Designer where editing a parameter with multiple assets did not trigger a message to apply changes.

Private Cluster Support

  • CDPDFX-6740 - Fixed an issue causing AKS private clusters to fail when waiting for load balancer IP via cluster proxy service.

Other Issues

  • CDPDFX-6731 - Fixed an issue with CDF 2.1.0-b123 or older not being able to perform workload access certificate renewal.
  • CDPDFX-6634 - Fixed an issue that would result in the user being redirected to the wrong URL following successful authentication on the workload.
  • CDPDFX-6655 - Fixed an issue that would load the app switcher in the sidebar in the Deployment Manager and Flow Designer with the incorrect options.
  • CDPDFX-6567 - Fixed an issue with logging in AWS AP region 3 (Jakarta).
  • CDPDFX-6693 - Fixed navigation when using the Home button from the Manage Deployment page.
  • CDPDFX-6713 - Fixed DataFlow configuration update failures when attempting to add new authorized IP ranges without changing minimum or maximum cluster capacity.

Behavioral changes

If you defined a custom NAR storage location during flow deployment in earlier releases, the setting was stored as the default value for that environment. CDF 2.3.0-h3 changes this behavior: the custom NAR configuration is no longer stored for the environment. This means that every time you deploy a flow that requires custom NARs, you need to configure the access credentials and the storage location.

Data Visualization

Cloudera Data Visualization (CDV) 7.1.0 was released on February 3, 2023, introducing the following changes.

New features and improvements in Cloudera Data Visualization 7.1.0

  • VIZ-499 Data Extracts are now available for accelerated visualization. Data Extracts are a tool for managing the analytical capabilities, performance, concurrency and security of data access across data connections.
  • VIZ-600 A new notification center allows you to quickly view application errors. For admin users and users with the “View activity logs” privilege, webserver errors (such as issues with applying advanced site settings) are available in this new notification center. All users can view browser-level errors that might be affecting CDV.
  • VIZ-1728 You can now create datasets or add data to a data connection directly from the homepage.
  • VIZ-1670 A new “View Users and Roles” privilege can now be set to allow users with Create Workspace permissions to search the user database when adding new members to a workspace.
  • VIZ-1723 Percentage range can be specified as a custom color for measure value ranges.
  • VIZ-1859 Functionality added to filter Job Logs by job creator as well as job type and job status.
  • VIZ-1772 Performance improvement for dashboard loading in View mode.
  • VIZ-1397 UI improvements to the default CDV Homepage.

Fixed issues in Cloudera Data Visualization 7.1.0

  • VIZ-792 Dataset-defined color settings should now work as expected when adding a field to the Colors shelf in the Visual Builder interface.
  • VIZ-1269 URL parameters save and clear as expected when navigating between dashboards and tabs within an application.
  • VIZ-1457 Temporary dashboard errors related to query cancellation are now reset after requests complete.
  • VIZ-1505 Duplicate query rows no longer appear in the Show Usage Info > Queries modal.
  • VIZ-1512 Changing the SQL for a SQL-based dataset refreshes the result cache for the dataset.
  • VIZ-1540, VIZ-1664, VIZ-1665, VIZ-1754, VIZ-1826, VIZ-1833 Fixed critical and high package vulnerability findings in CDV 7.0.5.
  • VIZ-1710 Fixed a bug where applying a dataset switch to all visuals on the sheet did not work as expected.
  • VIZ-1726 Fix for linked visuals to avoid visual cloning.
  • VIZ-1799 Data connection types that support data refresh from Connection Explorer now show the refresh option alongside other connection options.
  • VIZ-1810 Fix for setting query timeout maximum in cases where customized values were previously set in advanced site settings.
  • VIZ-1817 Fixed a bug where the bars in a trellised bar chart could overlap each other when multiple filters were selected.
  • VIZ-1845 Fixed an issue where null binary characters could cause email jobs with XLSX file attachments to fail with a snapshot generation error.
  • VIZ-1847 Visual title edits on dashboards with multiple sheets should now save as expected.
  • VIZ-1861 Changing column position in a table visual now works as expected.
  • VIZ-1876, VIZ-1896 Fixed a bug where a filter on a column name that contained a space or special character could return a syntax error in some instances. Fixed in CDV 7.1.0-b3.

Data Warehouse

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.

AWS Elastic Kubernetes Service (EKS) version upgrade

The CDW application uses Kubernetes (K8S) clusters to deploy and manage Hive and Impala in the cloud. Kubernetes versions are updated every 3 months on average. When the version is updated, minor versions are deprecated.

To avoid compatibility issues between CDW and AWS resources, you must upgrade the version of Kubernetes that supports your existing CDW clusters to version 1.21.

AWS environments you activate using this release of Cloudera Data Warehouse will use version 1.22.

Important: Check to make sure your AWS environment has been migrated from Helm 2 to Helm 3 before you begin upgrading the Kubernetes version. If a MIGRATE icon appears in the upper right corner of the environment tile, your AWS environment has not been migrated from Helm 2 to Helm 3. Update the Helm package manager for your environment before you attempt to upgrade its Kubernetes version.

For information about upgrading the AWS EKS Kubernetes version, see Upgrade CDW for EKS upgrade.

Support for Amazon Elastic Kubernetes Service 1.22 cluster

This release (1.6.1-b258, released February 7, 2023) automatically provisions and uses Amazon Elastic Kubernetes Service (EKS) 1.22 when you activate an environment from CDW. In this release, upgrading a cluster to EKS 1.22 is not supported.

Changes to the managed policy ARN, standard IAM permissions, and restricted permissions policy

In this release, as an AWS environment user, you must update the managed policy ARN to handle Kubernetes CSI drivers for EBS and EFS. You must also update your standard IAM permissions, and the restricted permissions policy if you use it.

Make the following changes:

  • Restricted policy changes for updating a managed policy ARN: Add “arn:aws:iam:::policy/” as shown in ["Attaching a managed policy ARN"](/data-warehouse/cloud/aws-environments/topics/dw-aws-attach-role-policy.html).

  • Standard JSON IAM permissions policy template: Add the line “elasticfilesystem:PutFileSystemPolicy”, as shown in “Standard JSON IAM permissions policy template”.

  • Restricted permissions policy: Add “elasticfilesystem:PutFileSystemPolicy” to the ResourceTag object, and move “elasticfilesystem:CreateFileSystem” from the CloudFormation object to the RequestTag object, as shown in “AWS restricted permissions policy”.
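
Before you apply an updated policy, you can check that it names the EFS actions mentioned above. The following is a minimal sketch that assumes the standard AWS IAM policy JSON layout (a Statement list whose entries carry Effect and Action). The sample policy is hypothetical and is not the Cloudera template, and the check matches action names verbatim, so it does not expand wildcards such as “elasticfilesystem:*”.

```python
import json

# Actions that the changes in this release refer to.
REQUIRED_ACTIONS = [
    "elasticfilesystem:PutFileSystemPolicy",
    "elasticfilesystem:CreateFileSystem",
]


def allowed_actions(policy):
    """Collect every action named in the Allow statements of a policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """Return the required actions that the policy does not list verbatim."""
    present = allowed_actions(policy)
    return [name for name in required if name not in present]


# Hypothetical policy fragment used only for this example.
sample_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["elasticfilesystem:CreateFileSystem"],
      "Resource": "*"
    }
  ]
}
""")

print(missing_actions(sample_policy))
```

An empty list means that every required action is listed. The check does not verify which object or condition block an action belongs to, so the placement described above still needs a manual review.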

Apache Iceberg GA in CDW

This release introduces the general availability of ACID transactions with Iceberg V2 tables from Hive in CDW Runtime 2023.0.13.0-20 (released 2023-2-7). CDW Runtime 2022.0.12.0-90 (released 2022-12-13) introduced the general availability of ACID transactions with Iceberg V2 tables from Impala. You can run Apache Iceberg ACID transactions within some of the key data services in the Cloudera Data Platform (CDP) public cloud (AWS and Azure), including Cloudera Data Warehouse. From Hive or Impala, you can use Apache Iceberg features in CDW, which include time travel, create table as select, and schema and partition evolution.

To access these features, create a new Virtual Warehouse or upgrade an existing one.

Support for Iceberg tables in Avro (Preview)

In this release, you can read Iceberg tables in Avro from Impala. There is a related known issue with using the DECIMAL data type in Avro in this release.

Reading Iceberg tables in Avro format from Impala is available as a technical preview. Cloudera recommends that you use this feature in test and development environments. It is not recommended for production deployments.

Enhanced Iceberg support for materialized views

In this release, you can create a materialized view of an Iceberg V1 or V2 table based on an existing Hive table or an Iceberg table. Automatic rewriting of the materialized view occurs under certain conditions.

Iceberg load data inpath feature

From Impala, you can now load data into an Iceberg table using the load data inpath feature.

Querying Data Hub Kudu tables from an Impala Virtual Warehouse using Kudu

After configuring an Impala Virtual Warehouse to connect to Kudu, you can create Kudu tables using Impala clients, such as Hue, the Impala shell, or JDBC/ODBC. You can also ingest data using Spark/NiFi and query using Impala.

Flexible allow lists for Kubernetes cluster and load balancer

In previous releases, there was a single allow list of IP ranges in Classless Inter-Domain Routing (CIDR) notation from which both the Kubernetes cluster and the load balancer accepted incoming connections. In this release, you configure two separate allow lists for accepting incoming connections:

  • Enable IP CIDR for Kubernetes cluster
  • Enable IP CIDRs for the load balancer
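
The effect of keeping two lists can be sketched with the Python standard library. This is only an illustration of CIDR matching, not how CDW evaluates the lists: the function name and the address ranges are hypothetical.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any CIDR block in the list."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical ranges: the Kubernetes API is limited to an internal network,
# while the load balancer also accepts a second, external range.
kubernetes_allow_list = ["10.0.0.0/16"]
load_balancer_allow_list = ["10.0.0.0/16", "203.0.113.0/24"]

client = "203.0.113.25"
print(is_allowed(client, kubernetes_allow_list))     # False
print(is_allowed(client, load_balancer_allow_list))  # True
```

With separate lists, a client can be admitted by the load balancer without also gaining access to the Kubernetes cluster endpoint.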

Impala enhancements

This release of Cloudera Data Warehouse includes the following new Impala features:

  • Binary support: Impala now supports BINARY columns for all table formats except Kudu. See the BINARY support topic for more information on using this arbitrary-length byte array data type in CREATE TABLE and SELECT statements.
  • ALTER VIEW support: Before this release, altering only the VIEW definition, VIEW name, and owner was supported. Impala now supports altering the table properties of a VIEW by using the SET TBLPROPERTIES clause.
  • Push down date literals to Kudu scanner: Impala now allows creating and pushing down Kudu predicates from the DATE type. Example: select * from functional_kudu.date_tbl where date_col = DATE '1970-01-01';

          PLAN-ROOT SINK
          |
          00:SCAN KUDU [functional_kudu.date_tbl]
          kudu predicates: date_col = DATE '1970-01-01'
          row-size=12B cardinality=1
          ---- DISTRIBUTEDPLAN
          PLAN-ROOT SINK
          |
          01:EXCHANGE [UNPARTITIONED]
          |
          00:SCAN KUDU [functional_kudu.date_tbl]
          kudu predicates: date_col = DATE '1970-01-01'
          row-size=12B cardinality=1
    
  • Fix untracked memory in KRPC: Improved memory estimation of queries by accounting for untracked memory in KrpcDataStreamSender.
  • Red Hat UBI8 images: To address multiple CVEs, Impala images are now built using the UBI8 base image.

This release (February 14, 2023) of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes:

AWS Kubernetes version upgrade

With this hotfix, the Upgrade button for upgrading an existing cluster to EKS 1.21, which was previously broken, now works. For more information about upgrading, see What’s New, February 7, 2023.

Machine Learning

Major features and updates for the Cloudera Machine Learning data service.

February 14, 2023

Release notes and fixed issues for version 2.0.36-H2.

Fixed Issues

  • Model Registry (DSE-25331) - Fixed an issue where users may see an error message x509: Certificate signed by unknown authority on the Model Registry page in the CML workspace application. Note that if you are seeing this error, or if you create a model registry after you create one or more CML workspaces in an environment, you must synchronize the model registry with the workspaces. See the Synchronizing the model registry with a workspace section of the Model Registry Tech Preview documentation for more details.

February 10, 2023

Release notes and fixed issues for version 2.0.36-H1.

New Features / Improvements

  • Model Registry (Tech Preview) - The Model Registry is a new CML product to store and manage machine learning models and associated metadata, such as the model’s version, dependencies, and performance. The registry enables MLOps and facilitates the development, deployment, and maintenance of machine learning models in a production environment. To get access to the Tech Preview version of the product, see the Model Registry Tech Preview documentation. Note: Model Registry is not supported with R models.

February 7, 2023

Release notes and fixed issues for version 2.0.36.

New Features / Improvements

  • Custom Runtime Addons - This feature enables administrators to mount shared dependencies like connection drivers or configuration files to all CML workloads. You can start by following the documentation.
  • Cancellable CML Workspace Backup - Administrators now can cancel in-progress backups and resume the CML Workspace.
  • AWS Support - The AWS Jakarta and Hong Kong regions are now supported.
  • Kubernetes - EKS 1.23 (AWS) and AKS 1.24 (Azure) are now supported.

Fixed Issues

  • Applications (DSE-22857) - Fixed an issue where unauthenticated users could access the terminal session in a running public application if the application was configured to allow anonymous access to the terminal endpoint. Now anonymous access to the terminal endpoint is redirected to an authentication page.
  • Applications (DSE-24105) - Fixed an issue where the Application List page may not load correctly when a large number of applications are accessible to the user.