CDP Public Cloud: June 2022 Release Summary

Data Engineering

This release (1.15-h1) of the Cloudera Data Engineering (CDE) service on CDP Public Cloud has certified support for Apache Iceberg v0.13.

GA of Apache Iceberg

You can use Cloudera Data Engineering virtual clusters running Spark 3 to interact with Apache Iceberg tables using the latest version (0.13) of Apache Iceberg. CDE supports row-level updates through copy-on-write MERGE, UPDATE, and DELETE operations. Copy-on-write is well suited to bulk updates in read-heavy use cases. Compaction is also supported using Spark Iceberg APIs. Because support for Atlas lineage is still in progress, users should set the following Spark property in their jobs: spark.lineage.enabled=false. For more information, see Using Apache Iceberg in Cloudera Data Engineering.
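
As a rough illustration of the property and the MERGE operation mentioned above, the following Python sketch renders the lineage property as spark-submit style arguments and builds a Spark SQL MERGE INTO statement as a string. The helper names (spark_conf_args, merge_statement) and the table and column names are hypothetical placeholders, not part of CDE. How the property is actually supplied to a CDE job may differ.

```python
def spark_conf_args(props):
    """Render Spark properties as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(props.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def merge_statement(target, source, key):
    """Build a MERGE INTO statement that upserts rows from source into target."""
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


# The property name and value come from the release note above.
job_props = {"spark.lineage.enabled": "false"}
print(spark_conf_args(job_props))

# Table and key names are placeholders for illustration only.
print(merge_statement("db.customers", "db.customer_updates", "id"))
```

In a Spark 3 job, a statement like this would typically be passed to spark.sql(...) against an Iceberg table.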

Data Hub

This release of the Data Hub service introduces new features.

Customer managed encryption keys on AWS

Previously, Data Hub EBS encryption could be configured only as part of Data Hub creation. Now you can also configure encryption during environment registration, and all EBS volumes and RDS databases used by the Data Hubs running in that environment will be encrypted. It is still possible to add a CMK during Data Hub creation, overriding the encryption key provided during environment registration for that Data Hub. For more information, refer to Adding a customer managed encryption key to a CDP environment running on AWS and Configuring encryption for Data Hub’s EBS volumes on AWS.

Cluster template overrides

As an alternative to creating a custom template, you can specify custom configurations that override or append the properties (service and role configs) in a built-in or custom Data Hub template. These configurations are saved as a shared resource called “cluster template overrides,” and can be used and re-used across Data Hub clusters in different environments. For more information, see Cluster template overrides.

Creating multi-AZ Data Hubs on AWS

By default, CDP provisions Data Hubs in a single AWS availability zone (AZ), but you can optionally choose to deploy them across multiple availability zones (multi-AZ). For more information, see Creating a multi-AZ Data Hub on AWS.

Customer managed encryption keys on GCP

By default, a Google-managed encryption key is used to encrypt disks and Cloud SQL instances used by Data Hub clusters, but you can optionally configure CDP to use a customer-managed encryption key (CMEK) instead. This can only be configured during CDP environment registration using CDP CLI. For more information, refer to Adding a customer managed encryption key for GCP.

Data Visualization

The following new features and improvements have been introduced in Cloudera Data Visualization 7.0.1:

  • Data connections to Snowflake are now available as a technical preview feature.
  • Introduced a per-user concurrency setting in data connections that lets you limit the number of concurrent queries a single user can run on a connection. This prevents one user from blocking access to the connection for everyone else.
  • Added a Current Load panel to the Profile Data modal that allows users to view and manage ongoing queries.
  • Added dataset switching to filters, so users can quickly change the filter across datasets and connections. Users can now also quickly change the dataset for all filters and visuals in a sheet.
  • The dashboard layout interface now includes names and images of relevant visuals for easier visual identification when moving and resizing visuals.
  • When editing a Dashboard’s style in the Dashboard Designer modal, it is now possible to reset all styles to original.
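
The per-user concurrency setting listed above can be pictured with a small Python sketch. This is only a conceptual model of a per-user query cap, not Cloudera Data Visualization's implementation. The class name PerUserQueryLimiter and its methods are hypothetical.

```python
import threading
from collections import defaultdict


class PerUserQueryLimiter:
    """Toy model: cap the number of in-flight queries per user."""

    def __init__(self, max_per_user):
        self.max_per_user = max_per_user
        self._running = defaultdict(int)   # user -> queries currently running
        self._lock = threading.Lock()

    def try_start(self, user):
        """Return True and count the query if the user is under the cap."""
        with self._lock:
            if self._running[user] >= self.max_per_user:
                return False
            self._running[user] += 1
            return True

    def finish(self, user):
        """Release one slot when a query completes."""
        with self._lock:
            if self._running[user] > 0:
                self._running[user] -= 1


limiter = PerUserQueryLimiter(max_per_user=2)
results = [limiter.try_start("alice") for _ in range(3)]
print(results)                    # third request from the same user is refused
print(limiter.try_start("bob"))   # other users are unaffected
```

Because the cap is tracked per user, one user who reaches the limit does not consume the slots that other users on the same connection need.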

Data Warehouse

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud introduces these changes.

Apache Iceberg GA in CDW

Apache Iceberg is now generally available (GA) in key data services on CDP, including Cloudera Data Warehouse. Iceberg integration with Cloudera Data Platform (CDP) enhances the Lakehouse architecture by extending multifunction analytics to petabyte scale for multi-cloud and hybrid use cases. From Impala, you can use Apache Iceberg features in CDW, including time travel, create table as select, and schema and partition evolution. To access these features, create a new Virtual Warehouse or upgrade an existing one.

Cross account role permissions policy

If, for security reasons, you do not want to grant the PutRolePolicy permission in your cross account role (which would otherwise be used later to add an inline policy to the node instance role), you can instead attach a managed policy ARN to a node role. This managed policy provides the cross account role permissions. To use this feature, create a new NodeInstanceRole manually and provide its ARN when you activate the environment from CDW.

Zipping unnest on arrays from Views

As part of this release, you can use zipping unnest functionality on arrays from Views. Before this release, this zipping functionality worked for arrays only in Tables but did not support Views as a source. For more information about using this zipping unnest functionality, see Zipping unnest on arrays from Views.
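
To make the term concrete, the following Python sketch mimics what a zipping unnest produces: the i-th item of one array is placed next to the i-th item of the others. It assumes that shorter arrays are padded with NULL (None here) up to the length of the longest array. The function name zipping_unnest is hypothetical, and the sketch models only the result shape, not Impala's execution.

```python
from itertools import zip_longest


def zipping_unnest(*arrays):
    """Pair the i-th items of each array; pad shorter arrays with None (NULL)."""
    return list(zip_longest(*arrays, fillvalue=None))


rows = zipping_unnest([1, 2, 3], ["a", "b"])
for row in rows:
    print(row)
```

This contrasts with a join-style unnest, which would produce every combination of items rather than one row per position.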

Updates to the Hive and Impala language reference manuals in Hue

The Hive and Impala language reference manuals built into the Hue web interface have been updated to include the latest syntax and user-defined functions (UDFs). Help on Iceberg concepts and syntax is available for Impala. Improvements have also been made to the query autocomplete predictions and syntax checker to support these changes.

DataFlow

This release (2.1.0-b123) of Cloudera DataFlow (CDF) on CDP Public Cloud introduces the following changes:

  • CDF environment and deployment upgrades are now supported (Preview). CDF service upgrades require a specific entitlement that is granted by request only. Reach out to your Cloudera Account Representative or open a support ticket to request CDF service upgrades. For more information, see the Upgrade documentation.
  • The create-deployment CLI command now supports Inbound Connections.
  • CDF now correctly configures the deployment when it detects either of the two reserved Controller Services, Default NiFi SSL Context Service and Inbound SSL Context Service, regardless of whether they are external references or not.
  • Kubernetes Version 1.22 is now the default Kubernetes version for CDF. When CDF is enabled, it creates AWS EKS and Azure AKS clusters based on version 1.22.
  • CDF now supports Data Lake scaling. After the Data Lake scale operation has completed, affected deployments show an active alert and must be manually restarted to put the new configuration into effect.
  • The Event Hub to ADLS ReadyFlow now allows you to specify an Event Hub Service Bus Endpoint.

Management Console

This release of the Management Console service introduces the following changes:

New “Advanced Options” section in environment registration wizard

The environment registration UI now features a new “Advanced Options” section on some of the pages, which includes some options that were previously featured in the main UI sections. The options that have been moved to the “Advanced Options” sections include:

  • On the Data Access and Data Lake Scaling page:
    • Multi-AZ configuration for Data Lake and FreeIPA (available for AWS only)
    • Recipe selection for Data Lake

More options will be added to the “Advanced Options” in the future.

New option to delete attached volumes during Data Lake repair

When you initiate a repair from the Data Lake Hardware tab, you have the option to delete any volumes attached to the instance. For more information, see Performing manual Data Lake repair.

Public Endpoint Access Gateway for Azure

During Azure environment registration, you can optionally enable Public Endpoint Access Gateway, which provides secure connectivity to UIs and APIs in Data Lake and Data Hub clusters deployed using private networking. This allows users to access these resources without making complex changes to their networking or creating direct connections to cloud provider networks. With this release, Public Endpoint Access Gateway is generally available for AWS and Azure, and it remains in preview for GCP. For more information, see Public Endpoint Access Gateway.

Generate workload username based on email

By default, workload usernames are generated using the identity provider user ID. Alternatively, you can now generate workload usernames based on users’ email addresses. This is useful when the identity provider user ID is an opaque ID, such as a UUID or employee ID, which results in equally opaque workload usernames. For more information, see Generating workload usernames based on email.

AWS Jakarta region

You can now register a CDP environment and create Data Hubs in the AWS Jakarta Region (ap-southeast-3). See updated Supported AWS regions.

Support for Operational Database in ap-1 and eu-1 regional Control Planes

Cloudera Operational Database is now supported in the ap-1 (Australia) and eu-1 (Germany) regional Control Planes.

Restricting all Cloudera SSO access

For added security, you can now restrict all Cloudera SSO access, including account administrator access. To do so, contact Cloudera Support, who can disable or enable the “Cloudera SSO All Login Enabled” setting for the account. Previously, account administrator access could not be restricted. For more information, see Disabling the Cloudera SSO login.

Customer managed encryption keys on GCP

By default, a Google-managed encryption key is used to encrypt disks and Cloud SQL instances in Data Lake, FreeIPA, and Data Hub clusters, but you can optionally configure CDP to use a customer-managed encryption key (CMEK) instead. This can only be configured using CDP CLI. There is no UI option available for specifying a GCP CMEK in CDP. For more information, refer to Adding a customer managed encryption key for GCP.

Customer managed encryption keys on AWS

By default, Data Lake and FreeIPA’s Amazon Elastic Block Store (EBS) volumes and Relational Database Service (RDS) instances are encrypted using a default key from Amazon’s KMS, but you can optionally configure encryption using Customer Managed Keys (CMK). Data Hubs inherit the environment’s encryption key by default, but you have the option to specify a different CMK during Data Hub creation. For more information, refer to Adding a customer managed encryption key to a CDP environment running on AWS.

Deploying CDP in multiple AWS availability zones

By default, CDP provisions Data Lake, FreeIPA, and Data Hubs in a single AWS availability zone (AZ), but you can optionally choose to deploy them across multiple availability zones (multi-AZ). You can enable multi-AZ for all or only some of these components. For more information, refer to Deploying CDP in multiple AWS availability zones.

SCIM for Azure AD

CDP supports SCIM with Microsoft Azure Active Directory (Azure AD). For more information, see Configure SCIM with Azure AD.

Operational Database

Cloudera Operational Database (COD) version 1.22 introduces the following features:

Store File Tracking (SFT) as a general availability feature

Store File Tracking (SFT) defines a separate, pluggable layer to handle the storefile lifecycle, and includes a built-in FILE-based implementation that avoids internal file rename or move operations while managing the storefiles. This is critical for deploying HBase over the S3 object store, which lacks atomic renames. COD enables this feature by default for databases deployed on AWS with S3 to mitigate this S3 limitation, which could otherwise cause critical issues for HBase. For more information, see HBase Store File Tracking.
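
The idea behind a file-based tracker can be sketched in a few lines of Python. This is a conceptual toy, not HBase's implementation or API: the class FileListTracker, its method names, and the file paths are hypothetical, and the in-memory list stands in for a persisted list of committed storefiles. The point it illustrates is that new files are written at their final location and become visible by being recorded in the list, so no rename is needed.

```python
class FileListTracker:
    """Toy tracker: the committed storefiles are whatever the tracked list says."""

    def __init__(self):
        self._tracked = []   # stands in for a persisted list of storefiles

    def load(self):
        """Return the storefiles that readers should currently see."""
        return list(self._tracked)

    def add(self, new_files):
        """Flush: files already sit at their final paths; just record them."""
        self._tracked = self._tracked + list(new_files)

    def replace(self, compacted, new_files):
        """Compaction: swap inputs for outputs in one list update, no rename."""
        removed = set(compacted)
        kept = [f for f in self._tracked if f not in removed]
        self._tracked = kept + list(new_files)


tracker = FileListTracker()
tracker.add(["cf/hfile-1", "cf/hfile-2"])
tracker.replace(["cf/hfile-1", "cf/hfile-2"], ["cf/hfile-3"])
print(tracker.load())
```

A rename-based scheme would instead write to a temporary location and move the file into place, which is the step that is problematic on an object store without atomic renames.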

Multiple Availability Zones (Multi-AZ) on AWS

COD ensures high availability and fault tolerance using Multi-AZ deployments. In a Multi-AZ deployment, the compute infrastructure for HBase’s master and region servers is distributed across multiple AZs. When a single availability zone has an outage, only a portion of the Region Servers is impacted, and clients automatically switch over to the remaining servers in the available AZs. Multi-AZ for COD is currently supported only in Amazon Web Services (AWS) environments. For more information, see Multi-AZ deployment on COD.
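
The effect of spreading servers across zones can be shown with a short Python sketch. It assumes a simple round-robin placement purely for illustration. COD's actual placement logic is not described here, and the zone and server names are hypothetical.

```python
from collections import Counter


def place_round_robin(servers, zones):
    """Assign each server to a zone in turn (illustrative placement only)."""
    return {server: zones[i % len(zones)] for i, server in enumerate(servers)}


def surviving(placement, failed_zone):
    """Servers that remain available when one zone has an outage."""
    return sorted(s for s, z in placement.items() if z != failed_zone)


zones = ["az-a", "az-b", "az-c"]
servers = [f"regionserver-{i}" for i in range(6)]
placement = place_round_robin(servers, zones)
alive = surviving(placement, "az-b")

print(Counter(placement.values()))   # how many servers landed in each zone
print(alive)                         # servers still reachable after the outage
```

With six servers spread evenly over three zones, losing one zone removes only the servers placed there, and the rest remain available for clients to switch to.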

Support for COD in ap-1 and eu-1 CDP Control Planes

COD now supports eu-1 (Germany) and ap-1 (Australia) regional CDP Control Planes.