CDP Public Cloud: November 2021 Release Summary

Data Engineering

Non-transparent proxy support

CDE supports deploying into CDP environments using a non-transparent proxy. The proxy is registered and enabled during CDE environment creation. The proxy configuration is automatically added to the deployed CDE service and virtual clusters (VCs).

UI support for Python virtual environments

You can now create custom Python resources on the CDE UI, including virtual environments (venvs). These custom venvs are selectable in the job creation wizard when creating PySpark jobs.

Support for Airflow core operators

With Airflow 2, Cloudera now supports all core operators.

Support for Ranger Authorization Service

CDE now supports Ranger Authorization Service (RAZ) in AWS and Azure environments.

Data Visualization

New CML engine

New features and improvements

  • Checking dataset compatibility when importing a dataset from SQL is now possible. Including or excluding a data point from a visual now shows a confirmation message prior to visual refresh.
  • Several improvements have been implemented for the Timeline visual type, including adding default x-axis formatting, easier display format selection, and the option to add a legend.
  • You can now use the EDIT button on the dataset field tile to streamline the dataset editing process.
  • When configuring custom site settings, errors encountered in the startup process are now logged and bypassed instead of blocking the startup, to allow for easier debugging. Configuring a visual’s click behavior is now easier with new context menu options available in build mode.
  • Job logs can now be filtered by job type and status.

Data Warehouse

Supported AWS regions now include eu-south-1 (Milan)

Previously in a Technical Preview state, support for the EU (Milan) region in AWS is in a GA (general availability) state. For more information, see Supported AWS Regions.

SSM and CBO federation

This release adds support for interoperation between the cost-based optimizer (CBO) in Cloudera Data Warehouse and the AWS Systems Manager (SSM) service.

Ability to set idle session timeout for Hue

CDW allows you to configure the idle session timeout value in Hue to automatically log out users from the Hue web interface after a period of inactivity. For more information, see Configuring idle session timeout for Hue.

CDW Impala/Hive cross-region connection

This release adds the capability to create external tables on data in S3 buckets that are hosted in a different region from the one where CDP is running. Requests to these buckets can use a proxy. For example, GDPR dictates that you cannot move data from your EMEA region, but your customers want to query the data from CDP in us-east-1. In compliance with GDPR, you can create an external table based on data in a bucket in an EMEA region when you are running CDP in the us-east-1 region.

Hue usability improvements

As Admin in a Ranger Authorization Service (RAZ) environment, you can set the AWS S3 or ADLS path to a user directory in Hue. As the Hue user, you always land in your directory and do not have to repeatedly change URLs.

Hive backports

This release fixes issues with custom compaction queue settings and vectorized built-in functions having compound expressions in PARTITION BY or ORDER BY clause.

Management Console

Cluster Connectivity Manager v2 (CCMv2)

CCMv2 replaces CCMv1. While CCMv1 establishes and uses a tunnel based on the SSH protocol, with CCMv2 the connection is via HTTP. If your CDP tenant has been granted the CDP_CCM_V2 entitlement, all new environments created with Runtime 7.2.6 or newer after enabling CCMv2 on your tenant use CCMv2. Existing environments and new environments created with Runtime older than 7.2.6 continue to use CCMv1. All newly registered classic clusters use CCMv2, but previously registered classic clusters continue to use CCMv1. If your CDP tenant has not been granted the CDP_CCM_V2 entitlement yet, it continues to use CCMv1 in all cases.

The steps to register an environment with CCMv2 are similar to CCMv1 configuration steps. The main differences are:

  • If you are deploying in an environment with restricted outbound network access, port 443 needs to be open and new destinations need to be added to the allow list.
  • If you are registering a classic cluster, the steps have changed.

Note: Your CDP tenant only uses CCMv2 if it has been granted the CDP_CCM_V2 entitlement.

For more information, see Cluster Connectivity Manager.

Updated Azure provisioning credential’s permissions

The following new Azure permissions are required for the CDP provisioning credential:

"Microsoft.Network/loadBalancers/backendAddressPools/join/action", "Microsoft.Network/loadBalancers/delete", "Microsoft.Network/loadBalancers/read", "Microsoft.Network/loadBalancers/write,"

If you have created a custom role for the CDP provisioning credential, you should update your application registration on Azure, assigning these additional permissions. If you have assigned the built-in Contributor role instead of granular permissions, you do not need to take any action. Documentation has been updated. See Prerequisites for the provisioning credential.

FreeIPA HA for GCP environments

FreeIPA HA is now supported and used by default for all newly created GCP environments.

No-proxy option for non-transparent proxies

When you set up a non-transparent proxy server, you now have the option of configuring specific IP addresses, domains, or subdomains to bypass the proxy. For more information, see Using a non-transparent proxy.

CDW diagnostics collection

You can trigger a diagnostics bundle collection for Cloudera Data Warehouse (CDW). See updated Send a diagnostic bundle to Cloudera Support.

Updated GCP provisioning credential’s permissions

A new GCP granular permission is required for creating Data Hubs using the Data Engineering HA template: compute.regionHealthChecks.useReadOnly. If your GCP provisioning credential uses a custom IAM role with granular permissions, you should update it to include this permission. See updated Service account for the provisioning credential.

Operational Database

Cloudera Operational Database (COD) is available as a Technical Preview feature on GCP

You can now deploy COD on GCP easily similar to what is available for Amazon Web Services (AWS) and Microsoft Azure.

COD automatically improves the performance by 80% when you use AWS S3

COD now delivers a better performance in S3 because the data loading behaviour from S3 into cache is tuned. This improvement minimizes the cost associated with the S3’s high latency to read data.

COD improvises scalability when using block storage on AWS

COD now uses larger EBS volumes for the underlying master nodes to provide better scalability.