Behavior changes

This release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud has the following behavior changes:

Summary: Optimized ephemeral storage for Impala's Catalogd

Ephemeral storage for Catalogd is now optimized for shared nodes, automatically set to the lesser of 24GB or twice the Java heap (Xmx) plus a 1GB buffer.

Before this release: The storage limit of 512MB often caused container evictions due to insufficient space, especially during JVM heap dumps.

After this release: The ephemeral storage limit for Catalogd is set dynamically based on the JVM heap size (Xmx), calculated as:
min(24GB, 2 * Xmx + 1GB buffer)
This leaves enough storage for heap dumps and reduces the risk of container eviction, which improves stability for Catalogd and for other services that share node resources.
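
As an illustration of the sizing rule (the lesser of 24GB or twice Xmx plus a 1GB buffer), the following minimal sketch computes the resulting limit for a few heap sizes. The function name and the choice of binary units (GiB) are assumptions made for the example; this is not the service's actual implementation.

```python
GIB = 1024 ** 3  # assumption: the release note's "GB" is treated as GiB here


def catalogd_ephemeral_storage_bytes(
    xmx_bytes: int,
    buffer_bytes: int = 1 * GIB,
    cap_bytes: int = 24 * GIB,
) -> int:
    """Return min(cap, 2 * Xmx + buffer). Hypothetical helper, for illustration only."""
    if xmx_bytes <= 0:
        raise ValueError("xmx_bytes must be positive")
    return min(cap_bytes, 2 * xmx_bytes + buffer_bytes)


for heap_gib in (4, 8, 16):
    limit = catalogd_ephemeral_storage_bytes(heap_gib * GIB)
    print(f"Xmx={heap_gib}GiB -> ephemeral storage limit={limit // GIB}GiB")
```

With a 16GiB heap, twice the heap plus the buffer would be 33GiB, so the 24GiB cap applies.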

Summary: Optimized memory management for databus-producer deployment

Memory management for the databus-producer deployment has been optimized to improve stability and ensure consistent data publishing to the Observability dashboard.

Before this release:
  • Data updates to the Observability dashboard stopped intermittently.
  • The pod could be abruptly terminated if memory usage exceeded the requested limit.
  • JVM heap memory (-Xmx) was set to a hardcoded value, requiring manual adjustments during updates.
After this release:
  • Memory requests and limits are now aligned to prevent abrupt pod termination.
  • The JVM heap memory (-Xmx) is dynamically set to 70% of the memory resource limit.
  • Enhanced JVM options allow automatic heap dumps in the event of out-of-memory errors, simplifying debugging.
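
The 70% heap rule can be sketched as follows. This is a hedged example, not the deployment's real configuration: the function name, the MiB units, and the dump path are hypothetical, and although -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath are standard HotSpot options, the exact JVM options used by databus-producer are not listed in these notes.

```python
from typing import List


def databus_producer_jvm_opts(
    memory_limit_mib: int,
    heap_percent: int = 70,
    dump_path: str = "/tmp",  # hypothetical location, chosen only for the example
) -> List[str]:
    """Derive JVM options from a container memory limit (illustrative sketch)."""
    if memory_limit_mib <= 0:
        raise ValueError("memory_limit_mib must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mib = memory_limit_mib * heap_percent // 100
    return [
        f"-Xmx{heap_mib}m",
        "-XX:+HeapDumpOnOutOfMemoryError",  # standard HotSpot flag; assumed here
        f"-XX:HeapDumpPath={dump_path}",
    ]


print(" ".join(databus_producer_jvm_opts(1024)))
```

Deriving the heap from the limit means that a change to the pod's memory limit no longer requires a separate manual edit of -Xmx.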

Summary: Deterministic IDs for Environments, Database Catalogs, and Virtual Warehouses

Before this release: Every time you reactivated an Environment, the Environment ID changed, which in turn changed the log path. In addition, the log folders and diagnostic bundle paths were written to the root folder of the storage instead of the configured paths.

After this release: The service uses only the configured paths that are added during Data Lake creation and prefixed by /tmp. The logs are also prefixed with the Environment and Database Catalog name (wherever applicable) instead of the IDs, which simplifies manual searching through the logs. The base path of the log folder has changed as well, so users no longer have to navigate to the external/sys.db/hive folders.

The individual log files are still prefixed with the entity IDs in storage, and these IDs remain in use to simplify debugging, but navigating to the entity logs is now easier.

Summary: Changes to the supported Azure instance types

Before this release: You could select an instance type only while activating a Data Warehouse Environment, and the Data Warehouse Control Plane created the node pool from that instance type. There was no option to select an instance type for a Virtual Warehouse.

After this release: You can select an instance type during Environment activation (only through CDP CLI) and during Virtual Warehouse creation (through UI and CDP CLI). To support instance type selection while creating a Virtual Warehouse, the Data Warehouse Control Plane requires the following instance types to be allowed if they are available in your Azure region:

  • Standard_E16_v3
  • Standard_E16ds_v4
  • Standard_E16ads_v5
  • Standard_E16ds_v5
  • Standard_E16pds_v5

The Data Warehouse Control Plane creates node pools using all of these instance types. If you get an error such as the following while activating a Data Warehouse Environment, increase the quota for the Azure instance type named in the error:

The VM size of Standard_E16ads_v5 is not allowed in your subscription in location 'eastus'.

You may request that your infrastructure team allow the Azure instance type by asking them to increase the resource quota. A minimal increase in the resource quota is sufficient for the instance to be allowed.
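
When preparing the quota request, it can help to identify which of the required instance types are affected. The sketch below is illustrative only: the function names are hypothetical, the regular expression matches only the error format shown above, and the two input sets are assumed to come from your own Azure inventory (this example does not call any Azure API).

```python
import re
from typing import Iterable, List, Optional, Tuple

REQUIRED_INSTANCE_TYPES = (
    "Standard_E16_v3",
    "Standard_E16ds_v4",
    "Standard_E16ads_v5",
    "Standard_E16ds_v5",
    "Standard_E16pds_v5",
)

# Matches the sample error text from these notes; other variants are not covered.
_ERROR_PATTERN = re.compile(
    r"The VM size of (?P<size>\S+) is not allowed in your subscription "
    r"in location '(?P<location>[^']+)'"
)


def parse_vm_size_error(message: str) -> Optional[Tuple[str, str]]:
    """Extract (instance type, location) from the activation error, if present."""
    match = _ERROR_PATTERN.search(message)
    if match is None:
        return None
    return match.group("size"), match.group("location")


def instance_types_needing_quota(
    available_in_region: Iterable[str],
    allowed_in_subscription: Iterable[str],
) -> List[str]:
    """List required types that exist in the region but are not yet allowed."""
    available = set(available_in_region)
    allowed = set(allowed_in_subscription)
    return [
        t for t in REQUIRED_INSTANCE_TYPES
        if t in available and t not in allowed
    ]


error = ("The VM size of Standard_E16ads_v5 is not allowed in your "
         "subscription in location 'eastus'.")
print(parse_vm_size_error(error))
```

Types that are not available in the region are skipped, because they are required only where the region offers them.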

Summary: Simplified CDW diagnostic bundle download process

The diagnostic bundle download process in CDW has been simplified for an improved user experience.

Before this release: Users had to select specific information types within time intervals or choose a custom time interval. Additionally, they needed to manually adjust options in "Collect For" to include or exclude types of logs for the bundle.

After this release: Users directly access a simplified "Collect" option and no longer need to adjust time intervals or log selections.