Fixed issues in Cloudera Data Warehouse on Public Cloud
Review the issues fixed in this release of the Cloudera Data Warehouse (CDW) service on CDP Public Cloud.
- CDPD-74640: Improved query consistency and data writing for Beeline and Hive queries
- In concurrent workflows using Beeline, queries occasionally returned incorrect results due to non-thread-safe file handling, especially when hive.query.result.cached.enabled was disabled. Additionally, INSERT OVERWRITE DIRECTORY operations failed to write data correctly to specified directories when query result caching was enabled.
- CDPD-72985: Compatibility issue in HMS thrift struct for Hive column stats
- Hive 4 introduced a new required field, "engine", to differentiate the stats generated by different engines. This broke compatibility with clients that still use Hive 3 and with engines that use a customized Thrift API, such as TrinoDB.
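The compatibility problem described for CDPD-72985 can be illustrated with a minimal sketch in plain Python. This is not the Thrift library or the actual HMS struct: the field names other than "engine", and the helper `validate_required`, are hypothetical and exist only to show why a payload written without a newly required field is rejected by a reader that enforces it.

```python
# Simplified model of a struct whose reader enforces required fields.
# "table_name" and "column_name" are hypothetical placeholders; only
# "engine" is named in the release note.
REQUIRED_FIELDS = ("table_name", "column_name", "engine")


def validate_required(payload, required=REQUIRED_FIELDS):
    """Return the payload unchanged, or raise if a required field is absent."""
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    return payload


# A client built before the "engine" field existed never sets it.
older_client_payload = {"table_name": "t", "column_name": "c"}

# A client aware of the new field sets it explicitly.
newer_client_payload = dict(older_client_payload, engine="hive")
```

In this model, `validate_required(newer_client_payload)` succeeds, while `validate_required(older_client_payload)` raises `ValueError`, which mirrors the kind of failure older clients would encounter.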
- CDPD-71484: Improve LLAP performance by reusing FileSystem objects across tasks
- Frequent closure of FileSystem objects disabled Hadoop's FileSystem cache, reducing LLAP efficiency.
- CDPD-74017: Schema resolution does not work for migrated partitioned Iceberg tables with complex data types
- Schema resolution does not work correctly for migrated partitioned Iceberg tables that have complex data types. This fix addresses the field ID generation by taking the number of partitions into account. If none of the partition columns are included in the data file (common scenario), file-level field IDs are adjusted accordingly. You could also come across a scenario where all the partition columns are included in the data files. However, if some partition columns are included in the data file while other partition columns are not, an error is generated.
Apache Jira: IMPALA-13364
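The three cases described for CDPD-74017 can be sketched as a small classification step. This is an illustrative Python sketch only, not Impala's implementation: the function name `classify_partition_layout` and its return values are assumptions, and the actual field ID adjustment is not shown.

```python
def classify_partition_layout(partition_cols, file_cols):
    """Classify how many partition columns are present in a data file.

    Returns "none" when no partition column is in the file (the common
    scenario) and "all" when every partition column is in the file.
    A mix of present and absent partition columns raises an error.
    """
    file_col_set = set(file_cols)
    present = [col for col in partition_cols if col in file_col_set]
    if not present:
        return "none"
    if len(present) == len(partition_cols):
        return "all"
    absent = [col for col in partition_cols if col not in file_col_set]
    raise ValueError(
        "partition columns only partly present in data file; "
        "present: %s, absent: %s" % (present, absent)
    )
```

For example, with partition columns `["year", "month"]`, a file containing only `["id", "payload"]` falls into the "none" case, a file that also contains both `year` and `month` falls into the "all" case, and a file containing `year` but not `month` raises an error.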
- DWX-18975: Configure statestored's memory based on executor group size for Impala Virtual Warehouse with workload-aware autoscaling
- For smaller executor group sizes, statestored's memory is configured to 1 GB. For larger executor group sizes, it is configured to 2 GB to handle updates from more executors.
- DWX-18701: Increasing Catalogd's JVM's Xmx does not increase container's memory limit
- Increasing the JVM's Xmx for the Catalogd did not lead to a corresponding increase in the container's memory request and limit, leading to a mismatch between the configuration and actual memory usage. The configuration process now ensures that when the JVM's Xmx is increased, the container's memory request and limit are updated accordingly.
- CDPD-73442: Resolution of potential deadlock
- This fix addresses a deadlock issue in long-running sessions with an active idle_query_timeout, which caused new queries to hang and prevented existing queries from expiring.
Apache Jira: IMPALA-13313
- CDPD-73187: Impala Ranger audit plugin fails to create audit logs
- The fix ensures that the Ranger plug-in in Hive and Impala sends audit events to the Solr service after upgrading Data Lake to a version that requires SSL for Ranger's audit events.
- DWX-18050: Ranger audit logs show origin client's IP address for Impala Virtual Warehouse
- Ranger audit logs now show the origin client's IP address for Impala Virtual Warehouse when the Impala coordinator's flagfile configuration has use_xff_address_as_origin=true. This applies to all Impala clients, such as impala-shell, impyla, JDBC, and ODBC clients.
- DWX-19110: Executor deployment issue on workload-aware autoscaling VW creation
- Workload-aware autoscaling Impala Virtual Warehouses deployed only one executor for the small group, ignoring the configured group size. The StatefulSet for this group also had a replica count of 1, which did not reflect the expected configuration.
- DWX-19309: Prometheus pods fail due to unencrypted EBS volumes
- Prometheus pods switched to EBS volumes failed to start because the default ebs-storageclass did not create encrypted volumes. This caused authorization errors, preventing the pods from running and impacting Grafana, Impala autoscaling, and Hue functionality.
- DWX-18448: Impala Virtual Warehouse size changes during updates
- When using the update-vw command in the CDP CLI, the --template flag is required to specify the Virtual Warehouse size.
- DWX-16875: Improved memory management for data publishing to Observability dashboard
- Data from Virtual Warehouses was intermittently not updating on the Observability dashboard. Restarting the databus-producer deployment temporarily resolved the issue. The pod was being abruptly killed due to memory limits being exceeded, and the JVM settings were not optimized for dynamic resource allocation.
- DWX-18932: Incorrect High Availability mode displayed for Impala Virtual Warehouses
- When creating an Impala Virtual Warehouse (VW) in ACTIVE_PASSIVE High Availability (HA) mode with coordinator auto-shutdown enabled, the CLI incorrectly displayed the HA mode as ACTIVE_ACTIVE.
- DWX-19172: Upgrade Cluster Autoscaler to version 1.30.2
- The Kubernetes Cluster Autoscaler version is upgraded to 1.30.2 with the corresponding 9.37.0 chart version to ensure support for Amazon Elastic Kubernetes Service (EKS) 1.30.