Fixed issues in 7.1.9 SP1 CHF 15

CDPD-62755: Ozone DataNode shares the same port with HDFS DataNode

Previously, the Ozone DataNode client service (hdds.datanode.client.port) and the HDFS DataNode HTTP server (dfs.datanode.http.address) were both configured to use port 9864 by default. This port conflict prevented Ozone and HDFS DataNode services from running simultaneously on the same node, resulting in an Address already in use error. This issue is now fixed, and the Ozone DataNode client port has been changed from 9864 to 19864 to avoid this conflict, restoring compatibility between Ozone and HDFS DataNode services on the same host.

CDPD-69213: Export/Import, Incremental Export: Tags fail to propagate to target child entities when the source entity is deleted

Previously, when creating tables with multiple levels of depth, a tag applied to the first table was propagated along the lineage. However, after dropping the first table and exporting the entire lineage, child tables did not have the propagated tag after import on the target cluster. This occurred because of a task ordering issue in the deferred actions flow: when deferred actions were enabled during import, the add-propagation task was created before the parent entity was deleted, but ran after the deletion. This caused the system to find the edge between the parent entity and the classification in a deleted state, preventing tag propagation to child entities.

The fix adds a check in the tag propagation logic to skip the edge-status validation when an import is in progress and deferred actions are enabled. This ensures that tags are now correctly propagated to child entities even when the source entity is deleted.

Apache Jira: ATLAS-5055

CDPD-96779: Spark ORC table column name does not support special characters

Spark threw an error when creating ORC tables with column names containing special characters, such as $, which was compatible with Hive but not with Spark. This issue is now fixed. Special characters can now be used with Spark.

Apache Jira: SPARK-32889

CDPD-93325: Hive2Main.java can print out passwords

Previously, Hive2Main was printing full Beeline command arguments to standard output, which might unintentionally expose LDAP credentials and the truststore password. This issue is now fixed.

CDPD-77768: Oozie log parameters are not substituted correctly

Previously, Oozie log parameters, for example, {0}, were not substituted when log messages contained apostrophes. The formatting engine treated the text following a single quote as a literal block, preventing parameter replacement. This issue is now fixed. Oozie now automatically escapes single quotes in log messages, ensuring that all parameters are substituted correctly.

CDPD-97100: Zeppelin fails after updating Zookeeper and Hadoop components

Previously, updating Zookeeper and Hadoop components caused Zepplin to fail. This occurred because the updates introduced conflicting versions of the io.netty:netty-transport-native-epoll dependency. The issue is now resolved.

CDPD-94957: Broken lineage on Atlas when calling .cache() or .persist() method

Previously, lineage reporting issues occurred when using the .cache() or .persist() methods on Spark DataFrames, which caused lineage to not be correctly reported into Atlas. This issue is now fixed. The Atlas lineage is no longer wrong in case of cached Dataframes.

CDPD-96543: Missing timeline items for partition delta updaters

Previously, the Catalog Server Operation section in the DDL profile incorrectly displayed that delays in metadata loading were caused by fetching the latest Metastore event ID.

This issue is now fixed by adding the missing timeline items for loading partition names from Metastore to the catalog timeline. These external calls are now accurately tracked to help you find the cause of metadata loading delays.

Apache Jira: IMPALA-14062

CDPD-96540: Unnecessary metadata reloads for Metastore ALTER_TABLE events

Previously, the catalog service incorrectly triggered file metadata reloads for certain trivial Metastore ALTER_TABLE events.

This issue is now fixed by normalizing null and empty StorageDescriptor parameters to avoid unnecessary reloads. The logs are also improved, when non-trival StorageDescriptor changes are detected, to display the actual changes.

Apache Jira: IMPALA-14646

CDPD-96529: Slow metadata operations for wide tables

Previously, when using Ranger authorization, Impala checked column masking policies using the column list. This required loading the table metadata, which triggered unnecessary loading delays, even when running the INVALIDATE or REFRESH command.

This issue is now fixed. Impala does not depend on the column list to check policies. This prevents unnecessary metadata loading and improves the performance of INVALIDATE and REFRESH commands on unloaded tables.

Apache Jira: IMPALA-14703

CDPD-92930: Impala daemon crashes with analytic functions and collections

Previously, running a query with an analytic function on unnested collection columns caused the Impala daemon to crash.

This issue is now fixed by ensuring that data is added to the sorting process only when it is fully available and required.

Apache Jira: IMPALA-13272

CDPD-91994: Stale query IDs in catalog logs

Previously, Catalog logs for getPartialCatalogObject requests displayed incorrect query IDs.

This issue is now fixed by ensuring that each request is associated with its correct query ID. The system automatically clears the identification after the request finishes to prevent stale information from appearing in later logs.

Apache Jira: IMPALA-14494

CDPD-93504: NullPointerException when hive.acid.key.index is missing in ORC files

Previously, a NullPointerException (NPE) error occurred when the hive.acid.key.index property was missing from an Optimized Row Columnar (ORC) file.

This issue is now resolved. The fix ensures that if the metadata is missing, the system sets the minimum and maximum keys to null and performs a full file scan instead of failing.

Apache Jira: HIVE-26147

CDPD-94206: Partition column statistics corruption during concurrent updates

Previously, a concurrency issue existed when multiple processes updated partition column statistics at the same time.

This issue is now resolved. The update process is now synchronized to ensure that statistics remain accurate during simultaneous access on the same partition.

Apache Jira: HIVE-29316

CDPD-94205: Database inconsistency during partition deletion failures

Previously, when a direct SQL processing failure occurred during operations such as dropping partitions, the partial changes made to the database were not reverted before the system attempted an alternative processing method.

This issue is now resolved. The system now sets a savepoint before starting direct SQL processing, ensuring that if a failure occurs, the transaction rolls back to the savepoint before retrying the operation.

Apache Jira: HIVE-26976

CDPD-93503: Incorrect row order after query-based minor compaction

Previously, query-based minor compaction used an incorrect sorting order for rows in its inner query. This resulted in duplicated rows when you ran the compaction after multiple MERGE statements.

This issue is now resolved. The fix aligns the query-based minor compaction sorting order with the MapReduce compaction order by using the transaction ID, bucket property, and row ID.

Apache Jira: HIVE-25258

CDPD-93502: Query-based major compaction failure after multiple merge statements

Previously, query-based major compaction failed with a Wrong sort order error when processing tables with multiple Atomicity, Consistency, Isolation, and Durability (ACID) delta directories, such as those created after multiple MERGE statements.

This issue is now resolved. The validation now correctly verifies the row order based on the original transaction ID, bucket property, and row ID, ensuring compatibility with data written by MapReduce and query-based processes.Apache Jira: HIVE-25257

CDPD-93427: Data loss in insert-only tables during query-based minor compaction

Previously, query-based minor compaction produced empty delta files when a table had both aborted and open transactions.

This issue is now resolved. The fix ensures that the compaction process correctly identifies the range of transactions to include, preventing the creation of empty delta files and subsequent data loss.

Apache Jira: HIVE-29272

CDPD-92365: Unsupported removal of SerDe properties

Previously, you could not remove specific serialization or deserialization (SerDe) properties from an existing Hive table because the UNSET SERDEPROPERTIES clause was not supported.

This issue is now resolved. Hive now supports the UNSET operation for SerDe properties, allowing you to delete specific properties by running the ALTER TABLE <table_name> UNSET SERDEPROPERTIES command.

Apache Jira: HIVE-21952

CDPD-93500: Multiple YARN UI navigation issues when accessed through Knox

Previously, several issues prevented users from properly accessing YARN-related pages through Knox. This issue is now fixed. The following UI elements and API functions now resolve correctly when proxied through Knox:

Tracking URL link for running Flink jobs in the YARN UI and YARN UI v2.
Query parameters in the YARN REST API, such as filtering applications by state.
Spark History Server links on finished jobs page.

These improvements enhance the user experience when managing and monitoring YARN applications, Flink jobs, and Spark jobs through the Knox gateway.

CDPD-92941: Knox Gateway logging configuration cannot be customized through Cloudera Manager

The Knox Gateway Logging Configuration (Safety Valve) advanced configuration snippet did not apply properly to Knox Gateway. This issue is now resolved by allowing individual logger configurations to be set and modified through the Cloudera Manager UI.

COMPX-14637: Incorrect permissions on NodeManager local directories cause container failuers

The NodeManager creates the configured local directories with 755 permissions on startup if the directories did not exist. However, if these permissions were changed after startup or if an administrator created the directories with incorrect permissions before starting YARN, the NodeManager would not reset the permissions, resulting in container failures. This issue is now resolved.

CDPD-89478: Oozie UI displays start and end times incorrectly after upgrade

Previously, the Oozie UI intermittently displayed the start time and end time fields in reverse order. Additionally, the Run button remained inactive until you clicked Save, even if you made no changes. This issue is resolved by correcting the field display order and ensuring that the Run button is active for schedules without requiring an unnecessary save action.

CDPD-92510: Inaccurate table update timestamps in Hue

Previously, the Data last updated on timestamp for Iceberg tables in Hue displayed incorrect information when tables were altered using Spark. This occurred because the timestamp was sourced from the transient_lastDdlTime table property, which does not update correctly during Spark operations. This issue is now fixed by updating the metadata source to use Iceberg snapshot timestamps.

CDPD-94739: Audit logging dependency on debug mode in Hue

Previously, audit logging in Hue only functioned when debug logging was enabled by setting the DJANGO_DEBUG_MODE property to true. This caused audit logging to be disabled by default. This issue is now resolved by ensuring audit events are always captured regardless of the debug mode settings. This change ensures Hue compliance and security requirements by maintaining audit logs in all environments.

CDPD-92857: Export All failures in secured multi-tenant clusters

Previously, the Export All feature in Hue failed in secured multi-tenant clusters that used tenant colocation policies. This resulted in access errors during the export process. This issue is now resolved. The Export All functionality now follows tenant boundaries and cluster security policies. However, other known limitations exist. For more information, see Export All failure in secured multi-tenant clusters

CDPD-95855: TLS truststore initialization with FIPS-compliant crypto providers

Previously, in some environments using FIPS-compliant crypto providers, TLS truststore initialization failed. This occurred because of incompatible handling of empty keystore initialization. This issue is now resolved by updating the truststore initialization logic to ensure compatibility with FIPS-compliant providers. This allows the TLS setup to complete successfully.

CDPD-96618: Race condition when registering or unregistering Exception Handling frames

A race condition in libgcc could occur during concurrent registration and EH frames by the memory manager, leading to instability during multi-threaded code generation. This issue is now resolved by ensuring proper padding so that a null-terminated list of FDEs is always provided, eliminating the race condition.

Apache Jira: KUDU-3545

CDPD-93290: Open Keys summary is not displaying proper data

Previously, the Open Keys summary response was not rendering properly and was not displaying proper data. This issue is now fixed.

CDPD-92660: Recon UI fails to load Disk Usage view with large number of nested directories and keys

Previously, with a large number of entries, the Recon UI Disk Usage page failed to load or took an excessive amount of time to populate the tree. This issue is now fixed now. The Disk Usage page no longer loads the root path by default. You also get a warning if you try to submit a root path DU request. The root path DU can only be fetched after an explicit submission.

CDPD-93580: Ozone Manager retains native ACL metadata for incoming keys

Previously, if the Ozone client was not updated to the latest version in the user's environment, it sent unwanted ACL metadata during key or file creation . This issue is now fixed. A new server-side configuration ensures that the Ozone Manager removes all native ACL information from the keys before persisting them.

CDPD-94895/CDPD-93951/CDPD-93469: Do not know how many DataNodes cannot accept new writes in a container

Previously, you did not know how many DataNodes were actually readable from the perspective of container allocation and metadata operations. This issue is now fixed and a new metric named NonWritableNodes is added which shows the number of DataNodes that cannot accept new writes because they are either not in IN_SERVICE and HEALTHY state, cannot allocate new containers or cannot write to existing containers.

CDPD-93860: SpaceUsageSource implementation class Fixed is missing toString()

Previously, the toString() method was missing in SpaceUsageSource.Fixed implementation. The toString() method in SpaceUsageSource.Fixed implementation is used for logging and debugging purposes to display volume space usage information such as capacity, used and available space. This issue is now fixed.

CDPD-93609: Ozone Usage is going beyond 100 percentage

Previously, the Ozone Datanode was using more disk space than allowed for Ozone. This issue is now fixed.

CDPD-88335: DataNode decommissioning fails when other DataNodes are offline due to invalid affinity node in Ratis replication

Previously, DataNode decommissioning failed if one or more DataNodes holding Ratis replicas were already offline. During decommissioning, the Storage Container Manager (SCM) attempts to replicate Ratis container data to maintain redundancy. However, SCM’s placement logic still references an affinity node that was previously part of the network topology but has since been removed due to the DataNode going offline. This resulted in repeated placement errors and prevented the decommissioning process from completing successfully. This issue is now fixed.

Apache Jira: HDDS-13544

CDPD-93901: Not able to track Ozone Storage Container Manager (SCM) safemode rules and exit

In large clusters, the Ozone SCM safemode might take a lot of time to exit. Previously, the you were not able to track the Ozone SCM Manager safemode rules and exit. This issue is not fixed and a new dashboard named "SCM Safemode" is introduced in Grafana which contains a chart for each safemode rule displaying its target and actual value. It also displays if the SCM is in safemode or not by showing "In Safemode" in red and "Exited safemode" in green respectively.

Apache Jira: HDDS-14039

CDPD-93793: Storage Container Manager (SCM) does not log safemode exit rules at regular intervals

Previously, SCM logged rule statuses at arbitrary time intervals instead of logging them at regular intervals. This issue is now fixed and SCM logs safemode exit rule status at regular intervals while in safemode. The logging interval is configurable through hdds.scm.safemode.log.interval. Default value is 1 minute. Each log entry includes overall safemode state, precheck completion status, and current progress of each exit rule.

Apache Jira: HDDS-14012

CDPD-94474: Kudu tablet server crashes on RHEL 9

Previously, Kudu tablet servers crashed on Red Hat Enterprise Linux 9 (RHEL 9) x86_64 platforms when the libgcc package was updated to version 11.5.0-11 or newer. These crashes typically manifested as a Segmentation Fault (SIGSEGV) or an Abort (SIGABRT) error under specific workloads. This issue is now resolved.

Apache Jira: KUDU-3736