Fixed Issues in Apache Iceberg

Review the list of Iceberg issues that are resolved in Cloudera Runtime 7.3.2 and in its service packs and cumulative hotfixes.

Cloudera Runtime 7.3.2

Cloudera Runtime 7.3.2 resolves Iceberg issues and incorporates fixes from the service packs and cumulative hotfixes from 7.3.1.100 through 7.3.1.706. For a comprehensive record of all fixes in Cloudera Runtime 7.3.1.x, see Fixed Issues.

CDPD-78686: Iceberg tables created in 7.2.17 are not captured in Atlas when using 7.2.18 or 7.3.1 Atlas server
7.3.2
This issue occurred because of an incompatibility between the Data Services, Hive, and Impala hooks in 7.2.17 and the Atlas server in 7.2.18 and 7.3.1.

This fix resolves the compatibility issue, and Iceberg tables created in 7.2.17 are now correctly captured and displayed in the Atlas UI in later versions.

CDPD-97171: Concurrency issues between compaction and concurrent write operations
7.3.2
This fix resolves an issue where compaction conflicted with concurrent write operations and caused data corruption. Concurrency handling is improved to ensure stable operations and data consistency.
Apache Jira: HIVE-29437

CDPD-74040: Dropping Iceberg table with complex type containing timestamp fails
7.3.2
This fix resolves an issue where dropping an Iceberg table with complex types (array, map, or struct) containing timestamp fields failed because of unsupported Hive timestamp type handling. The fix ensures proper type conversion so that the table is dropped successfully.

CDPD-89402: Event processing invalidates Iceberg tables due to reload failures
7.3.2
This fix resolves an issue where CatalogServiceCatalog.reloadTableIfExists() threw a ClassCastException during event processing, which invalidated Iceberg tables and triggered full table reloads instead of incremental loading.
Apache Jira: IMPALA-14358

CDPD-72383: COUNT(*) optimization returns incorrect results in UNION queries on Iceberg V2 tables
7.3.2
A faulty COUNT(*) query optimization caused incorrect results in UNION queries on Iceberg V2 tables that have delete files, leading to inconsistent query outputs.

The fix corrects the optimization logic to ensure accurate results for UNION queries, including after the table data has changed.

Apache Jira: IMPALA-13756, IMPALA-13249

CDPD-81076: LEFT ANTI JOIN fails on Iceberg V2 tables with delete files
7.3.2
Queries using a LEFT ANTI JOIN fail with an AnalysisException if the right-side table is an Iceberg V2 table containing delete files. For example, consider the following query:
SELECT * FROM table_a a
LEFT ANTI JOIN iceberg_v2_table b
ON a.id = b.id;

The query returns the error Illegal column/field reference 'b.input_file_name' of semi-/anti-joined table 'b'. The error occurs because the tuples of a semi-joined or anti-joined table must be explicitly made visible before paths that point inside them can be resolved.

The fix updates the IcebergScanPlanner to ensure that the tuple containing the virtual fields is made visible when it is semi-joined.

Apache Jira: IMPALA-13888

CDPD-78427: Enable MERGE statement for Iceberg tables with equality deletes
7.3.2
This patch fixes an issue that caused MERGE statements to fail on Iceberg tables that use equality deletes.

The failure occurred because the delete expression calculation was missing the data sequence number, even though the underlying data description included it. This mismatch caused row evaluation to fail.

The fix ensures the data sequence number is correctly included in the result expressions, allowing MERGE operations to complete successfully on these tables.
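
To illustrate why the data sequence number is needed for row evaluation: under the Iceberg table specification, an equality delete applies to a data file only if the data file's data sequence number is strictly lower than that of the delete file. The following Python sketch shows that rule only. The names DataRow, EqualityDelete, and is_deleted are hypothetical and do not correspond to Impala classes, and partition matching is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRow:
    key: int                    # value of the equality column
    data_sequence_number: int   # inherited from the data file that holds the row


@dataclass(frozen=True)
class EqualityDelete:
    key: int                    # rows with this key value are deleted
    data_sequence_number: int   # sequence number of the delete file


def is_deleted(row: DataRow, deletes: list[EqualityDelete]) -> bool:
    """Return True if any equality delete applies to the row.

    A delete applies only to rows written before it, that is, rows whose
    data sequence number is strictly lower than the delete's.
    """
    return any(
        d.key == row.key and row.data_sequence_number < d.data_sequence_number
        for d in deletes
    )


deletes = [EqualityDelete(key=1, data_sequence_number=2)]
old_row = DataRow(key=1, data_sequence_number=1)  # written before the delete
new_row = DataRow(key=1, data_sequence_number=3)  # re-inserted after the delete

print(is_deleted(old_row, deletes))  # True
print(is_deleted(new_row, deletes))  # False
```

Without the sequence number, the two rows above would be indistinguishable, which is why an expression that omits it cannot evaluate equality deletes correctly.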

Apache Jira: IMPALA-13674

CDPD-77773: Tolerate missing data files during Iceberg table loading
7.3.2
This fix addresses an issue where an Iceberg table would fail to load completely if any of its data files were missing from the file system. This TableLoadingException left the table in an incomplete state, blocking all operations on it.

Impala now tolerates missing data files during the table loading process. An exception will only be thrown if a query subsequently attempts to read one of the specific files that is missing.

This change allows other operations that do not depend on the missing data—such as ROLLBACK, DROP PARTITION, or SELECT statements on valid partitions—to execute successfully.
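
The deferred-failure behavior described above can be sketched in Python as follows. This is a simplified illustration, not Impala's implementation: the class IcebergTableSketch and its scan method are hypothetical, and local files stand in for data files on the file system.

```python
import os
import tempfile


class IcebergTableSketch:
    """Hypothetical stand-in for a loaded table: partition name -> data file paths."""

    def __init__(self, files_by_partition: dict[str, list[str]]):
        # Loading only records the file list; it does not verify that files exist.
        self.files_by_partition = files_by_partition

    def scan(self, partition: str) -> list[str]:
        """Read every data file of one partition; fail only if one is missing."""
        contents = []
        for path in self.files_by_partition[partition]:
            if not os.path.exists(path):
                raise FileNotFoundError(f"missing data file: {path}")
            with open(path) as f:
                contents.append(f.read())
        return contents


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "p1.data")
    with open(good, "w") as f:
        f.write("row-1")
    missing = os.path.join(tmp, "p2.data")  # never created

    # Loading succeeds even though one referenced file does not exist.
    table = IcebergTableSketch({"p=1": [good], "p=2": [missing]})

    p1_rows = table.scan("p=1")  # valid partition: works
    try:
        table.scan("p=2")        # touches the missing file: fails here
        p2_failed = False
    except FileNotFoundError:
        p2_failed = True

print(p1_rows, p2_failed)
```

The design choice is to move the existence check from load time to read time, so that only queries that actually need a missing file are affected.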

Apache Jira: IMPALA-13654

CDPD-78508: Skip reloading Iceberg tables when metadata JSON file is the same
7.3.2
This patch optimizes metadata handling for Iceberg tables, particularly those that are updated frequently.

Previously, if an event processor was lagging, Impala might receive numerous update events for the same table (for example, 100 events). Impala would attempt to reload the table 100 times, even if the table's state was already up-to-date after processing the first event.

With this fix, Impala compares the path of the incoming metadata JSON file with the path of the one that is currently loaded. If the metadata file location is the same, Impala skips the reload because the table is unchanged. This significantly reduces unnecessary metadata processing.
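
The comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the CachedTable class, the fetch_location callback, and the file names are hypothetical, and the sketch assumes that each reload reads the table's latest metadata location.

```python
from typing import Callable


class CachedTable:
    """Hypothetical cached table that remembers which metadata file it loaded."""

    def __init__(self, fetch_location: Callable[[], str]):
        self._fetch_location = fetch_location   # returns the current metadata path
        self.loaded_location = fetch_location()
        self.reload_count = 0

    def on_update_event(self) -> bool:
        """Handle one update event; return True if a reload was performed."""
        latest = self._fetch_location()
        if latest == self.loaded_location:
            return False                        # same metadata file: skip reload
        self.loaded_location = latest           # stand-in for the real reload
        self.reload_count += 1
        return True


catalog = {"location": "metadata/v1.metadata.json"}
table = CachedTable(lambda: catalog["location"])

# The table is committed to many times while the event processor lags...
catalog["location"] = "metadata/v101.metadata.json"
# ...and then the 100 queued update events arrive.
results = [table.on_update_event() for _ in range(100)]

print(table.reload_count)  # 1
```

Only the first event triggers a reload; the remaining 99 events find the same metadata file location and are skipped.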

Apache Jira: IMPALA-13718

CDPD-82415: TABLESAMPLE clause of the COMPUTE STATS statement has no effect on Iceberg tables
7.3.2
This fix resolves a regression introduced by IMPALA-13737. For example, the following statement scanned the entire Iceberg table to calculate statistics, although it should read only about 10% of the data.
COMPUTE STATS t TABLESAMPLE SYSTEM(10);

This fix introduces proper table sampling logic for Iceberg tables, which can be utilized for COMPUTE STATS. The sampling algorithm previously located in IcebergScanNode.getFilesSample() is now relocated to FeIcebergTable.Utils.getFilesSample().
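
As a rough illustration of file-level sampling, the following Python sketch picks files in a seeded random order until they cover the requested percentage of the table's bytes. It is an assumption about the general approach, not a port of FeIcebergTable.Utils.getFilesSample(); the function name, parameters, and file names are hypothetical.

```python
import random


def get_files_sample(file_sizes: dict[str, int], percent: int, seed: int) -> list[str]:
    """Pick random files until they cover at least `percent` of the total bytes."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    target_bytes = sum(file_sizes.values()) * percent / 100
    paths = sorted(file_sizes)            # fixed order so the seed is reproducible
    random.Random(seed).shuffle(paths)
    sample, sampled_bytes = [], 0
    for path in paths:
        if sampled_bytes >= target_bytes:
            break
        sample.append(path)
        sampled_bytes += file_sizes[path]
    return sample


# 20 files of 100 bytes each; a 10% sample needs 200 bytes, that is, 2 files.
files = {f"data-{i:02d}.parquet": 100 for i in range(20)}
sample = get_files_sample(files, percent=10, seed=42)

print(len(sample))  # 2
```

Because sampling works on whole files, the sampled fraction is approximate when file sizes vary.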

Apache Jira: IMPALA-14014

CDPD-85228: IllegalStateException with Iceberg table with DELETE
7.3.2
Running a query on an Iceberg table fails with an IllegalStateException error in the following scenario:
  • The Iceberg table has delete files for every data file (no data files without delete files) AND
  • An anti-join operation is performed on the result of the Iceberg delete operation (IcebergDeleteNode or HashJoinNode)

This fix resolves the issue by setting the TableRefIds of the node corresponding to the Iceberg delete operation (IcebergDeleteNode or HashJoinNode) to only the table reference associated with the data files, excluding the delete files.

Apache Jira: IMPALA-14154

CDPD-87405: Error unnesting arrays in Iceberg tables with DELETE files
7.3.2
The following error occurred when unnesting a nested array (a 2D array) from an Iceberg table. This issue was triggered specifically when the table contained delete files for some, but not all, of its data files.
Filtering an unnested collection that comes from a UNION [ALL] is not supported yet.

Reading an Iceberg table with this mixed data and delete file configuration creates a UNION ALL node in the query execution plan. The system had a check that explicitly blocked any filtering on an unnested array.

This fix relaxes the validation check, allowing the operation to proceed if all UNION operands share the same tuple IDs. This ensures the query can successfully unnest the array.
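
The relaxed check can be sketched in Python as follows. This is an illustrative simplification: the function name is hypothetical, tuple IDs are modeled as plain integers, and comparing them as sets is an assumption rather than a description of Impala's planner code.

```python
def operands_share_tuple_ids(operand_tuple_ids: list[list[int]]) -> bool:
    """Return True if every UNION operand produces the same set of tuple IDs."""
    if not operand_tuple_ids:
        return True
    first = set(operand_tuple_ids[0])
    return all(set(ids) == first for ids in operand_tuple_ids[1:])


# Iceberg scan: both operands (data files without deletes, and data files
# with deletes applied) read the same table, so they share tuple 0.
iceberg_union = [[0], [0]]

# General UNION ALL of two different tables: tuple IDs differ.
general_union = [[0], [1]]

print(operands_share_tuple_ids(iceberg_union))  # True
print(operands_share_tuple_ids(general_union))  # False
```

In this model, filtering the unnested collection is allowed for the Iceberg case and still rejected for a general UNION of different inputs.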

Apache Jira: IMPALA-14185