Fixed Issues in Iceberg
Review the list of Iceberg issues that are resolved in Cloudera Runtime 7.1.9 SP1.
- CDPD-45139: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
- This fix improves Impala query performance when reading Iceberg tables that have delete files.
- CDPD-47349: Use hive.metastore.table.owner during table creation
- Creation of an Iceberg table used to happen in two steps. The first step created the table, but with the wrong owner. The second step ran an ALTER TABLE statement to set the correct table owner. This fix resolves the issue by creating the Iceberg table with the correct owner in a single step. When creating an Iceberg table, Impala specifies the owner in the hive.metastore.table.owner table property.
- CDPD-55029: Include snapshot ID of Iceberg tables in query plan or profile
- This fix includes the snapshot ID of Iceberg tables in the Iceberg SCAN operators, which allows queries to be re-executed. Re-executing queries is useful because it helps you investigate performance problems and bugs.
- CDPD-59657: Iceberg V2 operator provides incorrect results in PARTITIONED mode
- In PARTITIONED mode, the fix introduced through IMPALA-12327 performs a binary search whenever the position-based difference between the current row and the previous row is not one.
- CDPD-60282: Need better cardinality estimation for Iceberg V2 tables with deletes
- Previously, the cardinality of the IcebergDeleteNode was the same as the cardinality of the left-hand side (LHS) and did not take into account the cardinality of the right-hand side (RHS). The RHS contains position delete records; therefore, each record in the RHS removes a record from the LHS. If there are joins on the Iceberg table, they have the same selectivity on the data records and on the delete records. This fix updates the cardinality of the IcebergDeleteNode to use the following formula: Cardinality of DELETE operator = Cardinality(LHS) - (Cardinality(RHS) * selectivity of LHS)
- CDPD-60946/CDPD-60717: Iceberg tables created through Trino are incompatible with Impala
- The Trino SQL engine creates Iceberg tables without setting engine.hive.enabled=true and does not provide users with an option to set this property manually. Therefore, Trino always creates Iceberg tables with non-HiveIceberg storage descriptors. Impala uses the Input/Output/SerDe properties to determine the table type; however, a table is also considered to be an Iceberg table if the table property table_type=ICEBERG is set. The fix introduced through IMPALA-12413 ensures that modifications to the table from Impala go through its Iceberg library (with engine.hive.enabled=true). This sets the HiveIceberg storage descriptors and makes Iceberg tables created through Trino compatible with Impala.
- CDPD-66786: Impala returns incorrect results when the optimized Iceberg V2 operator is used
- When reading Iceberg V2 tables, Impala could return incorrect results when the optimized V2 operator was used. This issue has been resolved by resetting the delete state when the operator detects records from files that do not have delete records.
- CDPD-67632: Optimized count(*) for Iceberg table gives wrong results after a Spark rewrite_data_files
- Spark's rewrite_data_files action can leave dangling delete files in the Iceberg table. These dangling delete files are not applicable to any data files. This can cause incorrect results in Impala when it runs simple count(*) queries with the help of table statistics stored in the Iceberg metadata layer.
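As an illustration of the cardinality formula described under CDPD-60282, the following minimal Python sketch computes the estimate from the three inputs that the formula names. It is not Impala's planner code: the function name, the parameter names, and the clamping of the result at zero are assumptions made for this example.

```python
def delete_node_cardinality(lhs_rows: int, rhs_rows: int, lhs_selectivity: float) -> int:
    """Estimate the output cardinality of an Iceberg delete node.

    lhs_rows: estimated cardinality of the left-hand side (data records).
    rhs_rows: estimated cardinality of the right-hand side (position delete records).
    lhs_selectivity: selectivity of the LHS, as a fraction between 0.0 and 1.0.
    """
    if lhs_rows < 0 or rhs_rows < 0:
        raise ValueError("cardinalities must be non-negative")
    if not 0.0 <= lhs_selectivity <= 1.0:
        raise ValueError("selectivity must be between 0.0 and 1.0")
    # Cardinality(LHS) - (Cardinality(RHS) * selectivity of LHS)
    estimate = lhs_rows - rhs_rows * lhs_selectivity
    # Assumption for this sketch: never report a negative row count.
    return max(0, round(estimate))
```

For example, with an LHS cardinality of 1000, an RHS cardinality of 200, and an LHS selectivity of 0.5, the sketch returns 1000 - (200 * 0.5) = 900.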
Apache Patch Information
- IMPALA-11619
- IMPALA-11776
- IMPALA-12072
- IMPALA-12327
- IMPALA-12371
- IMPALA-12413
- IMPALA-12894
Technical Service Bulletins
- TSB 2024-752: Dangling delete issue in Spark rewrite_data_files procedure causes incorrect results for Iceberg V2 tables
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2024-752: Dangling delete issue in Spark rewrite_data_files procedure causes incorrect results for Iceberg V2 tables.