Known Issues in Apache Iceberg
Learn about the known issues in Iceberg, their impact on functionality, and the available workarounds.
Technical Service Bulletins
- TSB 2024-752: Dangling delete issue in Spark rewrite_data_files procedure causes incorrect results for Iceberg V2 tables
- The Spark Iceberg library includes two procedures: rewrite_data_files and rewrite_position_delete_files. The current implementation of rewrite_data_files has a limitation: position delete files are not removed and remain tracked by the table metadata, even when they no longer refer to an active data file. This is called the dangling delete problem. The rewrite_position_delete_files procedure was added to the Spark Iceberg library to remove these old “dangling” position delete files.
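As a rough illustration of how the two procedures relate, the following sketch builds the Spark SQL CALL statements that would compact data files and then clean up the leftover position delete files. The catalog name, the table name, and the helper function names are placeholders introduced for this example. It assumes a Spark session that is already configured with an Iceberg catalog and an Iceberg version that ships rewrite_position_delete_files.

```python
def build_maintenance_statements(catalog: str, table: str) -> list[str]:
    """Return the CALL statements in the order they would be run.

    `catalog` and `table` are placeholders (for example "spark_catalog"
    and "db.iceberg_table"); they are assumptions, not names from the bulletin.
    """
    return [
        # Step 1: compact data files. This can leave dangling position deletes.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Step 2: rewrite position delete files, dropping the dangling ones.
        f"CALL {catalog}.system.rewrite_position_delete_files(table => '{table}')",
    ]


def run_maintenance(spark, catalog: str, table: str) -> list:
    """Run both procedures with an existing SparkSession (assumed to be configured)."""
    results = []
    for statement in build_maintenance_statements(catalog, table):
        # spark.sql executes the procedure; collect() returns its summary rows.
        results.append(spark.sql(statement).collect())
    return results
```

Running the second statement directly after the first keeps the window in which all delete files are dangling as short as possible.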
When an Iceberg table that contains dangling deletes is queried in Impala, Impala tries to optimize a select count(*) from iceberg_table query by returning the result from table statistics. Because of the dangling deletes, this optimization returns incorrect results.
The following conditions must be met for this issue to occur:
- All delete files in the Iceberg table are “dangling”
  - This state exists immediately after running Spark rewrite_data_files and lasts until one of the following happens:
    - Further delete operations are performed on the table
    - Spark rewrite_position_delete_files is run on the table
- Only stats-optimized, plain select count(*) from iceberg_table queries are affected. For example, the query should not have:
  - Any WHERE clause
  - Any GROUP BY clause
  - Any HAVING clause
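The conditions above can be expressed as a small check. This is only an illustrative sketch: the regular expression is a simple heuristic for a plain count(*) query and is not Impala's planner logic. The function names and the two file counts are assumptions introduced here, and the caller must supply those counts from the table metadata.

```python
import re

# Heuristic: "select count(*) from <table>" with nothing after the table name,
# so no WHERE, GROUP BY, or HAVING clause (and no alias).
_PLAIN_COUNT_STAR = re.compile(
    r"^\s*select\s+count\s*\(\s*\*\s*\)\s+from\s+[\w.]+\s*;?\s*$",
    re.IGNORECASE,
)


def is_plain_count_star(query: str) -> bool:
    """Return True if the query looks like a plain select count(*) query."""
    return _PLAIN_COUNT_STAR.match(query) is not None


def may_return_wrong_count(query: str, total_delete_files: int,
                           dangling_delete_files: int) -> bool:
    """Return True when both conditions from the bulletin hold.

    The counts are hypothetical inputs; how they are gathered is out of scope.
    """
    all_dangling = (total_delete_files > 0
                    and dangling_delete_files == total_delete_files)
    return all_dangling and is_plain_count_star(query)
```

For example, may_return_wrong_count("select count(*) from iceberg_table", 3, 3) flags the query, while the same query with a WHERE clause, or a table in which only some delete files are dangling, does not.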
Remove dangling deletes: After rewrite_data_files, position delete records pointing to the rewritten data files are not always marked for removal, and can remain tracked by the live snapshot metadata of the table. This is known as the dangling delete problem.
- Knowledge article
- For the latest update on this issue, see the corresponding Knowledge article: TSB 2024-752: Dangling delete issue in Spark rewrite_data_files procedure causes incorrect results for Iceberg V2 tables.