Known Issues in Iceberg
Learn about the known issues in Iceberg, their impact or changes to functionality, and the workarounds.
- Concurrent compactions and modify statements can corrupt Iceberg tables
- Hive or Impala DELETE/UPDATE/MERGE operations on Iceberg V2
tables can corrupt the tables if a Spark table compaction runs concurrently. The
issue occurs when the compaction and the modify statement run in parallel and the
compaction job commits before the modify statement. In that case, the modify statement's
position delete files still point to the old data files that the compaction replaced. The results for DELETE
and for UPDATE / MERGE are as follows:
- DELETE
Delete records pointing to old files have no effect, so the rows that should have been deleted remain in the table.
- UPDATE / MERGE
Delete records pointing to old files have no effect, but the newly added data records are still committed. As a result, the original versions of the rewritten records remain active alongside the new versions.
- Use one of the following workarounds:
- Do not run compactions and DELETE/UPDATE/MERGE statements in parallel.
- Do not compact the table using Iceberg’s RewriteFiles operation. For example, do not use Spark’s rewriteDataFiles.
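To illustrate why the stale position deletes have no effect, here is a minimal toy model in Python. The `Table` class and the `compact` and `commit_modify` functions are hypothetical simplifications, not the Iceberg API: a position delete is modeled as a `(data file path, row position)` pair, and compaction rewrites the live rows into a new file. It compares an UPDATE that commits after a concurrent compaction with the same UPDATE run before the compaction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Toy model only: these names are hypothetical and are not the Iceberg API.
@dataclass
class Table:
    # Data file path -> rows stored in that file, in position order.
    data_files: Dict[str, List[str]] = field(default_factory=dict)
    # Position delete records: (data file path, row position within that file).
    position_deletes: Set[Tuple[str, int]] = field(default_factory=set)

    def live_rows(self) -> List[str]:
        """Rows visible to readers: every row not matched by a position delete."""
        return [
            row
            for path, rows in self.data_files.items()
            for pos, row in enumerate(rows)
            if (path, pos) not in self.position_deletes
        ]

def compact(table: Table, new_path: str) -> None:
    """Model a RewriteFiles commit: rewrite all live rows into one new data file."""
    table.data_files = {new_path: table.live_rows()}
    table.position_deletes = set()  # existing deletes were applied during the rewrite

def commit_modify(table: Table, deletes: Set[Tuple[str, int]],
                  new_rows: List[str], new_path: str) -> None:
    """Model a DELETE/UPDATE/MERGE commit whose deletes were planned earlier."""
    table.position_deletes |= deletes
    if new_rows:
        table.data_files[new_path] = new_rows

# Concurrent case: UPDATE b -> b2 is planned against the pre-compaction layout,
# but the compaction commits first, so ("f1.parquet", 1) points to a removed file.
t = Table({"f1.parquet": ["a", "b", "c"]})
planned_deletes = {("f1.parquet", 1)}
compact(t, "f2.parquet")
commit_modify(t, planned_deletes, ["b2"], "f3.parquet")
print(t.live_rows())  # ['a', 'b', 'c', 'b2']: old "b" is still active

# Serialized case (the first workaround): the UPDATE commits, then compaction runs.
t2 = Table({"f1.parquet": ["a", "b", "c"]})
commit_modify(t2, {("f1.parquet", 1)}, ["b2"], "f3.parquet")
compact(t2, "f4.parquet")
print(t2.live_rows())  # ['a', 'c', 'b2']
```

In the concurrent case, the delete record references a file that is no longer part of the table, so it matches nothing, while the appended row is still added. Running the two operations one after the other gives the intended result.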
- CDPD-57551: Performance issue can occur on reads after writes of Iceberg tables
- Hive might generate too many small files when writing to Iceberg tables, which degrades read performance.
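- Maintain a relatively small number of data files under the
Iceberg table or partition directory to keep reads efficient. To alleviate poor performance
caused by too many small files, run the following queries, where <preTruncateSnapshotId> is the ID of the table's latest snapshot before the TRUNCATE:
TRUNCATE TABLE target;
INSERT OVERWRITE TABLE target SELECT * FROM target FOR SYSTEM_VERSION AS OF <preTruncateSnapshotId>;
The INSERT OVERWRITE statement uses time travel to read the table data as of the pre-truncate snapshot and writes it back into the now-empty table.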
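To decide which tables or partitions need the workaround, you can count the data files per directory. The following Python sketch assumes you already have a list of data file paths, for example parsed from a recursive listing such as `hdfs dfs -ls -R` of the table's data directory (the parsing is not shown). The limit `MAX_FILES_PER_DIR` and the function names are hypothetical; they are not Hive or Iceberg settings.

```python
import posixpath
from collections import Counter
from typing import Iterable, List

# Hypothetical threshold chosen for illustration; not a Hive or Iceberg property.
MAX_FILES_PER_DIR = 100

def files_per_directory(paths: Iterable[str]) -> Counter:
    """Count data files per directory, skipping hidden/marker files like _SUCCESS."""
    counts: Counter = Counter()
    for path in paths:
        if posixpath.basename(path).startswith((".", "_")):
            continue
        counts[posixpath.dirname(path)] += 1
    return counts

def directories_over_limit(paths: Iterable[str],
                           limit: int = MAX_FILES_PER_DIR) -> List[str]:
    """Return directories (sorted) holding more data files than the limit."""
    counts = files_per_directory(paths)
    return sorted(d for d, n in counts.items() if n > limit)
```

For example, given 150 files under `data/dt=2024-01-01` and one file under `data/dt=2024-01-02`, `directories_over_limit` returns only `data/dt=2024-01-01`, which would be a candidate for the TRUNCATE and INSERT OVERWRITE workaround. The sketch assumes the input contains data file paths only, not files from the table's metadata directory.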