Known Issues in Iceberg
Learn about the known issues in Iceberg, the impact or changes to the functionality, and the workaround.
- Concurrent compactions and modify statements can corrupt Iceberg tables
- Hive or Impala DELETE/UPDATE/MERGE operations on Iceberg V2 tables can corrupt the tables if a table compaction from Spark runs concurrently. The issue occurs when the compaction and the modify statement run in parallel and the compaction job commits before the modify statement. In that case, the modify statement’s position delete files still point to the old data files. The results for DELETE and for UPDATE / MERGE are as follows:
- DELETE
Delete records pointing to old files have no effect.
- UPDATE / MERGE
Delete records pointing to old files have no effect. The table will also have the newly added data records, which means rewritten records will still be active.
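The commit ordering described above can be illustrated with a toy in-memory model. This is only a sketch of the failure mode, not Iceberg, Hive, Impala, or Spark code: the `Table` class, the `plan_delete`, `compact`, and `run_scenario` helpers, and the file names are all hypothetical, and real position delete files and snapshots are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical model: data file path -> rows stored in that file.
    data_files: dict = field(default_factory=dict)
    # Position deletes as (data file path, row position) pairs.
    position_deletes: set = field(default_factory=set)

    def scan(self):
        """Return live rows: every row not masked by a position delete."""
        return [
            row
            for path, rows in self.data_files.items()
            for pos, row in enumerate(rows)
            if (path, pos) not in self.position_deletes
        ]


def plan_delete(table, predicate):
    """Compute position deletes against the files visible at planning time."""
    return {
        (path, pos)
        for path, rows in table.data_files.items()
        for pos, row in enumerate(rows)
        if predicate(row)
    }


def compact(table, new_path):
    """Rewrite all live rows into one new file and drop the old files."""
    live = table.scan()
    table.data_files = {new_path: live}
    table.position_deletes = set()


def run_scenario(compaction_commits_first):
    table = Table(data_files={"a.parquet": [1, 2], "b.parquet": [3, 4]})
    # The modify statement plans a delete of the row with value 3.
    pending = plan_delete(table, lambda row: row == 3)
    if compaction_commits_first:
        # A concurrent compaction commits before the modify statement.
        compact(table, "compacted.parquet")
    # The modify statement commits its (possibly stale) position deletes.
    table.position_deletes |= pending
    return sorted(table.scan())


print(run_scenario(compaction_commits_first=False))
print(run_scenario(compaction_commits_first=True))
```

In this model, the first call returns the table without row 3, while the second still returns row 3, because the committed position delete refers to `b.parquet`, which no longer exists after compaction.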
- CDPD-57551: Performance issue can occur on reads after writes of Iceberg tables
- Hive might generate too many small files, which causes performance degradation.
- CDPD-66305: Do not turn on the optimized Iceberg V2 operator in 7.2.18.0
- In this release, the optimized Iceberg V2 operator is disabled by default due to a correctness issue. The correct setting for the property that turns off the operator is DISABLE_OPTIMIZED_ICEBERG_V2_READ=true.
- CDPD-64629: Performance degradation of Iceberg tables compared to Hive tables
- Cloudera testing of Iceberg and Hive tables using the Hive TPC-DS 1 TB dataset (Parquet) revealed slower performance for a few of the TPC-DS queries. Overall, Iceberg performance when executing queries on Hive external tables of Iceberg is faster than Hive.