Known Issues in Iceberg

Learn about the known issues in Iceberg, their impact on or changes to functionality, and the workarounds.

Concurrent compactions and modify statements can corrupt Iceberg tables
Hive or Impala DELETE/UPDATE/MERGE operations on Iceberg V2 tables can corrupt the tables if a Spark compaction runs on the same table concurrently. The issue occurs when the compaction and the modify statement run in parallel and the compaction job commits before the modify statement. In that case, the position delete files written by the modify statement still point to the old data files that the compaction replaced. The results for DELETE and for UPDATE / MERGE are as follows:
  • DELETE

    Delete records pointing to old files have no effect.

  • UPDATE / MERGE

    Delete records pointing to old files have no effect. The table also contains the newly added data records, so the original versions of the rewritten records remain active alongside the new versions.

Use one of the following workarounds:
  • Do not run compactions and DELETE/UPDATE/MERGE statements in parallel.
  • Do not compact the table via Iceberg’s RewriteFiles operation. For example, do not use Spark’s rewriteDataFiles.
CDPD-57551: Performance issues can occur on reads after writes to Iceberg tables
Hive might generate too many small files when writing to Iceberg tables, which degrades read performance.
To keep reads efficient, maintain a relatively small number of data files under the Iceberg table or partition directory. To alleviate poor performance caused by too many small files, run the following queries, where <preTruncateSnapshotId> is the ID of the table's current snapshot before you run TRUNCATE TABLE:
TRUNCATE TABLE target;
INSERT OVERWRITE TABLE target SELECT * FROM target FOR SYSTEM_VERSION AS OF <preTruncateSnapshotId>;
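
The workaround depends on the snapshot ID recorded before the TRUNCATE, because the INSERT OVERWRITE reads the table as of that snapshot through time travel. The following is a sketch of the complete sequence, assuming your Hive version supports querying Iceberg metadata tables with the <database>.<table>.snapshots syntax. The database name default is an assumption for illustration; replace default.target with your own table.

```sql
-- Assumption: the table is default.target; replace with your own database and table.

-- 1. Before truncating, find the current snapshot ID. The most recently
--    committed snapshot is normally the current one, unless the table
--    was rolled back to an earlier snapshot.
SELECT snapshot_id, committed_at, operation
FROM default.target.snapshots
ORDER BY committed_at DESC
LIMIT 1;

-- 2. Remove all rows from the current table state. The old data files stay
--    reachable through the earlier snapshot until that snapshot is expired.
TRUNCATE TABLE default.target;

-- 3. Rewrite the rows as they were in the pre-truncate snapshot.
--    Replace <preTruncateSnapshotId> with the snapshot_id from step 1.
INSERT OVERWRITE TABLE default.target
SELECT * FROM default.target FOR SYSTEM_VERSION AS OF <preTruncateSnapshotId>;
```

Do not expire the pre-truncate snapshot until step 3 completes. After that snapshot is expired, the time-travel read in step 3 can no longer find the data.
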
CDPD-66305: Do not turn on the optimized Iceberg V2 operator in 7.2.18.0
In this release, the optimized Iceberg V2 operator is disabled by default due to a correctness issue. The operator is turned off by the property setting DISABLE_OPTIMIZED_ICEBERG_V2_READ=true, which is the correct setting.
Accept the default setting. Do not change DISABLE_OPTIMIZED_ICEBERG_V2_READ from true to false.
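
If a script or session should state the safe value explicitly, for example when the same script runs against clusters with different defaults, the option can be set in the session. This is a sketch assuming DISABLE_OPTIMIZED_ICEBERG_V2_READ is applied as a session query option with the SET statement, as in impala-shell; confirm how your release exposes the option before relying on it.

```sql
-- Assumption: DISABLE_OPTIMIZED_ICEBERG_V2_READ is a session query option.
-- Keep the optimized Iceberg V2 operator turned off (the 7.2.18.0 default).
SET DISABLE_OPTIMIZED_ICEBERG_V2_READ=true;
```
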
CDPD-64629: Performance degradation of Iceberg tables compared to Hive tables
Cloudera testing of Iceberg and Hive tables using the Hive TPC-DS 1 TB dataset (Parquet) revealed slower performance for a few of the TPC-DS queries on Iceberg tables. Overall, however, Hive queries on Iceberg tables (Hive external tables in Iceberg format) ran faster than queries on Hive tables.