Known Issues in Iceberg

Learn about the known issues in Iceberg, their impact on functionality, and the available workarounds.

CDPD-57551: Performance issue can occur on reads after writes of Iceberg tables
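Hive might generate too many small files when writing to Iceberg tables, which degrades the performance of subsequent reads.
For efficient reads, keep the number of data files under the Iceberg table or partition directory relatively small. To alleviate poor performance caused by too many small files, note the ID of the table's current snapshot, and then run the following queries, replacing <preTruncateSnapshotId> with that ID: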
TRUNCATE TABLE target;
INSERT OVERWRITE TABLE target SELECT * FROM target FOR SYSTEM_VERSION AS OF <preTruncateSnapshotId>;
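The workaround needs the ID of the snapshot that was current before the TRUNCATE. The following is a minimal sketch of one way to look it up from Hive. It assumes that the table is in the `default` database (substitute your own) and that Iceberg metadata tables can be queried in your release.

```sql
-- Assumption: the table is default.target; adjust the database name as needed.
-- List the most recent snapshot of the table from its snapshots metadata table.
SELECT snapshot_id, committed_at, operation
FROM default.target.snapshots
ORDER BY committed_at DESC
LIMIT 1;
```

Run this query before TRUNCATE TABLE, and use the returned snapshot_id in place of <preTruncateSnapshotId>.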
CDPD-75422: Impala schema case sensitivity issue with workaround
Impala treats schema elements as case insensitive. As a result, tables created through Spark with mixed-case schema elements, such as column names, can cause errors during predicate pushdown.
  • Create tables through Impala: This ensures that the schema is lower case.
  • Avoid upper case in Spark: Do not use upper case in schema elements when creating tables through Spark.
  • Fix existing tables: Use ALTER TABLE to rename columns that contain upper case characters:
    ALTER TABLE `database`.`iceberg_table` CHANGE COLUMN ID id string;
CDPD-66305: Do not turn on the optimized Iceberg V2 operator in 7.2.18.0
In this release, the optimized Iceberg V2 operator is disabled by default due to a correctness issue. The setting that keeps the operator turned off is DISABLE_OPTIMIZED_ICEBERG_V2_READ=true.
Keep the default setting, DISABLE_OPTIMIZED_ICEBERG_V2_READ=true. Do not change the value from true to false.
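If you are unsure whether the setting was changed in a session, the following sketch shows one way to check and restore it from impala-shell. It assumes that DISABLE_OPTIMIZED_ICEBERG_V2_READ is exposed as an Impala query option in your release and that it appears in the output of SET ALL.

```sql
-- List all query options, including advanced ones, with their current values.
-- Look for DISABLE_OPTIMIZED_ICEBERG_V2_READ in the output.
SET ALL;

-- If the value was changed to false, restore the default for this session.
SET DISABLE_OPTIMIZED_ICEBERG_V2_READ=true;
```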
CDPD-64629: Performance degradation of Iceberg tables compared to Hive tables
Cloudera testing of Iceberg and Hive tables using the Hive TPC-DS 1 TB dataset (Parquet) showed slower performance on Iceberg for a few of the TPC-DS queries. Overall, however, queries on Hive external tables of Iceberg ran faster than queries on Hive tables.
CDPD-84220: Cannot query Iceberg tables
After you enable HDFS HA, you might not be able to query Iceberg tables that were created before HDFS HA was enabled. This is because Iceberg stores the table path in the manifest files differently depending on whether HDFS HA is enabled.
There is no workaround for this issue.

Technical Service Bulletins

TSB 2024-758: Truncate command on Iceberg V2 branches cause unintentional data deletion
When working with Apache Hive (Hive) and Apache Iceberg (Iceberg) V2 tables, using the TRUNCATE statement may lead to unintended data deletion. This issue arises when the truncate command is applied to a branch of an Iceberg table. Instead of truncating the branch itself, the command affects the original (main) table, which results in unintended loss of data.
Knowledge article
For the latest update on this issue, see the corresponding Knowledge article: TSB 2024-758: Truncate command on Iceberg V2 branches cause unintentional data deletion