Known Issues in Apache Iceberg
Learn about the known issues in Iceberg, their impact on functionality, and any available workarounds.
- CDPD-75667: Querying an Iceberg table with a TIMESTAMP_LTZ column can result in data loss
- When you query an Iceberg table that has a TIMESTAMP_LTZ column, the query can result in data loss.
- CDPD-75088: Iceberg tables in Azure cannot be partitioned by strings ending in '.'
- In an Azure environment, you cannot create Iceberg tables from Spark that are partitioned by a string column when a partition value ends with the period (.) character, because ABFS does not allow file or directory names to end with a dot. The query fails with the following error:
24/10/08 18:14:12 WARN scheduler.TaskSetManager: [task-result-getter-2]: Lost task 0.0 in stage 2.0 (TID 2) (spark-sfvq0t-compute0.spark-r9.l2ov-m7vs.int.cldr.work executor 1): java.lang.IllegalArgumentException: ABFS does not allow files or directories to end with a dot.
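Because ABFS rejects any file or directory name that ends with a dot, one way to avoid the failure is to check string partition values before writing them. The following Python sketch is a hypothetical pre-write check, not part of Iceberg, Spark, or Cloudera tooling. It assumes each partition value ends up as the last characters of a directory name such as column=value and that a trailing period is not escaped on the way to storage.

```python
"""Hypothetical pre-write check for CDPD-75088 (not an Iceberg or Cloudera API)."""
from typing import Iterable, List


def abfs_unsafe_partition_values(values: Iterable[object]) -> List[str]:
    """Return the string partition values that would produce an ABFS
    directory name ending in '.', which ABFS rejects.

    Assumption: the value becomes the trailing part of a directory name
    such as 'column=value', and '.' is not escaped.
    """
    unsafe = []
    for value in values:
        # Only string values can end with a period; skip None and non-strings.
        if isinstance(value, str) and value.endswith("."):
            unsafe.append(value)
    return unsafe


if __name__ == "__main__":
    sample = ["v1.0", "release.", None, "abc", "x.y."]
    print(abfs_unsafe_partition_values(sample))  # ['release.', 'x.y.']
```

The check flags only trailing periods, matching the ABFS error message above; a value such as "v1.0" is not flagged.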
- CDPD-72942: Unable to read Iceberg table from Hive after writing data through Apache Flink
- If you use Hive to create an Iceberg table with default values and then insert data into the table through Apache Flink, you cannot read the table from Hive using the Beeline client. The query fails with the following error:
Error while compiling statement: java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
The issue persists even after you use the ALTER TABLE statement to set the engine.hive.enabled table property to "true".
- CDPD-71962: Hive cannot write to a Spark-created Iceberg table bucketed by a date column
- If you use Spark to create an Iceberg table that is bucketed by a DATE column and then try to insert into or update the table using Hive, the query fails with the following error:
Error: Error while compiling statement: FAILED: RuntimeException org.apache.hadoop.hive.ql.exec.UDFArgumentException: ICEBERG_BUCKET() only takes STRING/CHAR/VARCHAR/BINARY/INT/LONG/DECIMAL/FLOAT/DOUBLE types as first argument, got DATE (state=42000,code=40000)
This issue does not occur if the Iceberg table is created through Hive.
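The Iceberg table specification defines a bucket transform for DATE values, which is why Spark can create such a table even though, judging by the error, Hive's ICEBERG_BUCKET() function does not accept DATE. As described in the specification, the date is converted to days since 1970-01-01, serialized as an 8-byte little-endian long, hashed with 32-bit Murmur3 (x86 variant, seed 0), and mapped to a bucket as (hash & Integer.MAX_VALUE) % N. The Python sketch below illustrates that computation under those assumptions. The function names are hypothetical, and this is not the Iceberg library's code.

```python
"""Sketch of the Iceberg spec's bucket transform for DATE values.

Hypothetical helper names; this is not the Iceberg library implementation.
"""
import datetime

MASK32 = 0xFFFFFFFF
EPOCH = datetime.date(1970, 1, 1)


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the hash state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[4 * nblocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    # Reinterpret as a signed 32-bit int, as Java would.
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_hash_long(value: int) -> int:
    """Hash a long as the spec does: 8-byte little-endian, Murmur3 seed 0."""
    return murmur3_x86_32(value.to_bytes(8, "little", signed=True))


def iceberg_bucket_date(value: datetime.date, num_buckets: int) -> int:
    """Bucket a DATE: hash days-from-epoch as a long, then mask and mod."""
    days = (value - EPOCH).days
    return (iceberg_hash_long(days) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    d = datetime.date(2017, 11, 16)
    print((d - EPOCH).days)  # 17486
    print(iceberg_bucket_date(d, 16))
```

The mask with 0x7FFFFFFF mirrors Java's hash & Integer.MAX_VALUE, so a negative hash still maps to a bucket index in the range [0, N).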
- CDPD-66305: Do not turn on the optimized Iceberg V2 operator
- The optimized Iceberg V2 operator is disabled by default due to a correctness issue. Keep it turned off. The property setting that disables the operator is DISABLE_OPTIMIZED_ICEBERG_V2_READ=true.
- CDPD-64629: Performance degradation of Iceberg tables compared to Hive tables
- Cloudera testing of Iceberg and Hive tables using the Hive TPC-DS 1 TB dataset (Parquet) revealed slower performance on Iceberg tables for a few of the TPC-DS queries. Overall, however, queries on Iceberg tables ran faster than queries on Hive external tables.
- CDPD-57551: Performance issue can occur on reads after writes of Iceberg tables
- Writes from Hive might generate too many small files, which degrades the performance of subsequent reads.
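To gauge whether a table is affected, you can compare its data file sizes against a threshold, for example using the file_size_in_bytes column of the table's files metadata table (such as SELECT file_size_in_bytes FROM db.tbl.files in Spark, where db.tbl is a placeholder). The Python sketch below is a hypothetical helper. Its default threshold, one quarter of Iceberg's default write.target-file-size-bytes of 512 MB, is an arbitrary assumption and not a Cloudera recommendation.

```python
"""Hypothetical small-file report for an Iceberg table's data files."""
from dataclasses import dataclass
from typing import Sequence

# Iceberg's default write.target-file-size-bytes is 512 MB.
DEFAULT_TARGET_BYTES = 512 * 1024 * 1024
# Assumption: treat files under a quarter of the target size as "small".
DEFAULT_SMALL_THRESHOLD = DEFAULT_TARGET_BYTES // 4


@dataclass
class SmallFileReport:
    total_files: int
    small_files: int
    small_fraction: float


def small_file_report(
    sizes: Sequence[int], threshold_bytes: int = DEFAULT_SMALL_THRESHOLD
) -> SmallFileReport:
    """Count data files whose size is below threshold_bytes."""
    total = len(sizes)
    small = sum(1 for size in sizes if size < threshold_bytes)
    # Avoid dividing by zero for a table with no data files.
    fraction = small / total if total else 0.0
    return SmallFileReport(total, small, fraction)


if __name__ == "__main__":
    sizes = [1_000_000, 2_000_000, 300_000_000, 600_000_000]
    print(small_file_report(sizes))
    # SmallFileReport(total_files=4, small_files=2, small_fraction=0.5)
```

A high small_fraction after Hive writes suggests the table is affected by this issue; the threshold is a parameter so you can match it to your own target file size.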