Learn about the known issues in Iceberg, their impact on functionality, and the available workarounds.
Known issues identified in Cloudera Runtime 7.3.1.400 SP2
There are no new issues identified in this release.
Known issues identified in Cloudera Runtime 7.3.1.300 SP1 CHF1
There are no new issues identified in this release.
Known issues identified in Cloudera Runtime 7.3.1.200 SP1
There are no new issues identified in this release.
Known issues identified in Cloudera Runtime 7.3.1.100 CHF1
CDPD-78381: Performance degradation noticed in some Hive Iceberg
TPC-DS queries
Affected versions: 7.3.1.100, 7.3.1.200, 7.3.1.300
While running Hive TPC-DS (Parquet + Iceberg) performance benchmarking for Cloudera Runtime 7.3.1.100, the overall performance of Iceberg tables improved by 15.68% compared to Iceberg tables in Cloudera Runtime 7.3.1.0. However, some individual queries showed degraded performance.
Workaround: None.
CDPD-78134: CBO fails when a materialized view is dropped but
its pre-compiled plan remains in the registry.
Affected versions: 7.3.1.100, 7.3.1.200, 7.3.1.300
Consider a cluster having two HiveServer (HS2) instances. Each
HS2 instance contains its own Materialized View (MV) registry and the registries contain
pre-compiled plans of MVs that are enabled for query rewriting. Without the registries,
MVs will have to be loaded and compiled during each query compilation, resulting in slow
query performance.
When MVs are created or dropped, they are added to or removed from
the registry pertaining to the HS2 instance that issues the create or drop statement.
The other HS2 instance is not immediately notified of the change. A background process
is scheduled to refresh the registry; however, this process does not handle the
removal of dropped MVs.
When an MV is dropped by one of the HS2 instances, it
remains in the registry of the other HS2 instance. Now, if a query is processed in the
second HS2 instance, the rewrite algorithm still attempts to use the dropped MV. If
this MV is stored in an Iceberg table, the storage handler tries to refresh the MV
metadata from the metastore but throws an exception because the MV no longer exists,
resulting in a CBO failure.
Perform one of the following workarounds to address the
issue:
Restart all the HS2 instances after dropping the MV.
From Cloudera Manager, go to Clusters > Hive > Configuration and add the
hive.server2.materializedviews.registry.impl=DUMMY property to
the HiveServer2 Advanced Configuration Snippet (Safety Valve) for
hive-site.xml. The DUMMY value indicates that MVs should not be cached
and that requests should be forwarded to the Hive Metastore.
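As a sketch, the safety valve entry described above corresponds to the following hive-site.xml fragment (the property name and value are taken from the workaround; the surrounding XML is standard Hadoop configuration syntax):

```xml
<!-- HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml -->
<property>
  <name>hive.server2.materializedviews.registry.impl</name>
  <!-- DUMMY disables the in-memory MV registry cache; MV lookups
       are forwarded to the Hive Metastore instead -->
  <value>DUMMY</value>
</property>
```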
CDPD-75411: SELECT COUNT query on an Iceberg
table in AWS times out
Affected versions: 7.3.1, 7.3.1.100, 7.3.1.200, 7.3.1.300
In an AWS environment, a SELECT COUNT query
that is run on an Iceberg table times out because some 4 KB ORC file parts cannot be
downloaded. This issue occurs because Iceberg uses the positional delete index only
when the number of positional deletes is less than a threshold value, which is
100,000 by default.
Workaround: None.
CDPD-75088: Iceberg tables in Azure cannot be partitioned by
strings ending in a period (.)
Affected versions: 7.3.1, 7.3.1.100, 7.3.1.200, 7.3.1.300
In an Azure environment, you cannot create Iceberg tables from
Spark that are partitioned by string columns whose partition value ends with a
period (.) character. The query fails with the following
error:
24/10/08 18:14:12 WARN scheduler.TaskSetManager: [task-result-getter-2]: Lost task 0.0 in stage 2.0 (TID 2) (spark-sfvq0t-compute0.spark-r9.l2ov-m7vs.int.cldr.work executor 1): java.lang.IllegalArgumentException: ABFS does not allow files or directories to end with a dot.
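A minimal Spark SQL sketch of the failing pattern (the table and column names here are hypothetical, chosen only to illustrate the issue):

```sql
-- Hypothetical table; the DDL itself succeeds
CREATE TABLE sales (id INT, region STRING)
USING iceberg
PARTITIONED BY (region);

-- On ABFS (Azure), this insert fails: the partition directory
-- region=emea. would end with a dot, which ABFS does not allow
INSERT INTO sales VALUES (1, 'emea.');
```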
Workaround: None.
CDPD-72942: Unable to read Iceberg table from Hive after writing
data through Apache Flink
Affected versions: 7.3.1, 7.3.1.100, 7.3.1.200, 7.3.1.300
If you create an Iceberg table with default values using Hive
and insert data into the table through Apache Flink, you cannot then read the Iceberg
table from Hive using the Beeline client, and the query fails with the following
error:
Error while compiling statement: java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
The issue persists even after you use the ALTER TABLE statement to set the
engine.hive.enabled table property to "true".
CDPD-71962: Hive cannot write to a Spark Iceberg table bucketed
by date column
Affected versions: 7.3.1, 7.3.1.100, 7.3.1.200, 7.3.1.300
If you have used Spark to create an Iceberg table that is
bucketed by the "date" column and then try inserting or updating this Iceberg table
using Hive, the query fails with the following
error:
Error: Error while compiling statement: FAILED: RuntimeException org.apache.hadoop.hive.ql.exec.UDFArgumentException: ICEBERG_BUCKET() only takes STRING/CHAR/VARCHAR/BINARY/INT/LONG/DECIMAL/FLOAT/DOUBLE types as first argument, got DATE (state=42000,code=40000)
This issue does not occur if the Iceberg table is created through Hive.
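As an illustration of the failing sequence (the table and column names are hypothetical):

```sql
-- Spark SQL: create an Iceberg table bucketed by a DATE column
CREATE TABLE events (id INT, event_date DATE)
USING iceberg
PARTITIONED BY (bucket(16, event_date));

-- Hive (Beeline): this write fails with the ICEBERG_BUCKET()
-- UDFArgumentException shown above, because DATE is not an
-- accepted argument type for the bucket transform in Hive
INSERT INTO events VALUES (1, DATE '2024-01-15');
```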
Workaround: None.
CDPD-84220: Cannot query Iceberg tables after enabling HDFS HA
Affected versions: 7.3.1, 7.3.1.100, 7.3.1.200, 7.3.1.300, 7.3.1.400
You cannot query existing Iceberg tables after you enable HDFS
High Availability (HA). This is because Iceberg stores the table path in the manifest
files differently depending on whether HDFS HA is enabled. As a result, after you
enable HDFS HA, you might not be able to query tables that were created before HA
was enabled.