Iceberg
You can review the list of reported issues and their fixes for Iceberg in 7.3.1.100.
- CDPD-75667: Querying an Iceberg table with a TIMESTAMP_LTZ column can result in data loss
  When you query an Iceberg table that has a TIMESTAMP_LTZ column, the query could result in data loss. When Impala changes the TIMESTAMP_LTZ column to TIMESTAMP, it does so by calling alter_table() on Hive Metastore (HMS) directly, providing a Metastore Table object to HMS as the desired state of the table. HMS then persists this table object (a sketch of such a call follows this list).
  This issue is fixed by avoiding the alter_table() call to HMS towards the end of loading the Iceberg table, which removes the need to persist the schema adjustments that Impala makes while loading the table.
- CDPD-78355: Impala should ignore character case of Iceberg schema elements
  Impala cannot read Iceberg tables written by Apache Spark that contain schema elements in uppercase or lowercase letters. Schema is case insensitive in Impala; however, Spark allows creating schema elements with uppercase or lowercase letters and stores them in the metadata JSON files of Iceberg.
  With this fix, Impala invokes Scan.caseSensitive(boolean caseSensitive) on the TableScan object to set case insensitivity (see the case-insensitive scan sketch after this list).
- CDPD-78362: Schema resolution does not work for migrated partitioned Iceberg tables that have complex types
  Schema resolution does not work correctly for migrated partitioned Iceberg tables that have complex data types. This fix addresses field ID generation by taking the number of partition columns into account. If none of the partition columns are included in the data file (the common scenario), file-level field IDs are adjusted accordingly. You could also come across a scenario where all the partition columns are included in the data files, which is likewise handled. However, if some partition columns are included in the data file while others are not, an error is generated.
- CDPD-78540: DELETE statement throws DateTimeParseException when deleting from DAY-partitioned Iceberg tables
  Due to an issue in IcebergDeleteSink, Impala cannot successfully run a DELETE operation on Iceberg tables that are partitioned by time-based transforms (YEAR, MONTH, DAY, HOUR).
  This fix addresses the error by adding functions that transform the partition values to their human-readable representations (see the transform sketch after this list). This is done in the IcebergDeleteSink so that the Catalog-side logic is not affected.
- CDPD-78562: Iceberg tables have a large memory footprint in catalog cache
  This fix clears the GroupContentFiles after they are used. GroupContentFiles stores the file descriptors in Iceberg's format and is used for creating file descriptors in Impala's format. Once the creation is complete, the Iceberg ContentFiles no longer need to be retained, and dropping them can significantly reduce the memory footprint of an Iceberg table (see the last sketch after this list).
  For example, the memory size of a test Iceberg table containing 110,000 files was reduced from 140 MB to 80 MB after clearing the GroupContentFiles.
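The following Java sketch is not Impala's code; it only illustrates, for CDPD-75667, what a direct alter_table() call against HMS looks like: the client fetches the Metastore Table object, adjusts a column type, and asks HMS to persist the adjusted object as the desired state of the table. The database, table, and column names are hypothetical, and the HiveMetaStoreClient setup assumes a reachable, configured metastore.

```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

public class AlterTableSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical database, table, and column names, used purely for illustration.
    String db = "demo_db";
    String tbl = "demo_iceberg_tbl";
    String col = "event_time";

    HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
    try {
      // Fetch the current Metastore Table object and adjust the column type in place;
      // the modified object becomes the "desired state" of the table.
      Table desired = client.getTable(db, tbl);
      for (FieldSchema field : desired.getSd().getCols()) {
        if (field.getName().equals(col)) {
          field.setType("timestamp");  // illustrative change from "timestamp with local time zone"
        }
      }
      // A direct alter_table() call makes HMS persist the adjusted schema.
      // The CDPD-75667 fix avoids issuing such a call at the end of table loading.
      client.alter_table(db, tbl, desired);
    } finally {
      client.close();
    }
  }
}
```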
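For CDPD-78355, the following is a minimal sketch of the Iceberg API toggle the fix relies on: Scan.caseSensitive(false) makes a TableScan resolve column names case-insensitively. The HadoopTables catalog, table location, and filter column below are assumptions made only to keep the example self-contained.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.io.CloseableIterable;

public class CaseInsensitiveScanSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical table location; any Iceberg catalog would work the same way.
    Table table = new HadoopTables(new Configuration())
        .load("hdfs:///warehouse/demo_db/demo_tbl");

    // caseSensitive(false) lets the scan match a column stored as "ID" in the
    // metadata JSON against the lower-cased reference "id" used by the engine.
    TableScan scan = table.newScan()
        .caseSensitive(false)
        .filter(Expressions.equal("id", 42));

    try (CloseableIterable<FileScanTask> tasks = scan.planFiles()) {
      for (FileScanTask task : tasks) {
        System.out.println(task.file().path());
      }
    }
  }
}
```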
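For CDPD-78540, the transform sketch below is not the IcebergDeleteSink code itself; it only illustrates the kind of conversion the fix adds, assuming the Iceberg convention that YEAR, MONTH, DAY, and HOUR transform values are stored as ordinals relative to the Unix epoch (years, months, days, and hours since 1970). The helper names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeTransformToHumanSketch {
  // DAY partition values are stored as days since 1970-01-01.
  static String dayToHuman(int daysSinceEpoch) {
    return LocalDate.ofEpochDay(daysSinceEpoch).toString();  // e.g. "2024-05-17"
  }

  // HOUR partition values are stored as hours since 1970-01-01 00:00.
  static String hourToHuman(long hoursSinceEpoch) {
    return LocalDateTime.ofEpochSecond(hoursSinceEpoch * 3600L, 0, ZoneOffset.UTC)
        .format(DateTimeFormatter.ofPattern("yyyy-MM-dd-HH"));  // e.g. "2024-05-17-10"
  }

  // MONTH values are months since 1970-01; assumes non-negative ordinals (dates in or after 1970).
  static String monthToHuman(int monthsSinceEpoch) {
    return String.format("%04d-%02d", 1970 + monthsSinceEpoch / 12, 1 + monthsSinceEpoch % 12);
  }

  // YEAR values are years since 1970.
  static String yearToHuman(int yearsSinceEpoch) {
    return String.format("%04d", 1970 + yearsSinceEpoch);
  }

  public static void main(String[] args) {
    System.out.println(dayToHuman(19860));    // a date in 2024
    System.out.println(hourToHuman(476640L)); // an hour of that same day
    System.out.println(monthToHuman(652));    // 2024-05
    System.out.println(yearToHuman(54));      // 2024
  }
}
```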
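For CDPD-78562, here is a minimal sketch of the memory-saving idea, not the actual catalog code: the Iceberg ContentFile objects are only needed while the engine builds its own compact file descriptors, so the grouping collection can be cleared as soon as the conversion is done and the Iceberg-format objects become garbage-collectable. The FileDescriptor record and convertAndRelease() helper are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.iceberg.ContentFile;

public class ContentFileConversionSketch {
  // Hypothetical compact descriptor standing in for the engine's own file descriptor format.
  record FileDescriptor(String path, long sizeInBytes, long recordCount) {}

  static FileDescriptor toDescriptor(ContentFile<?> file) {
    return new FileDescriptor(file.path().toString(), file.fileSizeInBytes(), file.recordCount());
  }

  // Converts the grouped Iceberg ContentFiles, then clears the source list so that
  // the larger Iceberg-format objects are no longer referenced by the cache.
  static List<FileDescriptor> convertAndRelease(List<ContentFile<?>> groupedContentFiles) {
    List<FileDescriptor> descriptors = new ArrayList<>(groupedContentFiles.size());
    for (ContentFile<?> file : groupedContentFiles) {
      descriptors.add(toDescriptor(file));
    }
    groupedContentFiles.clear();  // drop references to the Iceberg-format objects
    return descriptors;
  }
}
```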