Learn about the known issues in Iceberg, the impact or changes to the functionality,
and the workaround.
Known issues identified in Cloudera Runtime 7.3.2
- DWX-18843: Unable to read Iceberg table from Hive Virtual
Warehouse
- 7.3.2
- If you have used Apache Flink to insert data into an Iceberg
table that is created from Hive, you cannot read the Iceberg table from the Hive Virtual
Warehouse.
- Add the
engine.hive.enabled table
property through Hive Beeline and set the value to "true". You can add this table
property either while creating the Iceberg table or by using the ALTER TABLE
statement.
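For example (the table and column names are illustrative), the property can be set from Beeline either on an existing table or at creation time:

```sql
-- Set the property on an existing Iceberg table
ALTER TABLE my_iceberg_table SET TBLPROPERTIES ('engine.hive.enabled'='true');

-- Or set it when creating the Iceberg table from Hive
CREATE EXTERNAL TABLE my_iceberg_table (id INT, val STRING)
STORED BY ICEBERG
TBLPROPERTIES ('engine.hive.enabled'='true');
```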
- DWX-18489: Hive compaction of Iceberg tables results in a
failure
- 7.3.2
- When Cloudera Data Warehouse and Cloudera Data Hub are deployed in the same environment and use the same
Hive Metastore (HMS) instance, the Cloudera Data Hub compaction workers
can inadvertently pick up Iceberg compaction tasks. Since Iceberg compaction is not yet
supported in the latest Cloudera Data Hub version, the compaction tasks
will fail when they are processed by the Cloudera Data Hub compaction
workers.
In such a scenario where both Cloudera Data Warehouse and Cloudera Data Hub share the same HMS instance and there is a
requirement to run both Hive ACID and Iceberg compaction jobs, it is recommended that
you use the Cloudera Data Warehouse environment for these jobs. If you
want to run only Hive ACID compaction tasks, you can choose to use either the Cloudera Data Warehouse or Cloudera Data Hub
environments.
- If you want to run the compaction jobs without changing
the environment, it is recommended that you use Cloudera Data Warehouse. To
avoid interference from Cloudera Data Hub, change the value of the
hive.compactor.worker.threads Hive Server (HS2) property to '0'. This
ensures that the compaction jobs are not processed by Cloudera Data Hub.
- In Cloudera Manager, navigate to the configuration page for HMS.
- Search for
hive.compactor.worker.threads and modify the value to
'0'.
- Save the changes and restart the Hive service.
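If the property is not exposed as a dedicated configuration field and must be added through a safety valve, the entry takes the standard Hadoop XML form (a sketch; confirm the target configuration file for your deployment):

```xml
<property>
  <name>hive.compactor.worker.threads</name>
  <value>0</value>
</property>
```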
- DWX-17254: Merging Iceberg branches requires a target table
alias
- 7.3.2
- Hive supports only one level of qualifier when referencing
columns; in other words, only one dot is accepted. For example,
select table.col
from ...; is allowed, but select db.table.col is not.
Using the merge statement to merge Iceberg branches without a target or source table
alias causes an exception:
org.apache.hadoop.hive.ql.parse.SemanticException: ... Invalid table alias or column reference ...
- Use an alias, for example t, for the target table.
merge into mydb.target.branch_branch1 t using mydb.source.branch_branch1 s on t.id = s.id when matched then update set value = 'matched';
Apache Jira: HIVE-28055
- DWX-17210, DWX-13733: Timeout issue querying Iceberg tables from
Hive
- 7.3.2
- When querying Iceberg tables from Hive, queries can fail
due to a timeout issue.
- Add the following configurations to
hadoop-core-site for the
Database Catalog and the Virtual Warehouse.
- fs.s3.maxConnections=1000
- fs.s3a.connection.maximum=1000
- Restart the Database Catalog and Virtual Warehouse.
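As a sketch, the entries added to hadoop-core-site take the usual Hadoop XML property form:

```xml
<property>
  <name>fs.s3.maxConnections</name>
  <value>1000</value>
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>1000</value>
</property>
```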
- DWX-14163: Limitations reading Iceberg tables in Avro file
format from Impala
- 7.3.2
- The Avro, Impala, and Iceberg specifications describe some
limitations related to Avro, and those limitations also apply in Cloudera. In addition, the DECIMAL type is not
supported in this release.
- None.
- DEX-7946: Data loss during migration of a Hive table to
Iceberg
- 7.3.2
- In this release, by default the table property
'external.table.purge' is set to true, which deletes the table data and metadata if you
drop the table during migration from Hive to Iceberg.
- Either of the following workarounds prevents data loss
during table migration:
- Set the table property 'external.table.purge'='FALSE'.
- Do not drop a table during migration from Hive to Iceberg.
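For example, the first workaround can be applied from Beeline before the migration (the table name is illustrative):

```sql
-- Prevent DROP TABLE from purging the underlying data during migration
ALTER TABLE my_hive_table SET TBLPROPERTIES ('external.table.purge'='FALSE');
```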
- DWX-13062: Converting a Hive table having CHAR or VARCHAR
columns to Iceberg causes an exception
- 7.3.2
- CHAR and VARCHAR data can be shorter than the length specified
by the data type. Remaining characters are padded with spaces. Data is converted to a
string in Iceberg. This process can yield incorrect results when you query the converted
Iceberg table.
- Change columns from CHAR or VARCHAR to string types before
converting the Hive table to Iceberg.
Apache Jira:
HIVE-26507
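For example, a VARCHAR column can be converted to STRING from Beeline before migrating the table (table and column names are illustrative):

```sql
-- CHANGE COLUMN keeps the column name and switches the type to STRING
ALTER TABLE my_hive_table CHANGE COLUMN name name STRING;
```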
Known issues identified before Cloudera Runtime 7.3.2
Known issues identified before Cloudera Runtime 7.3.2 include only unresolved issues from
previous releases that continue to affect the Cloudera Runtime 7.3.2 base release.
- CDPD-92182: Inserting into Hive Iceberg tables on S3 fails with
RazS3ClientCredentialsException
- 7.3.2, 7.3.1.706, 7.3.1.600, 7.3.1.500
- In RAZ-enabled clusters where HDFS is the default file system,
attempts to insert data into Hive Iceberg tables that explicitly point to S3 locations
fail with a
RazS3ClientCredentialsException.
- None.
- CDPD-89390/CDPD-83022: Incorrect row count displayed in table metadata after
compaction
- 7.3.2, 7.3.1.600, 7.3.1.500, 7.3.1.400
- After running data compaction operations on large tables, the
row count displayed by the
DESCRIBE FORMATTED command may be
inaccurate. Initially, the count may appear higher than the actual number of rows.
Subsequently, after running the ANALYZE command to update table
statistics, the count might then appear lower than the actual number of rows. This
issue has been observed in large tables containing a significant number of historical
snapshots (exceeding 10,000). These snapshots are primarily generated through
UPDATE operations.
It is important to note that this is just
a metadata display issue, and there is no loss of data. The underlying table data
remains complete and correct.
- To obtain an accurate row count, use the
SELECT
COUNT(*) query.
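For example (the table name is illustrative):

```sql
SELECT COUNT(*) FROM my_iceberg_table;
```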