Unsupported features and limitations

Cloudera does not support all features in Apache Iceberg. The list of unsupported features for Cloudera Data Platform (CDP) differs from release to release. Also, Apache Iceberg in CDP has some limitations you need to understand.

Unsupported features

The following features are not supported in this release of CDP:

Tagging and branching
A technical preview is supported from Hive (not Impala or Spark) in Cloudera Data Warehouse Public Cloud.
Equality deletes
Reading files outside the table directory
An unauthorized party who knows the underlying schema and file location outside the table location can rewrite the manifest files within one table location to point to the data files in another table location to read your data.
Buckets defined from Hive do not create equivalent buckets in Iceberg.
For more information, see "Bucketing workaround" below.
Using Iceberg tables as Spark Structured Streaming sources or sinks
PyIceberg
Migration of Delta Lake tables to Iceberg

Limitations

The following features have limitations or are not supported for Apache Iceberg:

Impala supports reading data files in the AVRO format, but does not yet support reading delete files in the AVRO format.
This means Impala can always read Iceberg V1 tables containing AVRO files because the V1 version does not have delete files. However, Iceberg V2 tables configured with the merge-on-read delete strategy might contain AVRO delete files.
If Impala encounters AVRO delete files while reading an Iceberg V2 table, it fails with an error.
Workaround: Use one of the following methods to access or modify the table:
- Use Hive or Spark to read Iceberg V2 tables that contain AVRO delete files.
- Change the delete strategy to copy-on-write or compact the table to eliminate the delete files.
- Rewrite the Iceberg table files to the Parquet file format so that Impala can read the delete files as well.
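As a minimal sketch of the second method, the delete strategy can be switched through the standard Iceberg write-mode table properties. The table name db.ice_v2 is hypothetical, and the statement is assumed to be run from Hive. Changing these properties affects only future writes, so delete files that already exist are likely to remain until the table is compacted or rewritten.

```sql
-- Hypothetical table name; replace db.ice_v2 with your own Iceberg V2 table.
-- Sets the Iceberg write modes so that future deletes, updates, and merges
-- rewrite data files (copy-on-write) instead of producing delete files.
ALTER TABLE db.ice_v2 SET TBLPROPERTIES (
  'write.delete.mode'='copy-on-write',
  'write.update.mode'='copy-on-write',
  'write.merge.mode'='copy-on-write'
);
```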
Multiple insert overwrite queries that read data from a source table.
When the underlying table is changed, you need to rebuild the materialized view manually, or use the Hive query scheduling to rebuild the materialized view.
From Impala, you can read, but not write, position updates and deletes.
Equality updates and deletes are not supported, as previously mentioned.
An equality delete file in the table is the likely cause of a problem with updates or deletes in the following situations:
- In Change Data Capture (CDC) applications
- In upserts from Apache Flink
- From a third-party engine
An Iceberg table that points to another Iceberg table in the HiveCatalog is not supported.
For example:
```
CREATE EXTERNAL TABLE ice_t
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.table_identifier'='db.tb');
```
See also Iceberg data types limitations and unsupported data types.

Bucketing workaround

A query from Hive that defines buckets/folders in Iceberg does not create the same number of buckets/folders as the same query creates in Hive. In Hive, bucketing by multiple columns using the following clause creates a maximum of 64 buckets inside each partition.

CLUSTERED BY (
  id,
  partition_id)
INTO 64 BUCKETS

Defining bucketing from Hive on multiple columns of an Iceberg table using this query creates 64*64 (4,096) buckets/folders; consequently, bucketing by group does not occur as expected. At scale, the operation creates many small files, which degrades performance.

In the current version of Iceberg, work around this by adding a separate bucket transform (partition) to each column, as follows:

bucket(p, col1, col2) = [bucket(m, col1), bucket(n, col2)], where p = m * n
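The rule above can be sketched as a Hive statement that applies one bucket transform per column. The table name, the payload column, and the column types are hypothetical, and the bucket counts m = 8 and n = 8 are chosen only so that p = 8 * 8 = 64, matching the 64 buckets in the Hive clause. Treat this as an illustration under those assumptions rather than a recommended layout.

```sql
-- Hypothetical table, columns, and types, for illustration only.
-- Each column gets its own bucket transform; 8 * 8 = 64 combinations
-- at most, instead of the 64 * 64 produced by bucketing both columns
-- into 64 buckets each.
CREATE TABLE ice_bucketed (
  id BIGINT,
  partition_id BIGINT,
  payload STRING
)
PARTITIONED BY SPEC (
  bucket(8, id),
  bucket(8, partition_id)
)
STORED BY ICEBERG;
```

Any pair of bucket counts whose product equals the target works; an uneven split such as 16 and 4 might suit columns with very different cardinalities.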