Unsupported features and limitations

Cloudera does not support all features in Apache Iceberg. The list of unsupported features for Cloudera differs from release to release. Also, Apache Iceberg in Cloudera has some limitations you need to understand.

Unsupported features🔗

The following table presents feature limitations or unsupported features:

✖ means not yet tested

N/A means will never be tested, not a GA candidate


Iceberg Feature	Hive	Impala	Spark
Branching/Tagging	✔	✖	✔
Read equality deletes for Flink upserts	✖	✔	✖
Read equality deletes for NiFi	✔	✔	✖
Write equality deletes	N/A	N/A	N/A
Read outside files	N/A	N/A	N/A
Bucketing	✖	✖	✖

The table above shows that the following features are not supported in this release of Cloudera:

Tagging and branching
A technical preview is supported from Hive (not Impala) in Cloudera Data Warehouse on cloud.
Writing equality deletes
Hive and Impala only read equality deletes.
Reading files outside the table directory
Reading outside files is not supported due to the security risk. An unauthorized party who knows the underlying schema and file location outside the table location can rewrite the manifest files within one table location to point to the data files in another table location to read your data.
Buckets defined from Hive do not create like buckets in Iceberg.
For more information, see "Bucketing workaround" below.
Using Iceberg tables as Spark Structured Streaming sources or sinks
PyIceberg
Migration of Delta Lake tables to Iceberg

Limitations🔗

The following features have limitations or are not supported in this release:

Multiple insert overwrite queries that read data from a source table.
When the underlying table is changed, you need to rebuild the materialized view manually, or use the Hive query scheduling to rebuild the materialized view.
You must be aware of the following considerations when using equality deletes:
- Equality updates and deletes are not supported.
- If you are using Apache Flink or Apache NiFi to write equality deletes, then ensure that you provide a PRIMARY KEY for the table. This is required for engines to know which columns to write into the equality delete files.
- If the table is partitioned then the partition columns have to be part of the PRIMARY KEY
- For Apache Flink, the table should be in 'upsert-mode' to write equality deletes
- Partition evolution is not allowed for Iceberg tables that have PRIMARY KEYs
- An equality delete file in the table is the likely cause of a problem with updates or deletes in the following situations:
  - In Change Data Capture (CDC) applications
  - In upserts from Apache Flink
  - From a third-party engine
You must be aware of the following:
- An Iceberg table that points to another Iceberg table in the HiveCatalog is not supported.
  For example:
```
CREATE EXTERNAL TABLE ice_t
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.table_identifier'='db.tb');
```
- See also Iceberg data types.

Bucketing workaround🔗

A query from Hive to define buckets/folders in Iceberg do not create the same number of buckets/folders as the same query creates in Hive. In Hive bucketing by multiple columns using the following clause creates 64 buckets maximum inside each partition.

| CLUSTERED BY (                                     |
|   id,                                              |
|   partition_id)                                    |
| INTO 64 BUCKETS

Defining bucketing from Hive on multiple columns of an Iceberg table using this query creates 64*64 buckets/folders; consequently, bucketing by group does not occur as expected. The operation will create many small files at scale, a drag on performance.

Add multiple bucket transforms (partitions) to more than one column in the current version of Iceberg as follows:

bucket(p, col1, col2) =[ bucket(m, col1) , bucket(n, col2) ] where p = m * n

Unsupported features and limitations

Unsupported features🔗

Limitations🔗

Bucketing workaround🔗

We want your opinion

How can we improve this page?