Iceberg overview

Apache Iceberg is a table format for huge analytics datasets in the cloud. You can efficiently query large Iceberg tables on object stores. Iceberg supports concurrent reads and writes on all storage media.

You can use Iceberg when a single table contains tens of petabytes of data, and you can read these tables without compromising performance. From Apache Hive (read/write) and Apache Impala (read), you can query Iceberg tables. V1 capabilities have reached general availability (GA) status. The following features are included:

Hive and Impala reads of Iceberg V2 tables
Row-level delete and update Data Modification Language (DML) operations, making Apache Iceberg ACID compliant with serializable isolation and an optimistic concurrency model
Impala reads of tables that have had row-level deletes or updates
Hive writes to Iceberg V2 tables
Materialized views of Iceberg tables
Enhanced maintenance features, such as expiring and removing old snapshots and compaction of small files
Performance and scalability enhancements

In Cloudera Public Cloud Data Hub, you can deploy Iceberg based applications across multiple clouds including AWS, Azure and Google Cloud.

You create Iceberg tables and run queries from Hive or Impala in CDP.. The Hive metastore stores Iceberg metadata, including the location of the table.

Hive metastore plays a lightweight role in the Catalog operations. Iceberg relieves Hive metastore (HMS) pressure by storing partition information in metadata files on the file system/object store instead of within the HMS. This architecture supports rapid scaling without performance hits.

By default, Hive and Impala use the Iceberg HiveCatalog. Cloudera recommends the default HiveCatalog to create an Iceberg table.

Apache Iceberg integrates Apache Ranger for security. You can use Ranger integration with Hive and Impala to apply fine-grained access control to sensitive data in Iceberg tables. Iceberg is also integrated with Data Visualization for creating dashboards and other graphics of your Iceberg data.

Iceberg overview

We want your opinion

How can we improve this page?