Apache Iceberg in Cloudera Data Platform

Apache Iceberg is a cloud-native, open table format for organizing petabyte-scale analytic datasets on a file system or object store. Combined with the Cloudera Data Platform (CDP) architecture for multi-function analytics, users can deploy large-scale end-to-end pipelines.

Cloudera Data Engineering (CDE) and Cloudera Data Warehouse (CDW) support Apache Iceberg. By accessing Iceberg from within CDE and CDW, you can perform the following tasks:
  • Get high throughput reads of large tables at petabyte scale.
  • Run time travel queries (see the sketch after this list).
  • Query tables with high concurrency on Amazon S3.
  • Query Iceberg tables in ORC or Parquet format from Hive or Impala.
  • Query Iceberg tables in Parquet format from Spark.
  • Evolve partitions and schemas quickly and easily.
  • Make schema changes quickly and easily.
  • Secure data using SDX integration (table authorization and policies).
  • Migrate Hive tables to Iceberg.
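
As an illustration of two of the tasks above, the following PySpark sketch (not part of the Cloudera documentation) creates a Parquet-backed Iceberg table and runs a time travel read. It assumes a SparkSession that is already configured with an Iceberg catalog, as a CDE Spark job in CDP would be; the database and table names are placeholders.

```python
from pyspark.sql import SparkSession

# Assumes the session is already configured with an Iceberg catalog
# (for example, by the CDE job configuration); db.sales is a placeholder.
spark = SparkSession.builder.appName("iceberg-tasks").getOrCreate()

# Create an Iceberg table stored as Parquet, the format Spark queries in CDP.
spark.sql("""
    CREATE TABLE IF NOT EXISTS db.sales (
        id        BIGINT,
        amount    DOUBLE,
        sale_date DATE
    ) USING iceberg
""")

# Time travel: read the table as it existed at a point in time, using the
# as-of-timestamp read option (milliseconds since the Unix epoch).
historical = (spark.read
              .option("as-of-timestamp", "1684108800000")
              .format("iceberg")
              .load("db.sales"))
historical.show()
```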

Limitations

The following general limitations apply to using Iceberg in CDE or CDW. Additional limitations specific to the CDE or CDW services are covered in the documentation for the respective services.

  • Storing Iceberg tables in AVRO is not supported.
  • If partition columns are not present in the data files, tables cannot be read.
  • Spark can read Iceberg tables created by Hive and Impala if you select a Data Lake version 7.2.12.1 or later when you register the CDP environment.
  • Only AWS storage is supported.

Integration of Iceberg in CDP

CDP integrates the following components:
  • Apache Hive, Apache Impala, and Apache Spark to query Iceberg tables
  • Cloudera Data Warehouse service providing Apache Hive and Apache Impala
  • Cloudera Data Engineering service providing Apache Spark
  • The CDP Shared Data Experience (SDX), which provides compliance and self-service data access for all users with consistent security and governance
  • Hive metastore (HMS), which plays a lightweight role in providing the Iceberg catalog, as described below
  • Data Visualization to visualize Iceberg data

Supported ACID transaction properties

Iceberg supports atomic and isolated database transaction properties. Writers work in isolation, not affecting the live table, and perform a metadata swap only when the write is complete, making the changes in one atomic commit.

Iceberg uses snapshots to guarantee isolated reads and writes. Readers always see a consistent version of the data without needing to lock the table.
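
The hedged sketch below illustrates this behavior in Spark: an INSERT becomes one atomic commit that produces a new snapshot, while a reader pinned to an earlier snapshot ID continues to see a consistent, unchanged view. The table name and snapshot ID are placeholders, and an Iceberg-configured SparkSession is assumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog assumed configured

# The write is a single atomic commit; readers of the live table either see
# all of these rows or none of them.
spark.sql("INSERT INTO db.sales VALUES (1, 9.99, DATE '2023-01-01')")

# A reader pinned to an earlier snapshot is isolated from that commit and
# keeps seeing the table as it was (the snapshot ID below is a placeholder).
before_insert = (spark.read
                 .option("snapshot-id", "1234567890123456789")
                 .format("iceberg")
                 .load("db.sales"))
before_insert.show()
```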

Iceberg partitioning

The Iceberg partitioning technique has performance advantages over conventional partitioning, such as Apache Hive partitioning, and Iceberg hidden partitioning is easier to use. Iceberg supports in-place partition evolution: you can change the partition layout (for example, add a new partition column) without rewriting the entire table, and queries do not need to be rewritten for the updated table. Iceberg also continuously gathers data statistics, which supports additional optimizations, such as partition pruning.
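
As an illustration, the following sketch evolves the partition spec of a placeholder table in place, assuming the Iceberg SQL extensions are enabled in the Spark session. Existing data files are not rewritten, and the same query continues to work while benefiting from pruning on newly written data.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog and SQL extensions assumed

# In-place partition evolution: add a partition field derived from a column.
# No table rewrite is required; only data written after this change uses
# the new partition layout.
spark.sql("ALTER TABLE db.sales ADD PARTITION FIELD days(sale_date)")

# Hidden partitioning: the query filters on the source column as before and
# still benefits from partition pruning; no query rewrite is needed.
spark.sql("""
    SELECT sale_date, SUM(amount) AS total
    FROM db.sales
    WHERE sale_date >= DATE '2023-01-01'
    GROUP BY sale_date
""").show()
```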

Iceberg uses multiple layers of metadata files to find and prune data. Hive and Impala keep track of data at the folder level and not at the file level, performing file list operations when working with data in a table. Performance problems occur during the execution of multiple list operations. Iceberg keeps track of a complete list of files within a table using a persistent tree structure. Changes to an Iceberg table use an atomic object/file level commit to update the path to a new snapshot file. The snapshot points to the individual data files through manifest files.

Manifest files track data files across many partitions and store partition information and column metrics for each data file. A manifest list is an additional index for pruning entire manifests. File pruning increases efficiency.
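
These metadata layers can be inspected directly. The sketch below, again using a placeholder table name and an Iceberg-configured SparkSession, queries Iceberg's metadata tables from Spark to list snapshots, the manifests they point to, and the individual data files they track.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog assumed configured

# Each snapshot records the manifest list it points to.
spark.sql("SELECT snapshot_id, operation, manifest_list FROM db.sales.snapshots").show(truncate=False)

# Manifests carry partition information and counts used for pruning.
spark.sql("SELECT path, added_data_files_count FROM db.sales.manifests").show(truncate=False)

# The files metadata table lists every data file the current snapshot tracks.
spark.sql("SELECT file_path, record_count FROM db.sales.files").show(truncate=False)
```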

Iceberg relieves Hive metastore (HMS) pressure by storing partition information in metadata files on the file system/object store instead of within the HMS. This architecture supports rapid scaling without performance hits.
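
Because partition information lives in Iceberg metadata files rather than in HMS, it can be queried the same way; the short sketch below reads the partitions metadata table of the placeholder table used above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog assumed configured

# Partition-level statistics come from Iceberg metadata files on the
# file system or object store, not from the Hive metastore.
spark.sql("SELECT * FROM db.sales.partitions").show(truncate=False)
```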