Cloudera Iceberg REST Catalog overview

Cloudera Iceberg REST Catalog is a server-side implementation based on the Apache Iceberg REST Catalog OpenAPI specification, extended with enterprise-grade security and governance through Apache Ranger and Apache Knox. It enables REST-enabled third-party tools to manage Iceberg table metadata.

Apache Iceberg is a table format for huge analytics datasets in the cloud that defines how metadata is stored and data files are organized. Iceberg is also a library that compute engines can use to read or write a table.

Cloudera supports a data lakehouse architecture by pre-integrating and unifying the capabilities of Data Warehouses and Data Lakes to support data engineering, business intelligence, and machine learning on a single platform. Built on Apache Iceberg, the Cloudera open data lakehouse brings high-performance, self-service reporting and analytics to your business, and simplifies the management of data and operational metadata for both data practitioners and administrators.

A catalog handles table operations such as creating, dropping, or renaming tables. An Iceberg catalog helps query engines manage and organize collections of tables, which are usually grouped into namespaces. Various Iceberg catalog implementations exist, such as REST, HiveCatalog, JDBC, HadoopCatalog, and Nessie. For more information about Iceberg catalogs, see the Apache Iceberg Catalogs documentation.

Cloudera supports the Iceberg REST catalog server implementation, which enables the server to expose Iceberg table metadata to REST-enabled third-party compute engines and tools such as Spark, Trino, Snowflake, AWS Athena, AWS EMR Spark, and Databricks.
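
As an illustration of how a REST-enabled engine might be pointed at such a server, the following sketch builds the Spark settings for an Iceberg REST catalog. The property names (`type`, `uri`, `token`) come from the Apache Iceberg Spark configuration documentation. The catalog name, the URL, and the helper function are hypothetical, and the exact properties your Cloudera deployment requires may differ.

```python
def rest_catalog_spark_conf(catalog_name, uri, token=None):
    """Build Spark settings for an Iceberg REST catalog (illustrative sketch)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Register the catalog using Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Select the REST catalog client.
        f"{prefix}.type": "rest",
        # Endpoint of the REST catalog server (deployment-specific).
        f"{prefix}.uri": uri,
    }
    if token is not None:
        # Optional bearer token; how it is obtained depends on your setup.
        conf[f"{prefix}.token"] = token
    return conf


# Hypothetical catalog name and URL, used only for illustration.
conf = rest_catalog_spark_conf(
    "cloudera_rest", "https://catalog.example.com/iceberg"
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each key and value pair could then be passed to `SparkSession.builder.config(key, value)` or supplied as `--conf` options to `spark-submit`.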

Cloudera Iceberg REST Catalog is exposed through REST APIs that follow the Apache Iceberg REST Catalog OpenAPI specification. It provides API endpoints for table management tasks, such as creating, listing, updating, or deleting tables, and allows users to access and manage Iceberg table metadata.
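
The table management tasks mentioned above correspond to routes defined in the Apache Iceberg REST Catalog OpenAPI specification. The following minimal sketch maps each task to its HTTP method and path. The helper `table_request` and the operation names are hypothetical, and whether a `prefix` segment is needed depends on the server configuration.

```python
from urllib.parse import quote


def _encode_namespace(levels):
    # Multi-level namespaces are joined with the 0x1F unit separator,
    # then URL-encoded, as described in the Iceberg REST specification.
    return quote("\x1f".join(levels), safe="")


def table_request(operation, namespace, table=None, prefix=""):
    """Return (HTTP method, path) for a table operation (illustrative)."""
    root = "/v1/" + (quote(prefix, safe="") + "/" if prefix else "")
    tables = f"{root}namespaces/{_encode_namespace(namespace)}/tables"

    # Operations on the collection of tables in a namespace.
    if operation == "list":
        return ("GET", tables)
    if operation == "create":
        return ("POST", tables)

    # Operations on a single, named table.
    methods = {"load": "GET", "update": "POST", "drop": "DELETE"}
    if operation not in methods:
        raise ValueError(f"unknown operation: {operation}")
    if table is None:
        raise ValueError(f"operation '{operation}' requires a table name")
    return (methods[operation], f"{tables}/{quote(table, safe='')}")


print(table_request("list", ["sales"]))
print(table_request("drop", ["sales", "eu"], "orders"))
```

The sketch only builds the method and path. A real client would prepend the server's base URL, send a JSON body for the create and update operations, and authenticate each request.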

The Iceberg REST Catalog service is provided as an embedded service that runs within the same Java Virtual Machine (JVM) as the Hive Metastore (HMS). The scope of the metadata that the catalog can serve is limited to what the host HMS can serve. The REST catalog can be made available to third-party engines by using the OAuth authentication mechanism defined by the Knox token management system, and you can use Apache Ranger policies to govern the Iceberg data that is accessed.
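
To make the authentication step concrete, the sketch below prepares a request that carries a token in the `Authorization` header. It assumes the Knox-issued token is presented as a standard OAuth 2.0 bearer token, which is how the Iceberg REST specification describes bearer authentication. The base URL, the token value, and the helper name are placeholders, not values from a real deployment.

```python
import urllib.request


def build_catalog_request(base_url, path, token):
    """Prepare a GET request with a bearer token (illustrative sketch)."""
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    # Assumption: the token obtained from Knox is sent as a bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder URL and token; /v1/config is the configuration route
# defined in the Iceberg REST specification.
request = build_catalog_request(
    "https://catalog.example.com/iceberg", "/v1/config", "EXAMPLE_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. Ranger policies would then determine what the authenticated user is allowed to access.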

The REST service is deployed within the HMS through an embedded Jetty engine. The configuration for the REST Catalog server is part of the host HMS configuration, so any changes made to that configuration also affect the REST service.