Cloudera Iceberg REST Catalog

Cloudera Iceberg REST Catalog is a server-side implementation of the Apache Iceberg REST Catalog OpenAPI specification that enables REST-enabled third-party tools to manage Iceberg table metadata.

Apache Iceberg is an open table format for huge analytic datasets in the cloud. It defines how table metadata is stored and how data files are organized. Iceberg is also a library that compute engines can use to read and write tables.

Cloudera supports a data lakehouse architecture by pre-integrating and unifying the capabilities of data warehouses and data lakes to support data engineering, business intelligence, and machine learning on a single platform. Cloudera's support for an open data lakehouse brings high-performance, self-service reporting and analytics to your business and simplifies data management for both data practitioners and administrators. Cloudera's open data lakehouse is built on Apache Iceberg, which makes it easy to manage operational metadata.

Table operations such as creating, dropping, or renaming tables are handled by a catalog. An Iceberg catalog helps query engines manage and organize collections of tables, which are usually grouped into namespaces. Iceberg supports several catalog implementations, such as REST, Hive, JDBC, Hadoop, and Nessie catalogs. For more information about Iceberg catalogs, see the Apache Iceberg Catalogs documentation.
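To make the catalog's role concrete, the following Python sketch models a catalog as a mapping from namespaces to table names, where each entry points at a table's current metadata file. The `InMemoryCatalog` class and its method names are hypothetical and are not the Iceberg or PyIceberg API; real catalogs also persist this state and update metadata pointers atomically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Namespace = Tuple[str, ...]


@dataclass
class InMemoryCatalog:
    """Hypothetical toy catalog: namespace -> table name -> metadata file location."""

    tables: Dict[Namespace, Dict[str, str]] = field(default_factory=dict)

    def create_namespace(self, namespace: Namespace) -> None:
        # Namespaces group tables; creating one twice is a no-op here.
        self.tables.setdefault(namespace, {})

    def create_table(self, namespace: Namespace, name: str, metadata_location: str) -> None:
        ns = self.tables[namespace]  # KeyError if the namespace does not exist
        if name in ns:
            raise ValueError(f"table already exists: {name}")
        ns[name] = metadata_location

    def list_tables(self, namespace: Namespace) -> List[str]:
        return sorted(self.tables[namespace])

    def rename_table(self, src_ns: Namespace, src: str, dst_ns: Namespace, dst: str) -> None:
        if dst in self.tables[dst_ns]:
            raise ValueError(f"table already exists: {dst}")
        # Move the metadata pointer; the table's data files are untouched.
        self.tables[dst_ns][dst] = self.tables[src_ns].pop(src)

    def drop_table(self, namespace: Namespace, name: str) -> None:
        # Only the catalog entry is removed in this sketch; no files are deleted.
        del self.tables[namespace][name]
```

The point of the sketch is that creating, renaming, and dropping a table are all changes to catalog entries, which is why engines delegate these operations to a catalog rather than to the data files themselves.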

Cloudera supports the Iceberg REST catalog server implementation, which exposes Iceberg table metadata to REST-enabled third-party compute engines and tools such as Spark, Trino, Snowflake, Amazon Athena, Spark on Amazon EMR, and Databricks.

Cloudera Iceberg REST Catalog exposes REST APIs, defined by the Apache Iceberg REST Catalog OpenAPI specification, for managing Iceberg tables. Its API endpoints let clients perform table management tasks such as creating, listing, updating, and deleting tables, and access and manage Iceberg table metadata.
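These table management tasks map onto HTTP routes defined in the Apache Iceberg REST Catalog OpenAPI specification. The following Python sketch builds the HTTP method and path for each operation without sending a request. It assumes no catalog prefix unless one is passed in (the specification lets a server advertise an optional `prefix` through `GET /v1/config`); the `table_route` helper and its operation names are hypothetical, and request bodies and the server base URL are omitted.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Per the REST spec, multi-level namespaces are joined with the 0x1F unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def namespace_path(namespace: Tuple[str, ...]) -> str:
    # Percent-encode the joined namespace, so 0x1F becomes %1F.
    return quote(NAMESPACE_SEPARATOR.join(namespace), safe="")


def table_route(operation: str, namespace: Tuple[str, ...],
                table: Optional[str] = None,
                prefix: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for a table operation (hypothetical helper)."""
    root = f"/v1/{prefix}" if prefix else "/v1"
    tables = f"{root}/namespaces/{namespace_path(namespace)}/tables"
    if operation == "list":
        return "GET", tables
    if operation == "create":
        return "POST", tables  # body carries the table name and schema
    if operation == "rename":
        return "POST", f"{root}/tables/rename"  # body carries source and destination
    per_table = {"load": "GET", "update": "POST", "drop": "DELETE"}
    if operation not in per_table:
        raise ValueError(f"unknown operation: {operation!r}")
    if table is None:
        raise ValueError(f"operation {operation!r} requires a table name")
    return per_table[operation], f"{tables}/{quote(table, safe='')}"
```

For example, `table_route("drop", ("sales", "eu"), "orders")` returns `("DELETE", "/v1/namespaces/sales%1Feu/tables/orders")`.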

In the current offering, the Iceberg REST Catalog service runs as an embedded service within the same Java Virtual Machine (JVM) as the Hive Metastore (HMS). As a result, the catalog can serve only the metadata that its host HMS can serve. The REST catalog is made available to third-party engines through the OAuth authentication mechanism defined by Apache Knox token management, and you can use Apache Ranger policies to govern access to the Iceberg data.
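As one example of a third-party engine connecting to the catalog, the following Python sketch assembles Spark configuration properties for an Iceberg REST catalog that authenticates with a pre-obtained OAuth bearer token. The property keys (`type`, `uri`, `token`) are Apache Iceberg Spark catalog settings; the catalog name, the endpoint URL, and the assumption that a Knox-issued token is passed through the `token` property are illustrative, so check the Cloudera documentation for the actual endpoint and token workflow.

```python
from typing import Dict


def spark_rest_catalog_conf(name: str, uri: str, token: str) -> Dict[str, str]:
    """Spark properties for an Iceberg REST catalog (sketch; name, uri, token are assumptions)."""
    key = f"spark.sql.catalog.{name}"
    return {
        key: "org.apache.iceberg.spark.SparkCatalog",  # Iceberg's Spark catalog plugin
        f"{key}.type": "rest",                         # use the REST catalog client
        f"{key}.uri": uri,                             # base URL of the REST catalog
        f"{key}.token": token,                         # OAuth bearer token, assumed Knox-issued
    }
```

With PySpark, each pair can be applied through `SparkSession.builder.config(key, value)` before calling `getOrCreate()`; the Iceberg Spark runtime JAR must also be on the Spark classpath.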

The REST service is deployed within HMS through an embedded Jetty server. The REST Catalog server's configuration is part of the host HMS configuration, so any change to that configuration also affects the REST service.
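Because the REST service is hosted by HMS, a simple reachability and authentication check is a call to the `GET /v1/config` endpoint, which the Iceberg REST specification defines for clients to fetch catalog configuration. The following Python sketch only builds the request; the base URL is a hypothetical placeholder, and presenting the Knox-issued token as an `Authorization: Bearer` header is an assumption.

```python
import urllib.request


def config_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/config request; base_url is a placeholder."""
    url = base_url.rstrip("/") + "/v1/config"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",  # OAuth bearer token (assumed Knox-issued)
            "Accept": "application/json",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the JSON response, which per the specification contains `defaults` and `overrides` maps.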