Data sharing with Cloudera Iceberg REST Catalog

Cloudera Iceberg REST Catalog enables secure metadata-level access to Iceberg tables from external engines that support the Iceberg REST API. Combined with proper access control and network configuration, it can serve as the foundation for secure data sharing.

Cloudera Iceberg REST Catalog lets you share Cloudera on cloud data, specifically Iceberg tables, with external users (clients) who are outside of Cloudera environments. These clients can then access the data using third-party engines that support the Iceberg REST catalog, such as Databricks or Snowflake.

Cloudera Iceberg REST Catalog is based on the Iceberg REST Catalog API specification. To make data available to your clients, you use the OAuth authentication mechanism defined by the Knox Token management system to authenticate them, and Apache Ranger policies to define the authorization rules for Iceberg tables.
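As a rough illustration of what a client does under this model, the following Python sketch builds the two HTTP requests involved: an OAuth2 client-credentials token request and a load-table request. The load-table route (GET /v1/{prefix}/namespaces/{namespace}/tables/{table}) comes from the Iceberg REST Catalog API specification. Everything else is an assumption made for illustration: the URLs are placeholders, the helper names are hypothetical, and the actual Knox token endpoint and credential format for a given environment are not specified here.

```python
import json
import urllib.parse
import urllib.request

# Placeholder values (assumptions): use the endpoints and credentials
# issued for your own environment.
CATALOG_URI = "https://catalog.example.com/iceberg-rest"
TOKEN_URL = "https://catalog.example.com/oauth/token"


def build_token_request(token_url, client_id, client_secret):
    """Build (without sending) an OAuth2 client-credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_load_table_request(catalog_uri, namespace, table, access_token, prefix=""):
    """Build (without sending) an Iceberg REST 'load table' request.

    Only single-level namespaces are handled in this sketch.
    """
    segments = ["v1", prefix, "namespaces", namespace, "tables", table]
    # Drop the empty prefix segment and percent-encode each remaining segment.
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments if s)
    return urllib.request.Request(
        catalog_uri.rstrip("/") + "/" + path,
        method="GET",
        headers={
            "Authorization": "Bearer " + access_token,
            "Accept": "application/json",
        },
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A client would typically call fetch_json(build_token_request(...)), read the access_token field from the response, and pass it to build_load_table_request. Whether the table metadata is actually returned depends on the Ranger policies defined for that client. In practice, engines such as those named above handle these calls internally once they are configured with the catalog URI and credentials.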

Benefits of data sharing with Cloudera Iceberg REST Catalog

  • Enables cross-platform data sharing. For example, you can share data from a Cloudera data lake with clients who use other data platforms like Databricks, Amazon EMR, Snowflake, Splunk, and more.

  • Provides zero-copy access to live Cloudera data: clients read the data in place, so no replication is needed and there is no replication latency.

  • Offers centralized governance of data using Apache Ranger and Apache Atlas.

  • Enables secure collaboration between customers and their clients.

  • Creates a foundation for building a marketplace to share datasets, notebooks, and Cloudera AI models.

Scope and assumptions

  • In this release, only read access to Iceberg tables is supported.

  • Clients accessing the data can use non-Cloudera engines that understand the Iceberg table format.

  • Table access audit in Ranger is supported.

  • Currently, this feature is supported only on AWS S3 storage in Cloudera on cloud environments.

  • Ranger column masking and row-level filtering are not supported. You can restrict access only at the level of complete tables in a Data Share.

  • The High Availability feature is not supported for Cloudera Data Sharing powered by REST Catalog.