Data sharing with Cloudera Iceberg REST Catalog

Cloudera Iceberg REST Catalog enables secure metadata-level access to Iceberg tables from external engines that support the Iceberg REST API. Combined with proper access control and network configuration, it can serve as the foundation for secure data sharing.

Cloudera Iceberg REST Catalog allows you to share Cloudera on cloud data, specifically Iceberg tables, with external users (clients) who are outside of Cloudera environments. These clients can then access the data using third-party engines, such as Databricks or Snowflake, that support the Iceberg REST catalog.

Cloudera Iceberg REST Catalog is based on the Iceberg REST Catalog API specification. To make data available to your clients, you use the OAuth authentication mechanism defined by the Knox Token management system for authentication, and Apache Ranger policies to define the authorization rules for Iceberg tables.
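As an illustration of how an external client might connect, the following minimal sketch builds connection properties for PyIceberg, one example of a client that supports the Iceberg REST catalog. The property keys (type, uri, credential, oauth2-server-uri) are PyIceberg REST catalog settings. The helper names, URLs, and table identifier are hypothetical, and the sketch assumes that the Knox-issued client ID and secret can be used in an OAuth client-credentials flow. Take the actual endpoint and token URLs from your own environment.

```python
def build_rest_catalog_properties(catalog_uri, client_id, client_secret, token_uri=None):
    """Build PyIceberg properties for a REST catalog that uses OAuth client credentials.

    All argument values are supplied by the data provider; nothing here is
    specific to a particular Cloudera deployment.
    """
    properties = {
        "type": "rest",
        "uri": catalog_uri,
        # PyIceberg expects the credential as "client_id:client_secret".
        "credential": f"{client_id}:{client_secret}",
    }
    if token_uri:
        # Only needed when the token endpoint is not the catalog's default one.
        properties["oauth2-server-uri"] = token_uri
    return properties


def open_shared_catalog(name, properties):
    """Open the catalog. PyIceberg is imported lazily so the helper above
    can be used and tested without PyIceberg installed."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **properties)


def preview_table(catalog, identifier, limit=10):
    """Read the first rows of a shared table, for example "sales.orders"
    (a hypothetical identifier), as an Arrow table."""
    table = catalog.load_table(identifier)
    return table.scan(limit=limit).to_arrow()
```

A client would typically call build_rest_catalog_properties with the values received from the data provider, pass the result to open_shared_catalog, and then call preview_table on a table that a Ranger policy allows it to read.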

Benefits of data sharing with Cloudera Iceberg REST Catalog

Sharing data through the Cloudera Iceberg REST Catalog provides the following architectural and operational benefits:

  • Enables cross-platform data sharing. For example, you can share data from a Cloudera data lake with clients who use other data platforms such as Databricks, Amazon EMR, Snowflake, or Splunk.

  • Provides zero-copy access to live Cloudera data, which eliminates the need for data replication and removes the latency associated with traditional data movement.

  • Offers centralized governance of data using Apache Ranger and Apache Atlas.

  • Facilitates secure collaboration between organizations and their clients.

  • Creates a foundation for building a marketplace to share datasets, notebooks, and Cloudera AI models.

Functional scope and limitations

  • Only read access to Iceberg tables is supported.

  • Clients accessing the data can use non-Cloudera engines that understand the Iceberg table format.

  • Table access audit in Ranger is supported.

  • Data Sharing with Cloudera Data Catalog (through both the user interface and the CDP CLI) is supported only on AWS S3 storage in Cloudera on cloud environments.

    Data Sharing with Cloudera Iceberg REST Catalog is supported for both AWS S3 and Azure-based Cloudera on cloud environments.

  • Ranger column masking and row-level filtering are not supported. In a Data Share, you can restrict access only at the level of complete tables.
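
Because only read access is supported, a client needs only the read-side routes of the Iceberg REST Catalog API specification. The following sketch shows how those routes are formed, assuming the endpoint follows the standard paths of the specification. The helper names and the example base URL are hypothetical, and the optional prefix is whatever the catalog's configuration route returns for your environment.

```python
from urllib.parse import quote

# The specification joins multi-level namespaces with the unit separator (0x1F).
NAMESPACE_SEPARATOR = "\x1f"


def _base(catalog_uri, prefix=""):
    """Return the versioned base path, with the optional catalog prefix."""
    base = catalog_uri.rstrip("/") + "/v1"
    prefix = prefix.strip("/")
    return f"{base}/{prefix}" if prefix else base


def encode_namespace(levels):
    """URL-encode a namespace given as a list of levels, e.g. ["sales", "eu"]."""
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def list_namespaces_url(catalog_uri, prefix=""):
    """GET route that lists the namespaces visible to the client."""
    return f"{_base(catalog_uri, prefix)}/namespaces"


def list_tables_url(catalog_uri, namespace, prefix=""):
    """GET route that lists the tables in one namespace."""
    return f"{_base(catalog_uri, prefix)}/namespaces/{encode_namespace(namespace)}/tables"


def load_table_url(catalog_uri, namespace, table, prefix=""):
    """GET route that returns the metadata of one table."""
    return f"{list_tables_url(catalog_uri, namespace, prefix)}/{quote(table, safe='')}"
```

Each route is called with an HTTP GET and a bearer token in the Authorization header. Routes that create, update, or drop tables are outside the supported scope.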