Cloudera Data Sharing overview
Cloudera Data Sharing allows secure sharing of Iceberg table data from Cloudera on cloud with external clients using third-party engines that support the Iceberg REST catalog.
Cloudera Data Sharing lets you share Cloudera on cloud data, specifically Iceberg tables, with external users (clients) who are outside your Cloudera environments. These clients can then access the data using third-party engines that support the Iceberg REST catalog, such as Databricks or Snowflake.
The REST Catalog service implements the Iceberg REST Catalog API specification. To make Cloudera Data Sharing available to your clients, you authenticate them through the OAuth mechanism provided by the Knox token management system and define their data shares with Apache Ranger policies.
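From the client side, the flow described above can be sketched as two HTTP requests: one to obtain an OAuth access token and one to call the catalog with that token. This is a minimal sketch, not the Cloudera client procedure. The `/v1/.../namespaces/{namespace}/tables` route comes from the Iceberg REST Catalog API specification. The endpoint URLs, the helper names, and the use of the OAuth 2.0 client-credentials grant are assumptions made for illustration. The requests are only built, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical endpoints: use the values supplied by the data provider.
CATALOG_URI = "https://example.invalid/iceberg-rest"
TOKEN_URL = "https://example.invalid/oauth/token"


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client-credentials token request.

    Assumption: the token service accepts the standard form-encoded
    client-credentials grant. Check the provider's instructions.
    """
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_list_tables_request(catalog_uri, namespace, access_token, prefix=""):
    """Build a GET request for the 'list tables' route of the REST catalog.

    Handles single-level namespaces only. The optional prefix is the value
    a catalog may return from its /v1/config route.
    """
    parts = [catalog_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables"]
    return Request(
        "/".join(parts),
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


if __name__ == "__main__":
    request = build_list_tables_request(CATALOG_URI, "sales", "<access-token>")
    print(request.get_method(), request.full_url)
```

In practice a client rarely issues these requests by hand, because an engine that supports the Iceberg REST catalog performs the equivalent calls once it is configured with the catalog URI and credentials.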
Benefits of Cloudera Data Sharing
- Enables cross-platform data sharing. For example, you can share data from a Cloudera data lake with clients who use other data platforms such as Databricks, Amazon EMR, Snowflake, and Splunk.
- Provides zero-copy access to live Cloudera data: no replication is needed, so clients read the current data without replication delay.
- Offers centralized governance of data using Apache Ranger and Apache Atlas.
- Enables secure collaboration between customers and their clients.
- Creates a foundation for building a marketplace to share datasets, notebooks, and Cloudera AI models.
Scope and assumptions
- In this release, only read access to Iceberg tables is supported.
- Clients accessing the data can use non-Cloudera engines that understand the Iceberg table format.
- Table access is audited in Apache Ranger.
- Currently, this feature is supported only on Amazon S3 storage in Cloudera on cloud environments.
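Within this scope, a client-side read could look like the following sketch. It assumes the client uses PyIceberg, which is one example of a non-Cloudera library that speaks the Iceberg REST catalog protocol and is not named in this documentation. The property keys (`type`, `uri`, `credential`, `oauth2-server-uri`) are PyIceberg REST catalog settings. The URLs, the catalog name `shared`, the table identifier, and the helper names are hypothetical. Only read operations are used, since write access is not supported.

```python
def build_catalog_properties(catalog_uri, client_id, client_secret, token_url):
    """Return PyIceberg REST catalog properties for an OAuth-protected catalog."""
    return {
        "type": "rest",
        "uri": catalog_uri,
        # PyIceberg expects the credential as "<client_id>:<client_secret>".
        "credential": f"{client_id}:{client_secret}",
        # Token endpoint used for the OAuth exchange (hypothetical URL here).
        "oauth2-server-uri": token_url,
    }


def read_sample(properties, table_identifier, limit=10):
    """Read a few rows from a shared table (requires pyiceberg and pyarrow)."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("shared", **properties)
    table = catalog.load_table(table_identifier)
    # Read-only scan; no write calls are issued.
    return table.scan(limit=limit).to_arrow()


if __name__ == "__main__":
    props = build_catalog_properties(
        "https://example.invalid/iceberg-rest",  # hypothetical catalog URI
        "demo-client",
        "demo-secret",
        "https://example.invalid/oauth/token",   # hypothetical token endpoint
    )
    print(sorted(props))
```

Calling `read_sample(props, "sales.orders")` would return the first rows as an Arrow table, provided the Ranger policies for that client grant access to the table.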