How Cloudera Data Sharing works

Cloudera Data Sharing enables your clients to run workloads on external data platforms, such as Databricks or Snowflake, that fetch data from Cloudera environments for analytical purposes.

Cloudera Data Sharing is built around a Data Share, which is created together with the authentication and authorization mechanisms it needs. A Data Share is an organizational unit of data: a collection of data assets. After you create a Data Share, you can share it with your clients so that they can access the Iceberg table data created within the Cloudera environment.

The following sections describe the high-level workflow of the processes involved in Cloudera Data Sharing.

Data Share creation

The following tasks are part of the Data Share creation process:

  • As a resource owner, use the existing Knox Token Management system to generate a token. The token's unique Token ID serves as the CLIENT_ID, and the generated passcode serves as the CLIENT_SECRET.

  • As part of the token generation process, a Ranger role and a Ranger group are created. The group is a virtual group that Knox provides for the client with whom the data is shared.

  • Create and maintain Ranger policies that grant the Ranger role and group access to the databases and tables you want to share. Together, these policies define the Data Share.

  • In these policies, grant only the SELECT permission on the shared databases or tables so that the client has read-only access.

  • You can then share the CLIENT_ID and the CLIENT_SECRET with your clients so that they can access the shared data in the Cloudera environment.
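To make the relationships between these pieces concrete, the following Python sketch models the creation steps as plain in-memory data structures: a token whose ID and passcode become the CLIENT_ID and CLIENT_SECRET, a Ranger role and group, and SELECT-only policies that together form the Data Share. The class names, the method names, the sample role and group names, and the "*" wildcard meaning "every table in the database" are all hypothetical. This is not the Knox or Ranger API, only an illustration of why granting nothing but SELECT yields read-only access.

```python
from dataclasses import dataclass, field

# Hypothetical, in-memory model of the Data Share creation steps.
# These classes are illustrative only; they are not Knox or Ranger APIs.


@dataclass(frozen=True)
class KnoxToken:
    token_id: str  # handed to the client as CLIENT_ID
    passcode: str  # handed to the client as CLIENT_SECRET


@dataclass(frozen=True)
class RangerPolicy:
    database: str
    table: str  # "*" stands for every table in the database (assumption)
    role: str
    group: str
    # Only SELECT is granted, which keeps the shared data read-only.
    permissions: frozenset = frozenset({"SELECT"})


@dataclass
class DataShare:
    token: KnoxToken
    role: str   # Ranger role created during token generation
    group: str  # virtual group that Knox provides for the client
    policies: list = field(default_factory=list)

    def share(self, database: str, table: str = "*") -> None:
        """Add a SELECT-only policy for this share's role and group."""
        self.policies.append(RangerPolicy(database, table, self.role, self.group))

    def allows(self, database: str, table: str, action: str) -> bool:
        """Return True if any policy in the share permits the action."""
        return any(
            p.database == database
            and p.table in (table, "*")
            and action in p.permissions
            for p in self.policies
        )

    def credentials(self) -> dict:
        """The credential pair the resource owner passes to the client."""
        return {"CLIENT_ID": self.token.token_id, "CLIENT_SECRET": self.token.passcode}


if __name__ == "__main__":
    ds = DataShare(KnoxToken("tok-123", "s3cr3t"), role="share_role", group="share_group")
    ds.share("sales", "orders")
    print(ds.allows("sales", "orders", "SELECT"))     # True: SELECT is granted
    print(ds.allows("sales", "orders", "UPDATE"))     # False: writes are never granted
    print(ds.allows("sales", "customers", "SELECT"))  # False: table is not in the share
```

Keeping SELECT as the only permission on every policy means no combination of policies in the share can grant a write, which mirrors the read-only intent described above.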

Data Share access

After you have created a Data Share (which includes generating a token and authoring a read-only policy in Ranger) and shared the CLIENT_ID and CLIENT_SECRET with your client, the client uses these credentials in its workloads to establish a handshake with Cloudera and access the shared data.
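This section does not specify the handshake protocol. A common pattern for a client ID and secret pair is the OAuth 2.0 client-credentials grant (RFC 6749, section 4.4), so the following Python sketch assumes that pattern. The token endpoint URL is hypothetical, and the code only builds the token request and parses a standard JSON token response; it does not contact any server.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical token endpoint: this document does not name the real one.
TOKEN_URL = "https://cloudera.example.com/oauth/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build, but do not send, an OAuth 2.0 client-credentials token request."""
    # RFC 6749, section 2.3.1: form-urlencode each value, join with ":",
    # and Base64-encode the result for HTTP Basic authentication.
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def parse_token_response(payload: bytes) -> str:
    """Extract the access token from a standard OAuth 2.0 JSON token response."""
    return json.loads(payload)["access_token"]


# Sending the request would look like this (not executed here):
#     req = build_token_request(CLIENT_ID, CLIENT_SECRET)
#     with urllib.request.urlopen(req) as resp:
#         access_token = parse_token_response(resp.read())
```

Under this assumption, the workload would then present the returned access token on its data requests, typically as an `Authorization: Bearer` header (RFC 6750). How Cloudera validates the token is not described in this section.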