How Cloudera Data Sharing works
Cloudera Data Sharing enables your clients to run workloads on external data platforms, such as Databricks or Snowflake, that fetch data from Cloudera environments for analytical purposes.
Cloudera Data Sharing involves the creation of a Data Share with the necessary authorization and authentication mechanisms. A Data Share is an organizational unit of data, a collection of data assets. You can then share this data with your clients so that they can access the Iceberg table data created within the Cloudera environment.
The following sections describe the high-level workflow of the processes involved in Cloudera Data Sharing.
Data Share creation
The following tasks are part of the Data Share creation process:
- As a resource owner, use the existing Knox Token Management system to generate a token. This unique Token ID is referred to as the CLIENT_ID and the generated passcode is the CLIENT_SECRET.
- As part of the token generation process, a Ranger role and Ranger group are created. This group is a virtual group that Knox provides for the client with whom the data is shared.
- Create and maintain policies for the set of databases and tables to be shared for the Ranger role and group and thereby create a Data Share.
- Maintain the SELECT permission for the databases or tables to allow READ-only access.
- You can then share the CLIENT_ID and the CLIENT_SECRET with your clients so that they can access the shared data in the Cloudera environment.
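The relationship between the pieces created in the steps above can be sketched as a small data model. This is an illustration only, not a Cloudera, Knox, or Ranger API: the class names, field names, and sample values are hypothetical, and the model simply records that a Data Share ties a CLIENT_ID to a Ranger role and group, a set of databases and tables, and the SELECT permission.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharedTable:
    """One database/table pair covered by the share (hypothetical type)."""
    database: str
    table: str


@dataclass
class DataShare:
    """Hypothetical model of a Data Share, for illustration only."""
    client_id: str      # the Knox Token ID
    ranger_role: str    # role created during token generation
    ranger_group: str   # virtual group that Knox provides for the client
    tables: list[SharedTable] = field(default_factory=list)
    permissions: frozenset[str] = frozenset({"SELECT"})

    def is_read_only(self) -> bool:
        # A share is read-only when SELECT is the only permission granted.
        return self.permissions == frozenset({"SELECT"})


# Sample values are placeholders, not real identifiers.
share = DataShare(
    client_id="example-token-id",
    ranger_role="example_share_role",
    ranger_group="example_share_group",
    tables=[SharedTable("sales_db", "orders")],
)
```

The CLIENT_SECRET is deliberately left out of the model: it is handed to the client, and nothing in the share definition needs to keep it.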
Data Share access
After you have created a Data Share, which includes creating tokens and authoring a read-only policy within Ranger, and have shared the CLIENT_ID and CLIENT_SECRET with your client, the client uses these credentials in their workloads to establish a handshake with Cloudera and to access the shared data.
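The exact form of the handshake is not described here. As a minimal sketch, assuming it resembles an OAuth 2.0 client-credentials exchange, a client could build a token request like the one below. The token URL is a hypothetical placeholder, and the use of HTTP Basic authentication to carry the credentials is an assumption rather than documented Cloudera behavior. The function only constructs the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    Assumption: the endpoint accepts a form-encoded grant_type and
    HTTP Basic credentials, as in a generic OAuth 2.0 exchange.
    """
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials"}
    ).encode("ascii")
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical endpoint and placeholder credentials.
request = build_token_request(
    "https://cloudera.example.com/token", "my-client-id", "my-client-secret"
)
```

Passing the request to `urllib.request.urlopen` would perform the exchange; the shape of the response depends on the actual endpoint and is not shown.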