Databricks - Supporting Lineage through Unity Catalog and Real-time Lineage for Specific Notebooks
This guide explains how Cloudera Octopai administrators can set up metadata extraction from
Databricks to build data lineage in Cloudera Octopai. Depending on your needs, you can either
enable lineage through Unity Catalog by using Cloudera Octopai Data Lineage Client extraction,
or build lineage for specific notebooks.
If you are enabling lineage through Unity Catalog using the Cloudera Octopai Client extraction, make
sure your Databricks environment is using a cluster type that supports Unity Catalog. This
is essential for extracting metadata using Unity Catalog.
With either option, make sure permissions and configurations are set correctly so that
Cloudera Octopai can build accurate and complete data lineage.
To set up the permissions, choose one of the following options and complete the steps for
that option:
Option 1: Supporting Lineage through Unity Catalog Using Cloudera Octopai Client Extraction
Ensure you have the correct cluster type.
Make sure your Databricks environment is using a cluster type that supports Unity
Catalog. This is essential for extracting metadata using Unity Catalog.
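The access-mode check can also be made from the cluster's JSON description. The sketch below assumes you have already fetched the cluster with the Databricks Clusters API (`GET /api/2.0/clusters/get`) and that Unity Catalog support is indicated by the `data_security_mode` field being `SINGLE_USER` or `USER_ISOLATION`. Newer API versions may report other names for these modes, so treat the set as an assumption to verify against your workspace. The cluster ID in the example is hypothetical.

```python
import json

# Access modes (Clusters API field `data_security_mode`) assumed to support
# Unity Catalog. Newer API versions may use other names for the same modes.
UC_ACCESS_MODES = {"SINGLE_USER", "USER_ISOLATION"}


def supports_unity_catalog(cluster: dict) -> bool:
    """Return True if the cluster description reports a Unity Catalog access mode."""
    # Treat a missing field like "NONE" (no isolation, no Unity Catalog).
    return cluster.get("data_security_mode", "NONE") in UC_ACCESS_MODES


# Hypothetical, trimmed body of a GET /api/2.0/clusters/get response.
sample = json.loads(
    '{"cluster_id": "0123-456789-abcdefgh", "data_security_mode": "USER_ISOLATION"}'
)
print(supports_unity_catalog(sample))  # True
```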
Configure permissions in Databricks.
Proper permissions are crucial for allowing Cloudera Octopai to access and extract metadata
from your Databricks environment.
Locate the workspace:
Open your Databricks workspace with admin privileges.
Locate the cluster that holds the metadata you want to extract.
Manage permissions:
Navigate to the permissions settings in your Databricks workspace.
Add users or groups that require access to this metadata.
Select Sharing permissions to open the permissions dialog.
Assign permissions:
Add individual users or groups to grant them notebook permissions.
Select Add user or Add group.
Choose the user or group from the dropdown list.
Assign the appropriate permission level, such as Can view, Can run, Can edit, or Is owner.
It is advisable to add a group, rather than individual users, and set its permission level
to Can manage.
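If you manage many grants, the same permission levels can be expressed as a request body for the Databricks Permissions API, which can be sent with `PATCH` to an endpoint such as `/api/2.0/permissions/notebooks/{notebook_id}` to add or update entries without replacing existing ones. The label-to-level mapping below is an assumption based on the notebook permission levels `CAN_READ`, `CAN_RUN`, `CAN_EDIT`, and `CAN_MANAGE`. Is owner is left out of this sketch; check the Permissions API reference for whether your object type accepts an ownership level. The group name `octopai-lineage` is hypothetical.

```python
import json

# Assumed mapping from sharing-dialog labels to Permissions API levels for notebooks.
UI_TO_API_LEVEL = {
    "Can view": "CAN_READ",
    "Can run": "CAN_RUN",
    "Can edit": "CAN_EDIT",
    "Can manage": "CAN_MANAGE",
}


def build_acl_body(grants):
    """Build a Permissions API body from (principal, kind, ui_label) tuples.

    `kind` is "group" or "user"; it selects the `group_name` or `user_name` key.
    """
    entries = []
    for principal, kind, label in grants:
        if kind not in ("group", "user"):
            raise ValueError(f"kind must be 'group' or 'user', got {kind!r}")
        if label not in UI_TO_API_LEVEL:
            raise ValueError(f"unsupported permission label: {label!r}")
        entries.append(
            {f"{kind}_name": principal, "permission_level": UI_TO_API_LEVEL[label]}
        )
    return {"access_control_list": entries}


# Recommended setup: one group with Can manage (group name is hypothetical).
body = build_acl_body([("octopai-lineage", "group", "Can manage")])
print(json.dumps(body))
# {"access_control_list": [{"group_name": "octopai-lineage", "permission_level": "CAN_MANAGE"}]}
```

Raising on unknown labels keeps a typo in a label from silently producing an invalid request.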
Save and verify:
Save your changes and confirm that all permissions are correctly set.
Reopen the sharing permissions dialog to review the configured access.
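To verify the result without reopening the dialog, you can read the object's permissions back (for a notebook, `GET /api/2.0/permissions/notebooks/{notebook_id}`) and check the returned `access_control_list`. The sketch assumes each entry carries an `all_permissions` list, as in Permissions API responses, and that notebook levels are ordered `CAN_READ` < `CAN_RUN` < `CAN_EDIT` < `CAN_MANAGE`; inherited grants count toward the check. The sample response and group names are hypothetical.

```python
# Assumed ordering of notebook permission levels, lowest to highest.
LEVEL_ORDER = ["CAN_READ", "CAN_RUN", "CAN_EDIT", "CAN_MANAGE"]


def group_has_at_least(acl_response: dict, group: str, required: str) -> bool:
    """Return True if `group` holds `required` or a higher level on the object."""
    needed = LEVEL_ORDER.index(required)
    for entry in acl_response.get("access_control_list", []):
        if entry.get("group_name") != group:
            continue
        # Direct and inherited grants are both listed in `all_permissions`.
        for perm in entry.get("all_permissions", []):
            level = perm.get("permission_level")
            if level in LEVEL_ORDER and LEVEL_ORDER.index(level) >= needed:
                return True
    return False


# Hypothetical, trimmed GET response for one notebook.
response = {
    "object_type": "notebook",
    "access_control_list": [
        {
            "group_name": "octopai-lineage",
            "all_permissions": [{"permission_level": "CAN_MANAGE", "inherited": False}],
        }
    ],
}
print(group_has_at_least(response, "octopai-lineage", "CAN_RUN"))  # True
print(group_has_at_least(response, "analysts", "CAN_READ"))  # False
```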
Option 2: Building Lineage for Specific Notebooks
Identify the notebooks for lineage.
Determine which specific notebooks within your Databricks environment should be
included in the data lineage.
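When many notebooks are involved, the Databricks Workspace API (`GET /api/2.0/workspace/list`) returns the objects in one folder, each with an `object_type`, a `path`, and a numeric `object_id`; that `object_id` is the notebook ID the Permissions API expects. A minimal sketch, assuming you have already collected listings and want only the notebooks under chosen folders. The listing is not recursive, so nested folders need their own calls. All folder and notebook paths are hypothetical.

```python
def select_notebooks(list_response: dict, folders: list) -> dict:
    """Map notebook path -> object_id for notebooks inside any of `folders`."""
    selected = {}
    for obj in list_response.get("objects", []):
        if obj.get("object_type") != "NOTEBOOK":
            continue  # skip directories, files, libraries, etc.
        path = obj["path"]
        # Match whole folder names so "/Shared/etl" does not match "/Shared/etl_old".
        if any(path.startswith(folder.rstrip("/") + "/") for folder in folders):
            selected[path] = obj["object_id"]
    return selected


# Hypothetical objects gathered from listing /Shared/etl and /Shared/etl_old.
listing = {
    "objects": [
        {"object_type": "NOTEBOOK", "path": "/Shared/etl/load_orders", "object_id": 101},
        {"object_type": "NOTEBOOK", "path": "/Shared/etl_old/load_orders", "object_id": 102},
        {"object_type": "DIRECTORY", "path": "/Shared/etl/archive", "object_id": 103},
    ]
}
print(select_notebooks(listing, ["/Shared/etl"]))  # {'/Shared/etl/load_orders': 101}
```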
Configure permissions for the selected notebooks.
Access the notebook workspace:
Open the corresponding Databricks workspace with admin privileges.
Locate the notebook or notebooks you plan to include in the lineage.
Manage permissions:
Navigate to the permissions settings.
Add the users or groups that require access.
Select Sharing permissions to open the permissions dialog.
Assign permissions:
Add the required users or groups through the sharing dialog.
Choose the appropriate entity from the dropdown list.
Assign a permission level such as Can view, Can run, Can edit, or Is owner.
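For a larger set of notebooks, the grants can be sent through the Permissions API instead of the sharing dialog. The sketch below builds one `PATCH` request per notebook ID with Python's standard `urllib.request`. The workspace URL, token, notebook IDs, and group name are placeholders, and authenticating with a personal access token in a `Bearer` header is an assumption about your workspace's setup. It only builds the requests, so you can inspect them before sending any.

```python
import json
import urllib.request


def build_permission_requests(host, token, notebook_ids, group, level):
    """Return one PATCH request per notebook granting `level` to `group`."""
    body = json.dumps(
        {"access_control_list": [{"group_name": group, "permission_level": level}]}
    ).encode("utf-8")
    reqs = []
    for notebook_id in notebook_ids:
        reqs.append(
            urllib.request.Request(
                url=f"{host.rstrip('/')}/api/2.0/permissions/notebooks/{notebook_id}",
                data=body,
                # PATCH adds or updates entries; PUT would replace the whole list.
                method="PATCH",
                headers={
                    "Authorization": f"Bearer {token}",
                    "Content-Type": "application/json",
                },
            )
        )
    return reqs


# Placeholder values; read the token from a secret store rather than hard-coding it.
reqs = build_permission_requests(
    "https://example.cloud.databricks.com/",
    "dapi-placeholder",
    [101, 205],
    "octopai-lineage",
    "CAN_RUN",
)
for req in reqs:
    print(req.get_method(), req.full_url)
    # To apply the grant, send it with: urllib.request.urlopen(req)
```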
Save and verify:
Save the permissions settings and double-check the configuration for accuracy.
Set up the Databricks Metadata Source.
Assign a meaningful name to the connection. This is the name users see for the connection on
the Cloudera Octopai platform.