Databricks - Supporting Lineage via Unity Catalog

Learn how to configure the necessary permissions and connection settings in Databricks for Cloudera Octopai integration using Unity Catalog.

Follow the steps below to create a dedicated service principal, generate authentication credentials, grant permissions on the required system tables, and retrieve the connection details.

  • Ensure you have a Databricks cluster type that supports Unity Catalog.
  • Confirm that you have Admin permissions in Databricks to view and manage system tables and access control.
  1. Configure Permissions in Databricks
    1. Create a Dedicated Service Principal
      1. In the Databricks workspace, navigate to Settings (top-right corner).
      2. Go to Identity and Access > Service Principals.
      3. Click Manage, then select Add Service Principal.
      4. Choose Databricks Managed and assign a descriptive name (e.g., octopai).
      5. Open the newly created service principal, go to the Configurations tab, and select Databricks SQL Access and Workspace Access.
  2. Generate Authentication Credentials
    • Option 1: M2M Authentication (Service Principal)

      1. In Databricks, open the service principal and navigate to the Secrets tab.
      2. Click Generate Secret to create a new token for the service principal.
      3. Set the maximum lifespan for the token (note that this token must be regenerated periodically, as indicated in the UI).
      4. After generation, securely save the following details:
        • Secret Token
        • Client ID
    • Option 2: User Authentication Token

      1. In Databricks, navigate to Settings > Developer > Access Tokens (Manage).
      2. Click Generate New Token.
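      3. Set the maximum lifespan for the token (note that this token must be regenerated periodically, as indicated in the UI).
      4. After generation, securely save the token. You will enter it when setting up the connection in Cloudera Octopai.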
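
For Option 1, one way to sanity-check the Client ID and Secret Token before using them is to exchange them for an OAuth access token. The sketch below only builds that request so each piece is visible. It assumes the workspace-level token endpoint `/oidc/v1/token`, the `client_credentials` grant, and the `all-apis` scope used by Databricks OAuth M2M; the function name `build_token_request` is hypothetical. How Cloudera Octopai itself authenticates is not covered here.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumption: the token endpoint lives at <workspace URL>/oidc/v1/token.
    token_url = workspace_url.rstrip("/") + "/oidc/v1/token"

    # Form-encoded body requesting a token for all workspace APIs.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")

    # Client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")

    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

To actually perform the check, pass the returned request to `urllib.request.urlopen` and confirm that the JSON response contains an access token. A 401 response usually suggests the secret is wrong or has expired.
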
  3. Grant Permissions to Lineage System Tables
    1. Open the Catalog in Databricks.
    2. Search for the following tables:
      • Catalog: system
      • Schema: access
      • Tables:
        • column_lineage
        • table_lineage
    3. For each table:
      1. Open the Permissions tab.
      2. Click Grant.
      3. Select the service principal created earlier.
      4. Enable the SELECT permission.
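
The same grants can also be expressed as SQL instead of clicking through the Catalog UI. The sketch below generates the statements for both lineage tables. It assumes the service principal is referenced by its application ID (quoted in backticks) and that, under the general Unity Catalog privilege model, the principal also needs USE CATALOG on `system` and USE SCHEMA on `system.access`; those two extra grants are not mentioned in the steps above and can be switched off. The helper name `lineage_grant_statements` is hypothetical.

```python
from typing import List

# Lineage tables named in the steps above (catalog: system, schema: access).
LINEAGE_TABLES = ("column_lineage", "table_lineage")


def lineage_grant_statements(
    application_id: str, include_usage: bool = True
) -> List[str]:
    """Return GRANT statements giving a service principal read access to lineage tables."""
    if not application_id or "`" in application_id:
        raise ValueError("application_id must be non-empty and contain no backticks")

    # Assumption: the principal is identified by its application ID in backticks.
    principal = f"`{application_id}`"
    statements: List[str] = []

    if include_usage:
        # Assumption: parent catalog and schema usage privileges are also required.
        statements.append(f"GRANT USE CATALOG ON CATALOG system TO {principal}")
        statements.append(f"GRANT USE SCHEMA ON SCHEMA system.access TO {principal}")

    for table in LINEAGE_TABLES:
        statements.append(
            f"GRANT SELECT ON TABLE system.access.{table} TO {principal}"
        )
    return statements
```

The generated statements can be pasted into the Databricks SQL editor by a user with sufficient privileges; review them before running.
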


  4. Retrieve Connection Details
    1. Create or Locate an SQL Warehouse
      1. Go to the SQL Warehouses tab.
      2. If none exist, click Create SQL Warehouse and configure it as needed.
      3. In the warehouse's permission settings, grant the service principal the Can Use permission.
      4. Open the SQL Warehouse and navigate to Connection Details.
      5. Copy the HTTP Path. You will enter it when setting up the connection in Cloudera Octopai.
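
Copy-paste errors in the workspace URL or HTTP Path are a common cause of failed connections, so a quick format check can help. The sketch below is a minimal validator, assuming the HTTP Path always has the form `/sql/1.0/warehouses/<id>` with an alphanumeric warehouse ID, as in the example values in this guide; the function names `server_hostname` and `warehouse_id` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: warehouse IDs are alphanumeric, as in "/sql/1.0/warehouses/abc123xyz".
_HTTP_PATH = re.compile(r"^/sql/1\.0/warehouses/([A-Za-z0-9]+)$")


def server_hostname(workspace_url: str) -> str:
    """Extract the hostname from an https workspace URL, or raise ValueError."""
    parsed = urlparse(workspace_url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an https workspace URL: {workspace_url!r}")
    return parsed.hostname


def warehouse_id(http_path: str) -> str:
    """Extract the warehouse ID from a SQL Warehouse HTTP Path, or raise ValueError."""
    match = _HTTP_PATH.match(http_path.strip())
    if match is None:
        raise ValueError(f"unexpected HTTP Path format: {http_path!r}")
    return match.group(1)
```

For example, `server_hostname("https://abc-1234.5.azuredatabricks.net")` yields the bare hostname, and `warehouse_id("/sql/1.0/warehouses/abc123xyz")` yields the warehouse ID; a path missing its leading slash raises `ValueError`.
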
  5. Download the ODBC Driver
  6. Set Up the Databricks Metadata Source

    Cloudera Octopai supports two authentication methods for connecting to Databricks:

    • User Authentication using a Personal Access Token
    • Machine-to-Machine (M2M) authentication using a Service Principal
    • Option 1: User Authentication (Personal Access Token)

      1. Connection Name

        Assign a clear and meaningful name for the connection. This name will appear to users within the Cloudera Octopai platform.

      2. Databricks Server URL

        Enter your Databricks workspace URL.

        Example: https://abc-1234.5.azuredatabricks.net

      3. HTTP Path

        Paste the HTTP Path copied from the Databricks SQL Warehouse → Connection Details section.

        Example: /sql/1.0/warehouses/abc123xyz

      4. Token

        Enter the Personal Access Token generated under Settings > Developer > Access Tokens (Manage) in Databricks.

    • Option 2: M2M Authentication (Service Principal)

      1. Connection Name

        Assign a clear and meaningful name for the connection. This name will appear to users within the Cloudera Octopai platform.

      2. Databricks Server URL

        Enter your Databricks workspace URL.

        Example: https://abc-1234.5.azuredatabricks.net

      3. HTTP Path

        Paste the HTTP Path copied from the Databricks SQL Warehouse → Connection Details section.

        Example: /sql/1.0/warehouses/abc123xyz

      4. Client ID

        Enter the Client ID of the service principal created in Databricks.

      5. Client Secret

        Enter the Secret Token generated for the service principal.

You have successfully configured Databricks Unity Catalog for Cloudera Octopai lineage extraction. You should now have:

  • A service principal or user authentication token
  • Permissions granted to system lineage tables
  • SQL Warehouse HTTP Path for connection
  • ODBC driver installed