Use Case: Tools Accessing Data Connections

Previously, tools accessing Cloudera data connections (such as CDW and Hive) relied on system environment variables or inherited authentication. For security reasons, these credentials are now blocked within the sandboxed environment.

Tools must now explicitly accept the CDSW API v2 key as a user parameter and use it for authentication.

Old Approach (Now Obsolete):
import cml.data_v1 as cmldata

def run_tool(config, args):
    # ❌ This will fail - CDSW_APIV2_KEY environment variable is filtered in sandbox
    connection = cmldata.get_connection(
        config.hive_cai_data_connection_name,
        parameters={
            "USERNAME": config.workload_user,
            "PASSWORD": config.workload_pass,
        },
    )
    # ... rest of tool code

New Approach (Required):
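import os
from typing import Any, Optional

import cml.data_v1 as cmldata
from pydantic import BaseModel, Field

class UserParameters(BaseModel):
    hive_cai_data_connection_name: str = Field(description="CDW connection name configured in CML")
    workload_user: str = Field(description="Workload username for CDW")
    workload_pass: str = Field(description="Workload password for CDW")
    cdsw_api_v2_key: Optional[str] = Field(default=None, description="CDSW API v2 key for authentication")

class ToolParameters(BaseModel):
    # Placeholder for per-invocation arguments (for example, the SQL query); define the fields your tool needs
    pass
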
def run_tool(config: UserParameters, args: ToolParameters) -> Any:
    # ✅ Set CDSW API v2 key authentication if provided
    if config.cdsw_api_v2_key:
        # Update the session authentication in cmldata data module
        cmldata.data.session.auth = (config.cdsw_api_v2_key, "")
        # Also set environment variable for consistency
        os.environ["CDSW_APIV2_KEY"] = config.cdsw_api_v2_key
    
    # Now connection will work with proper authentication
    connection = cmldata.get_connection(
        config.hive_cai_data_connection_name,
        parameters={
            "USERNAME": config.workload_user,
            "PASSWORD": config.workload_pass,
        },
    )
    
    cursor = connection.get_cursor()
    # ... rest of tool code

Key Changes for Tool Configuration and Authentication:
  • cdsw_api_v2_key is declared as an optional field in UserParameters.
  • Authentication must be configured before get_connection() is called.
  • Although the field is optional in the model, users must provide their CDSW API v2 key during tool configuration so that the data connection can authenticate inside the sandbox.
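
The ordering requirement above can be sketched as a small helper that fails early when the key is missing. This is a minimal sketch, not the Agent Studio implementation: apply_api_key is a hypothetical name, and the session and env arguments stand in for cmldata.data.session and os.environ so that the logic can be exercised without the cml package.

```python
from types import SimpleNamespace
from typing import MutableMapping, Optional


def apply_api_key(session, api_key: Optional[str], env: MutableMapping[str, str]) -> None:
    """Hypothetical helper: configure authentication before get_connection() is called."""
    if not api_key:
        # Fail early with a clear message instead of letting the connection call fail later.
        raise ValueError("cdsw_api_v2_key is required for data connections in the sandbox")
    # Mirrors the pattern in the tool code: key as username, empty password.
    session.auth = (api_key, "")
    # Keep the environment variable in sync for code that reads it directly.
    env["CDSW_APIV2_KEY"] = api_key


# Stand-ins for cmldata.data.session and os.environ (assumptions for illustration only).
fake_session = SimpleNamespace(auth=None)
fake_env = {}
apply_api_key(fake_session, "example-key", fake_env)
```

In a real tool, the equivalent call would presumably be apply_api_key(cmldata.data.session, config.cdsw_api_v2_key, os.environ), placed immediately before cmldata.get_connection(...).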

Example Tools Using This Pattern:
  • cdw_hive_database_read_tool: Executes SQL queries against Hive databases.

Example Workflow Templates:
Workflow templates demonstrate how to build workflows and tools that connect to your Hive data sources, execute SQL queries, retrieve results, and use an LLM for data analysis. The templates specifically highlight tool sandboxing used together with data connections.

The Talk to your CDW Database workflow template provides an end-to-end method for natural language interaction with Cloudera Data Warehouse (CDW) Hive databases. It highlights several key features:
  • Secure Connectivity: Establishing connections to CDW Hive databases using CML data connections, with authentication managed securely via the CDSW API v2 key.
  • Sandboxed Tool Execution: Demonstrating how tools can execute SQL queries within sandboxed environments while maintaining a secure database connection.
  • Natural Language Translation: Automatically converting user questions into corresponding SQL queries using the capabilities of a Large Language Model (LLM).
  • Advanced Data Insights: Retrieving query results, performing sophisticated data analytics, and generating actionable insights through LLM-powered analysis.

The Talk to your CDW Database (Kerberos) workflow template is an extension of the standard CDW database workflow, specifically designed to incorporate Kerberos authentication. Key features demonstrated by this template include:

  • Secure Authentication: Illustrates the process of securely authenticating to CDW Hive databases utilizing Kerberos credentials.
  • Credential Handling: Shows effective management of Kerberos credential caches within sandboxed environments.
  • Parameter Security: Details how to securely encode and transmit Kerberos credentials as base64-encoded parameters.
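
The base64 parameter handling described above could look roughly like the following. This is a sketch under stated assumptions, not the template's actual code: the helper names encode_ccache and install_ccache are hypothetical, and it assumes the credential is a Kerberos credential cache file that Kerberos libraries locate through the standard KRB5CCNAME environment variable.

```python
import base64
import os
import tempfile


def encode_ccache(path: str) -> str:
    """Hypothetical helper: read a credential cache file and return it as base64 text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def install_ccache(encoded: str, env=os.environ) -> str:
    """Hypothetical helper: decode the parameter into a private temp file in the sandbox."""
    # validate=True rejects input containing characters outside the base64 alphabet.
    raw = base64.b64decode(encoded, validate=True)
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="krb5cc_")
    with os.fdopen(fd, "wb") as f:
        f.write(raw)
    # Point Kerberos libraries at the restored cache.
    env["KRB5CCNAME"] = f"FILE:{path}"
    return path


# Round trip with placeholder bytes (not a real credential cache).
with tempfile.NamedTemporaryFile(delete=False) as src:
    src.write(b"placeholder-ccache-bytes")
demo_env = {}
restored = install_ccache(encode_ccache(src.name), env=demo_env)
```

Base64 is an encoding rather than encryption, so the encoded parameter should be handled with the same care as the credential itself.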