Use Case: Tools Accessing Data Connections
Previously, tools accessing Cloudera data connections (such as CDW and Hive) relied on system environment variables or inherited authentication. For security reasons, these credentials are now blocked within the sandboxed environment.
To resolve this issue, tools must now explicitly accept the CDSW API v2 key as a user parameter and use it for authentication.

    import cml.data_v1 as cmldata

    def run_tool(config, args):
        # ❌ This will fail - CDSW_APIV2_KEY environment variable is filtered in sandbox
        connection = cmldata.get_connection(
            config.hive_cai_data_connection_name,
            parameters={
                "USERNAME": config.workload_user,
                "PASSWORD": config.workload_pass,
            },
        )
        # ... rest of tool code

    import os
    from typing import Any, Optional

    import cml.data_v1 as cmldata
    from pydantic import BaseModel, Field

    class UserParameters(BaseModel):
        hive_cai_data_connection_name: str = Field(description="CDW connection name configured in CML")
        workload_user: str = Field(description="Workload username for CDW")
        workload_pass: str = Field(description="Workload password for CDW")
        cdsw_api_v2_key: Optional[str] = Field(default=None, description="CDSW API v2 key for authentication")

    # ToolParameters (the tool's argument model) is assumed to be defined elsewhere in the tool file.
    def run_tool(config: UserParameters, args: ToolParameters) -> Any:
        # ✅ Set CDSW API v2 key authentication if provided
        if config.cdsw_api_v2_key:
            # Update the session authentication in cmldata data module
            cmldata.data.session.auth = (config.cdsw_api_v2_key, "")
            # Also set environment variable for consistency
            os.environ["CDSW_APIV2_KEY"] = config.cdsw_api_v2_key
        # Now connection will work with proper authentication
        connection = cmldata.get_connection(
            config.hive_cai_data_connection_name,
            parameters={
                "USERNAME": config.workload_user,
                "PASSWORD": config.workload_pass,
            },
        )
        cursor = connection.get_cursor()
        # ... rest of tool code

- The cdsw_api_v2_key is now an optional parameter within UserParameters.
- Authentication must be completed before calling get_connection().
- Users are required to provide their CDSW API v2 key during the tool configuration process.
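
The ordering requirement above (authenticate first, then connect) can be made explicit with a small helper. This is a minimal sketch, not part of the product: `apply_api_key` is a hypothetical name, and it assumes only that the session object exposes an `auth` attribute in the way `cmldata.data.session` is used in the tool example. A stand-in object replaces the real session so the snippet runs outside CML.

```python
import os
from types import SimpleNamespace
from typing import Optional


def apply_api_key(session, api_key: Optional[str]) -> bool:
    """Attach the API key to the session; return True if a key was applied.

    Hypothetical helper: `session` is any object with an `auth` attribute.
    """
    if not api_key:
        # No key supplied: leave the session untouched and let the caller decide.
        return False
    # Same (key, "") tuple that the tool example assigns to the session.
    session.auth = (api_key, "")
    # Mirror the key into the environment for code that reads it from there.
    os.environ["CDSW_APIV2_KEY"] = api_key
    return True


# Stand-in for cmldata.data.session so the sketch runs without CML installed.
fake_session = SimpleNamespace(auth=None)
applied = apply_api_key(fake_session, "example-key")
```

Inside a tool, the same call would presumably take `cmldata.data.session` and `config.cdsw_api_v2_key`, and would run before `cmldata.get_connection(...)`. Returning a boolean lets the tool report a clear error when no key was configured instead of failing later at connection time.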
cdw_hive_database_read_tool - Can execute SQL queries against Hive databases
- Secure Connectivity: Establishing connections to CDW Hive databases using CML data connections, with authentication managed securely via the CDSW API v2 key.
- Sandboxed Tool Execution: Demonstrating how tools can execute SQL queries within sandboxed environments while maintaining a secure database connection.
- Natural Language Translation: Automatically converting user questions into corresponding SQL queries using the capabilities of a Large Language Model (LLM).
- Advanced Data Insights: Retrieving query results, performing sophisticated data analytics, and generating actionable insights through LLM-powered analysis.
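
The query-execution step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: `fetch_rows` is a hypothetical helper, it assumes the cursor returned by `connection.get_cursor()` follows the Python DB-API (`execute`, `description`, `fetchmany`), and `sqlite3` stands in for the Hive connection so the example runs anywhere.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_rows(cursor, sql: str, max_rows: int = 100) -> List[Dict[str, Any]]:
    """Run a query on a DB-API cursor and return up to max_rows rows as dicts."""
    cursor.execute(sql)
    # The first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


# sqlite3 is only a stand-in for the Hive connection in this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 25)])
rows = fetch_rows(cur, "SELECT region, amount FROM sales ORDER BY amount")
```

Capping the row count keeps the result small enough to hand back to the LLM for analysis, and returning dictionaries keeps column names attached to the values.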
The Talk to your CDW Database (Kerberos) workflow template is an extension of the standard CDW database workflow, specifically designed to incorporate Kerberos authentication. Key features demonstrated by this template include:
- Secure Authentication: Illustrates the process of securely authenticating to CDW Hive databases utilizing Kerberos credentials.
- Credential Handling: Shows effective management of Kerberos credential caches within sandboxed environments.
- Parameter Security: Details how to securely encode and transmit Kerberos credentials as base64-encoded parameters.
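
The base64 round trip described above might look like the following. This is a minimal sketch under assumptions: the helper names are hypothetical, the credential is assumed to be a Kerberos credential cache file, and the sample bytes are placeholder data rather than a real cache. `KRB5CCNAME` is the standard environment variable that Kerberos libraries read to locate the credential cache.

```python
import base64
import os
import tempfile


def encode_credential(raw: bytes) -> str:
    """Encode binary credential data as an ASCII-safe base64 string."""
    return base64.b64encode(raw).decode("ascii")


def restore_credential_cache(encoded: str) -> str:
    """Decode the parameter into a private temp file and point KRB5CCNAME at it."""
    raw = base64.b64decode(encoded, validate=True)
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="krb5cc_")
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw)
    os.environ["KRB5CCNAME"] = f"FILE:{path}"
    return path


# Placeholder bytes; a real tool would receive the encoded cache as a user parameter.
sample = b"\x05\x04 not a real credential cache"
encoded = encode_credential(sample)
cache_path = restore_credential_cache(encoded)
```

Base64 only makes binary data safe to pass as a text parameter; it is not encryption, so the encoded value should be handled with the same care as the credential itself.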
