Trino Connector for Cloudera Octopai

Learn about Trino as a metadata source for Cloudera Octopai, including authentication options and extracted metadata.

Overview

The Trino connector connects Cloudera Octopai to a Trino coordinator and collects metadata for lineage and discovery. The connector extracts metadata and configuration properties only; it does not read, query, or export table data.

Prerequisites

Before configuring the connector, verify the following requirements.

Access requirements

Ensure you have the following:

  • Read access to the Trino environment.
  • Network connectivity from the Cloudera Octopai Client to the Trino coordinator.
  • Trino host, port, and authentication credentials.

Catalog properties access

All connector settings must be defined in Trino catalog property files, typically located in etc/catalog/.

If you do not have access to these files or do not know their location, contact your Trino administrator for the connection details.
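
For reference, a typical Trino installation keeps one properties file per catalog under etc/catalog/. The file names below are examples only; your deployment's catalogs will differ:

```
etc/
└── catalog/
    ├── hive.properties
    ├── postgresql.properties
    └── snowflake.properties
```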

Kerberos requirements (if applicable)

For Kerberos authentication, you must obtain the following from your Trino administrator:

  • Kerberos principal
  • Keytab file
  • Path to the kinit executable (KinitPath)

Define the KinitPath value in the kerberos.settings.json file.
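
A minimal sketch of the relevant entry, assuming kerberos.settings.json is a flat JSON object and that kinit is installed at /usr/bin/kinit on your host (both are assumptions; confirm the file layout with your administrator):

```json
{
  "KinitPath": "/usr/bin/kinit"
}
```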

Supported authentication methods

Cloudera Octopai supports the following Trino authentication types:

  • LDAP (username and password)
  • Kerberos (SPNEGO / Negotiate)

Connection parameters

The required connection parameters depend on your authentication method.

LDAP authentication

  • Host: Trino coordinator hostname or IP address (for example, trino.example.com)
  • Port: Trino coordinator port (for example, 443 for HTTPS)
  • Username: Trino user identity
  • Password: Trino user password
  • CatalogPropertiesPath: Path to the directory that contains the Trino catalog properties files

The extractor reads catalog property files and exports non-sensitive properties only. Passwords, tokens, and keys are filtered automatically.

Figure 1. LDAP authentication configuration
LDAP authentication configuration fields for the Trino connector

Kerberos authentication

  • Host: Trino coordinator hostname or IP address
  • Port: Trino coordinator port
  • KerberosPrincipal: Kerberos principal used to obtain the ticket
  • KeytabPath: Path to the keytab file
  • CatalogPropertiesPath: Path to the directory that contains the Trino catalog properties files

Figure 2. Kerberos authentication configuration
Kerberos authentication configuration fields for the Trino connector

Kerberos ticket behavior

The connector attempts to use an existing Kerberos ticket from the cache. If the ticket is expired or invalid, the extractor runs the following command to obtain a new ticket before connecting:

kinit -k -t [***KEYTAB PATH***] [***KERBEROS PRINCIPAL***]

Catalog property handling

The extractor scans catalog properties files and exports only non-sensitive configuration values. Sensitive keys are automatically removed, including:

  • Passwords
  • Tokens
  • Encryption keys
  • SSL credentials
  • Cloud access credentials
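
The filtering rule can be illustrated with a short sketch. This is not Octopai's actual implementation; the marker list and function name are assumptions chosen to mirror the categories above:

```python
# Illustrative sketch only -- not Octopai's actual code.
# Assumption: a property is sensitive when its name contains one of these
# markers, mirroring the categories listed above.
SENSITIVE_MARKERS = ("password", "token", "key", "secret", "credential")

def filter_properties(props):
    """Return only the properties whose names contain no sensitive marker."""
    return {
        name: value
        for name, value in props.items()
        if not any(marker in name.lower() for marker in SENSITIVE_MARKERS)
    }

catalog = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://db:5432/sales",
    "connection-password": "s3cret",
    "aws.access-key": "dummy",
}
print(filter_properties(catalog))
# -> {'connector.name': 'postgresql',
#     'connection-url': 'jdbc:postgresql://db:5432/sales'}
```

Note that the check is on property names, not values: a connection URL that embeds credentials would pass a name-based filter, which is one reason the required configuration formats below keep credentials out of the URL where possible.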

If a catalog references a platform not supported by Cloudera Octopai, the corresponding objects appear as inferred objects in the lineage.

Supported connectors

Cloudera Octopai provides complete lineage only when Trino catalog properties use the required configuration formats listed below.

  • SQL Server (connector sqlserver): jdbc:sqlserver://[***SERVER***]:1433;databaseName=[***DATABASE NAME***];encrypt=false;trustServerCertificate=true
  • Snowflake (connector snowflake): jdbc:snowflake://[***SERVER***]:443?db=[***DATABASE NAME***]
  • PostgreSQL (connector postgresql): jdbc:postgresql://[***SERVER***]:5432/[***DATABASE NAME***]
  • MySQL (connector mysql): jdbc:mysql://[***SERVER***]:3306?useSSL=true&requireSSL=true&verifyServerCertificate=false
  • Hive (connector hive): hive.metastore.uri=thrift://[***HOST***]:9083
  • Google BigQuery (connector bigquery): bigquery.project-id=[***YOUR GCP PROJECT ID***]
  • Oracle (connector oracle): jdbc:oracle:thin:@[***HOST***]:[***PORT***]:[***SID***]
  • Redshift (connector redshift): jdbc:redshift://[***HOST***]:[***PORT***]/[***DATABASE***]
  • Iceberg over Hive (connector iceberg): hive.metastore.uri=thrift://[***HOST***]:9083
  • MariaDB (connector mariadb): jdbc:mariadb://[***HOST***]:[***PORT***]
  • Kafka (connector kafka): kafka.table-names, kafka.nodes, kafka.config.resources
  • MongoDB (connector mongodb): mongodb.connection-url=mongodb://[***USER***]:[***PASS***]@[***HOST***]:[***PORT***]/
  • Delta Lake over Thrift (connector delta_lake): hive.metastore.uri=thrift://[***HOST***]:9083
  • Delta Lake over Glue (connector delta_lake): hive.metastore=glue
  • Druid (connector druid): jdbc:avatica:remote:url=http://[***BROKER***]:[***PORT***]/druid/v2/sql/avatica/
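
As an example, a PostgreSQL catalog file that satisfies the required format might look like the following. The connector.name, connection-url, connection-user, and connection-password keys are standard Trino connector properties; the placeholders follow the same convention as above:

```properties
connector.name=postgresql
connection-url=jdbc:postgresql://[***SERVER***]:5432/[***DATABASE NAME***]
connection-user=[***USER***]
connection-password=[***PASSWORD***]
```

The connection-user and connection-password entries are filtered out during extraction; only the non-sensitive keys are exported.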

Metadata extracted

The Cloudera Octopai agent queries Trino system metadata and exports the following objects:

Catalogs

From system.metadata.catalogs:

  • Catalog name (the system catalog is excluded)
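
The extraction corresponds to a query of roughly this shape (illustrative only; the agent's exact SQL is not documented here):

```sql
SELECT catalog_name
FROM system.metadata.catalogs
WHERE catalog_name <> 'system';
```
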

Catalog properties

The extractor scans catalog properties files and exports non-sensitive properties only.

Tables

From system.jdbc.tables:

  • Catalog
  • Schema
  • Table name
  • Table type
  • Table comments

The following system schemas are excluded: information_schema, jdbc, runtime, metadata.
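
A query of this shape illustrates the extraction; system.jdbc.tables follows JDBC naming (table_cat, table_schem, table_name, table_type, remarks), and the exact SQL the agent runs may differ:

```sql
SELECT table_cat, table_schem, table_name, table_type, remarks
FROM system.jdbc.tables
WHERE table_schem NOT IN ('information_schema', 'jdbc', 'runtime', 'metadata');
```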

Views

From [***CATALOG***].information_schema.views:

  • Catalog
  • Schema
  • View name
  • View definition
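
The view extraction corresponds to a per-catalog query of roughly this shape (illustrative only):

```sql
SELECT table_catalog, table_schema, table_name, view_definition
FROM [***CATALOG***].information_schema.views;
```
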

Columns

From [***CATALOG***].information_schema.columns:

  • Catalog
  • Schema
  • Table
  • Column name
  • Ordinal position
  • Default value
  • Nullable flag
  • Data type

Results are ordered by schema, table, and ordinal position.
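
The fields and ordering above correspond to a per-catalog query of this shape (illustrative only; the agent's exact SQL may differ):

```sql
SELECT table_catalog, table_schema, table_name, column_name,
       ordinal_position, column_default, is_nullable, data_type
FROM [***CATALOG***].information_schema.columns
ORDER BY table_schema, table_name, ordinal_position;
```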

Comments

From system.metadata.table_comments:

  • Catalog
  • Schema
  • Table
  • Comment text

Empty comments and system schemas are excluded.
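
The comment extraction, including the empty-comment exclusion, can be sketched as a query of this shape (illustrative only; the system-schema exclusion would add a predicate like the one shown for tables above):

```sql
SELECT catalog_name, schema_name, table_name, comment
FROM system.metadata.table_comments
WHERE comment IS NOT NULL AND comment <> '';
```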

Extraction output

The connector generates XML output files in the extractor temporary folder. Files are created for each extraction type:

  • Catalogs
  • Tables (per catalog)
  • Views (per catalog)
  • Columns (per catalog)
  • Comments (per catalog)
  • Catalog properties

Cloudera Octopai uses these files to build lineage and discovery maps.