OpenTelemetry support for Impala

The Impala OpenTelemetry integration enables real-time query observability and centralized telemetry data collection, including lifecycle events and resource usage.

Overview

Impala can send its telemetry data to OpenTelemetry (OTel) compatible collectors. This provides a centralized flow of live query insights, with each SELECT query represented as an OTel trace, and reduces the need to gather query data from multiple sources.

Impala integration with OTel

Impala integrates the OTel C++ SDK to emit query lifecycle data. Impala already tracks specific phases and events for each query and records them in the timeline section of the query profile. Because these events are also emitted to an OTel collector, observability systems can track active queries in near real time.
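
The following is a minimal conceptual sketch of this flow, not Impala's actual implementation. It assumes one trace per query and one child span per lifecycle phase, with each span handed to an exporter as soon as it ends. The Span class, the trace_query function, and the phase names are all hypothetical and use only the Python standard library rather than the OTel SDK.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Span:
    """Hypothetical stand-in for an OTel span (not the OTel SDK type)."""
    trace_id: str
    name: str
    parent: Optional[str] = None
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None

    def end(self) -> None:
        # Record the end timestamp; a span is complete once it has ended.
        self.end_ns = time.time_ns()


def trace_query(phases: List[str], export: Callable[[Span], None]) -> str:
    """Model one query as one trace with a child span per lifecycle phase."""
    trace_id = uuid.uuid4().hex
    root = Span(trace_id, "query")
    for phase in phases:
        child = Span(trace_id, phase, parent=root.name)
        # ... the work for this phase would run here ...
        child.end()
        # Export each phase as soon as it ends, so a collector can follow
        # the query while it is still running.
        export(child)
    root.end()
    export(root)
    return trace_id


# Assumed phase names, for illustration only.
collected: List[Span] = []
query_trace_id = trace_query(["planning", "admission", "execution"], collected.append)
print([span.name for span in collected])
```

Because each phase span is exported when it ends rather than when the whole query finishes, a collector in this model sees the early phases of a long-running query before the query completes.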

Collected telemetry data

Telemetry data emitted from Impala carries key information that is otherwise available only in the query profile and the workload management tables. It includes the following:
  1. The initiating user
  2. The SQL statement
  3. Memory estimates and actual use
  4. Other important data related to the query lifecycle

Availability

OTel support for Impala is available starting with Cloudera Data Warehouse on premises 1.5.5 SP1. After upgrading Cloudera Data Warehouse, you must also upgrade existing Impala Virtual Warehouses to enable and configure the OTel integration.