Hive query history service

The query history service in Hive provides a scalable solution for storing and querying historical query information in a structured and performant manner, enabling long-term analysis and monitoring.

Cloudera Data Warehouse provides you the option to enable logging Hive queries on an existing Virtual Warehouse or while creating a new Hive Virtual Warehouse.The query history service in Hive is a feature that stores a long-term record of finished queries and their associated metrics. It is designed to support auditing, debugging, and performance monitoring at scale by persisting historical query data in a modern table format.

Hive already offers several ways to inspect query activity, including:

Hive history .txt files.
Protobuf logging hook.
Live queries on the HiveServer2 Web UI.
SHOW PROCESSLIST command.
In-development query history service.

While these options allow inspection of active or recent queries, none provide a scalable solution for storing and querying historical query information in a structured and performant manner.

The query history service addresses this limitation by persisting structured records for completed queries, enabling long-term analysis using standard SQL.

Purpose of the query history table

The query history table stores the following information for each finished query:

Submitting user
Query runtime
Tables accessed
Errors
Additional metadata fields

This information is stored in a structured format using the Iceberg table format, allowing efficient querying and future integration with tools such as Apache Hue or custom dashboards.

Scope and Limitations

The query history service runs as part of HiveServer2 and writes query data to an Iceberg table in batches, using configurable memory buffering and flushing strategies. However, it is not intended for real-time query inspection, query debugging or recommendations, or for creating user interfaces or visualizations.