Query history table format
Learn about the query history table format, which provides detailed information on query execution, including column names, data types, and descriptions.
Table Format
The following table provides an overview of the columns in the query history table,
including their names, data types, and descriptions:
Column Name | Data type | Description |
---|---|---|
query_history_schema_version | int | Schema version of the query history record. |
hive_version | string | Hive version running in HiveServer2 when the query was executed. |
query_id | string | Hive-assigned identifier for the query. |
session_id | string | Hive-assigned identifier for the session. |
operation_id | string | Hive-assigned identifier for the operation. |
execution_engine | string | Execution engine used to run the query (typically Tez). |
execution_mode | string | Indicates whether the query ran in LLAP or Tez container mode. |
tez_dag_id | string | Tez DAG ID used for the query, if applicable. |
tez_app_id | string | Tez application ID used for the query, if applicable. |
tez_session_id | string | Tez session ID used for the query, if applicable. |
cluster_id | string | Unique identifier for the cluster instance. |
sql | string | SQL query string submitted by the user. |
session_type | string | Session type (HIVESERVER2 or OTHER). |
hiveserver2_protocol_version | int | Protocol version used by the client, as defined in TCLIService. |
cluster_user | string | Effective user on the cluster (typically hive if doAs is disabled). |
end_user | string | Authenticated client username. |
db_name | string | Database selected when the query was run. |
tez_coordinator | string | Address of the Tez coordinator used for the query. |
query_state | string | Query state as an OperationState value. |
query_type | string | Query type identified by the semantic analyzer (DQL, DDL, DML, DCL, STATS, or empty). |
operation | string | Hive operation name based on syntax or semantic analysis. |
server_address | string | Address of the HiveServer2 instance the client connected to. |
server_port | int | TCP port of the HiveServer2 instance. |
client_address | string | IP address of the client that initiated the connection. |
start_time_utc | timestamp | UTC timestamp when the query started. |
end_time_utc | timestamp | UTC timestamp when the query finished. |
start_time | timestamp | Server-local timestamp when the query started. |
end_time | timestamp | Server-local timestamp when the query finished. |
total_time_ms | bigint | Total execution time in milliseconds. |
planning_duration | bigint | Time spent in query compilation and planning (ms). |
planning_start_time | timestamp | Timestamp when query planning started. |
prepare_plan_duration | bigint | Time spent preparing the DAG for execution (ms). |
prepare_plan_start_time | timestamp | Timestamp when the DAG preparation started. |
get_session_duration | bigint | Time taken to acquire a Tez session (ms). |
get_session_start_time | timestamp | Timestamp when session acquisition started. |
execution_duration | bigint | Duration of DAG execution (ms). |
execution_start_time | timestamp | Timestamp when DAG execution started. |
failure_reason | string | Error message for query failure, if any. |
num_rows_fetched | int | Number of rows fetched by the query. |
plan | string | Full text of the query plan. |
used_tables | string | Comma-separated list of tables used by the query. |
exec_summary | string | Full text of the execution summary. |
configuration_options_changed | string | Configuration options changed during the query session. |
total_tasks | int | Total number of Tez tasks started for the query. |
succeeded_tasks | int | Number of successful tasks. |
killed_tasks | int | Number of killed tasks. |
failed_tasks | int | Number of failed task attempts. |
task_duration_millis | bigint | Total task execution time in milliseconds. |
node_used_count | int | Number of nodes used to run the query. |
node_total_count | int | Total number of nodes visible during query execution. |
reduce_input_groups | bigint | Number of reducer input groups. |
reduce_input_records | bigint | Number of input records seen by reducers. |
spilled_records | bigint | Number of records spilled during shuffle. |
num_shuffled_inputs | bigint | Number of physical inputs used during shuffle. |
num_failed_shuffle_inputs | bigint | Number of failed attempts to fetch shuffle inputs. |
input_records_processed | bigint | Number of input records actually processed. |
input_split_length_bytes | bigint | Total size (bytes) of input splits. |
output_records | bigint | Number of output records from all vertices. |
output_bytes_physical | bigint | Actual bytes written, including compression. |
shuffle_chunk_count | bigint | Number of shuffled files processed. |
shuffle_bytes | bigint | Total shuffled data size in bytes. |
shuffle_bytes_disk_direct | bigint | Bytes shuffled using direct disk access. |
shuffle_phase_time | bigint | Time spent in the shuffle phase (ms). |
merge_phase_time | bigint | Time spent merging shuffled data (ms). |