Options to monitor compactions
You can view the progress of compactions using the
statement or by querying the
COMPACTIONS view within the
The query statement displays the following details about compactions:
C_ID: Unique internal compaction ID
C_DATABASE: Hive database name
C_TABLE: Table name
C_PARTITION: Partition name (if the table is partitioned)
C_TYPE: Major or minor compaction
C_STATE: Compaction state
- initiated: waiting in queue to be compacted
- working: currently being compacted
- ready for cleaning: compaction completed and old files scheduled for removal
- failed: compaction job failed. Details are printed to the metastore log.
- succeeded: compaction job completed successfully
- attempted: initiator attempted to schedule a compaction but failed or
hive.compactor.initiator.failed.compacts.thresholdis breached and compaction is skipped . Details are printed to the metastore log.
C_WORKER_HOST: Thread ID of the worker thread performing the compaction
C_WORKER_ID: Unique internal worker ID
C_WORKER_VERSION: Version of distribution with running worker
C_ENQUEUE_TIME: Time when compaction request was placed
C_START: Start time of the compaction job
C_DURATION: Duration (in ms) of the compaction job
C_HADOOP_JOB_ID: ID of the submitted MapReduce job for MR compaction. 'None' for query-based compaction.
C_INITIATOR_HOST: Host where initiator is configured
C_INITIATOR_ID: Unique internal initiator ID
C_INITIATOR_VERSION: Version of distribution with active initiator
The following sections describe the various options that you can use to monitor compactions.
You can run the
SHOW COMPACTIONS statement to view details about all the
Authorization is not required to use the
SHOW COMPACTIONS statement.
Therefore, every user can view compactions and their current state.
Since the statement lists all the compaction jobs, you cannot filter or limit the results as
required. Alternatively, you can use the
SYS database to query and filter
Querying the SYS database
SYS database in the Hive metastore contains the following sources that
you can use to monitor compactions:
COMPACTION_QUEUE: Stores information about compaction requests - both explicit user submitted and compactions discovered by the initiator and enqueued.
COMPLETED_COMPACTIONS: Stores the same information present in
COMPACTION_QUEUEalong with the compaction’s end time. Records are moved to the completed table when the cleaner finishes the job on the compaction request.
COMPACTIONS: A view over the
COMPLETED_COMPACTIONStables that resolves the column values to human readable format.
SYS database tables and views are treated as normal Hive external tables
or views, and therefore, standard access policies can be configured against these sources.
The following examples illustrate how you can run queries on the
SYS.COMPACTIONS view to monitor compactions:
- Query to display the last 10 failed compactions
SELECT * FROM SYS.COMPACTIONS WHERE C_STATE='failed' ORDERBY C_ID DESC LIMIT 10;
- Query to check the status of a specific compaction using the compaction ID
SELECT * FROM SYS.COMPACTIONS WHERE C_ID='1234';
- Query to view the total number of compactions in a particular state
SELECT COUNT(*) FROM SYS.COMPACTIONS WHERE C_STATE='ready for cleaning';