Options to monitor compactions
You can view the progress of compactions using the SHOW COMPACTIONS
statement or by querying the COMPACTIONS
view within the SYS
database.
The query statement displays the following details about compactions:
C_ID
: Unique internal compaction IDC_DATABASE
: Hive database nameC_TABLE
: Table nameC_PARTITION
: Partition name (if the table is partitioned)C_TYPE
: Major or minor compactionC_STATE
: Compaction state- initiated: waiting in queue to be compacted
- working: currently being compacted
- ready for cleaning: compaction completed and old files scheduled for removal
- failed: compaction job failed. Details are printed to the metastore log.
- succeeded: compaction job completed successfully
- attempted: initiator attempted to schedule a compaction but failed or
hive.compactor.initiator.failed.compacts.threshold
is breached and compaction is skipped . Details are printed to the metastore log.
C_WORKER_HOST
: Thread ID of the worker thread performing the compactionC_WORKER_ID
: Unique internal worker IDC_WORKER_VERSION
: Version of distribution with running workerC_ENQUEUE_TIME
: Time when compaction request was placedC_START
: Start time of the compaction jobC_DURATION
: Duration (in ms) of the compaction jobC_HADOOP_JOB_ID
: ID of the submitted MapReduce job for MR compaction. 'None' for query-based compaction.C_INITIATOR_HOST
: Host where initiator is configuredC_INITIATOR_ID
: Unique internal initiator IDC_INITIATOR_VERSION
: Version of distribution with active initiator
The following sections describe the various options that you can use to monitor compactions.
SHOW COMPACTIONS
You can run the SHOW COMPACTIONS
statement to view details about all the
compaction jobs.
SHOW COMPACTIONS;
Authorization is not required to use the SHOW COMPACTIONS
statement.
Therefore, every user can view compactions and their current state.
Since the statement lists all the compaction jobs, you cannot filter or limit the results as
required. Alternatively, you can use the SYS
database to query and filter
specific compactions.
Querying the SYS database
The SYS
database in the Hive metastore contains the following sources that
you can use to monitor compactions:
COMPACTION_QUEUE
: Stores information about compaction requests - both explicit user submitted and compactions discovered by the initiator and enqueued.COMPLETED_COMPACTIONS
: Stores the same information present inCOMPACTION_QUEUE
along with the compaction’s end time. Records are moved to the completed table when the cleaner finishes the job on the compaction request.COMPACTIONS
: A view over theCOMPACTION_QUEUE
andCOMPLETED_COMPACTIONS
tables that resolves the column values to human readable format.
The SYS
database tables and views are treated as normal Hive external tables
or views, and therefore, standard access policies can be configured against these sources.
The following examples illustrate how you can run queries on the
SYS.COMPACTIONS
view to monitor compactions:
- Query to display the last 10 failed compactions
-
SELECT * FROM SYS.COMPACTIONS WHERE C_STATE='failed' ORDERBY C_ID DESC LIMIT 10;
- Query to check the status of a specific compaction using the compaction ID
-
SELECT * FROM SYS.COMPACTIONS WHERE C_ID='1234';
- Query to view the total number of compactions in a particular state
-
SELECT COUNT(*) FROM SYS.COMPACTIONS WHERE C_STATE='ready for cleaning';