Starting compaction manually

You manually start compaction when automatic compaction fails for some reason. You can start compaction by running a Hive statement.

You can run compaction pseudo-synchronously using the AND WAIT clause. Compaction actually occurs asynchronously, but seems synchronous. The compaction request is recorded and queued, and remains in a waiting cycle, querying the status of the compaction in the background until a failure, success, or timeout occurs. The hive.compactor.wait.timeout (default: 300s) property sets the timeout.

Start compaction using a query

You use the following syntax to issue a query that starts compaction:
 ALTER TABLE tablename [PARTITION (partition_key='partition_value' [,...])] COMPACT 'compaction_type'

  • Files you are compacting must be in the ORC format.
  • Compaction must be enabled (initiator hive.compactor.initiator.on=true)
  1. Run a query to start a major compaction of a table.
    ALTER TABLE mytable COMPACT 'major'
    Use the COMPACT 'minor' clause to run a minor compaction. ALTER TABLE compacts tables even if the NO_AUTO_COMPACTION table property is set.
  2. Start compaction in a pseudo-synchronous way.
    ALTER TABLE mydb.mytable PARTITION (mypart='myval') COMPACT 'MAJOR' AND WAIT;