Initiating automatic compaction in Cloudera Manager

To enable automatic compaction, several properties must be set in the Hive and Hive metastore service configurations. Check that the property settings are correct, and add one of the properties to the Hive on Tez service. Automatic compaction then runs at regular intervals, but only when necessary.

Initiator threads should run in only one Hive metastore server, even in high-availability (HA) configurations. The Hive metastore instances elect a single leader among themselves, so there is no need to override hive.compactor.initiator.on at the Hive metastore instance level. For more information, see Hive Metastore leader election.

Disable Initiator threads in the Hive service of every Datahub cluster; the leader HMS in the DataLake cluster can run the compaction initiator thread. Set the following properties:

  • In Hive metastore (Hive-1) service:
    • hive.compactor.initiator.on = true (default)
  • In Hive on Tez service:
    • hive.compactor.worker.threads = <a value greater than 0> (default and recommended value = 5)
    • hive.metastore.runworker.in = hs2 (default)
Tables or partitions you are compacting must be full ACID or insert-only ACID tables.
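As a rough illustration of this requirement, the following Python sketch classifies a table from its table properties. It assumes the properties have already been collected into a dictionary (for example, from the output of SHOW TBLPROPERTIES), and it relies on the transactional and transactional_properties table properties that Hive uses to mark ACID tables. The function name acid_kind is hypothetical and not part of any Hive or Cloudera API.

```python
def acid_kind(tblproperties):
    """Classify a table from a dict of its table properties.

    Returns 'full' for a full ACID table, 'insert_only' for an
    insert-only ACID table, and None for a non-transactional table.
    Assumption: the caller has already fetched the properties.
    """
    # Normalize keys and values so 'TRUE' and 'true' compare equal.
    props = {str(k).lower(): str(v).lower() for k, v in tblproperties.items()}

    if props.get("transactional") != "true":
        return None  # not an ACID table, so it cannot be compacted
    if props.get("transactional_properties") == "insert_only":
        return "insert_only"
    return "full"
```

A table for which this sketch returns None would not be a candidate for compaction.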
  1. In Cloudera Manager, select the Hive metastore service: Clusters > Hive-1 > Configuration.
  2. Search for compact.
  3. Check that Turn on Compactor Initiator Thread (hive.compactor.initiator.on), Number of Threads Used by Compactor (hive.compactor.worker.threads), and Run Compactor on Hive Metastore or HiveServer2 (hive.metastore.runworker.in) are set to the values shown above.
  4. Save the changes.
  5. In Cloudera Manager, select the Hive on Tez service: Clusters > HIVE_ON_TEZ-1 > Configuration.
  6. Search for compact.
  7. Check that Number of Threads Used by Compactor (hive.compactor.worker.threads) is set to a value greater than 0, and that Run Compactor on Hive Metastore or HiveServer2 (hive.metastore.runworker.in) is set to hs2.
  8. Save the changes and restart the Hive on Tez and Hive metastore (Hive-1) services at an appropriate time.
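After the restart, one way to double-check the result outside Cloudera Manager is to inspect the generated client configuration. The Python sketch below parses text in the standard Hadoop configuration XML layout (as used by hive-site.xml) and compares the three properties from this topic with the values listed above. The function names and the service labels "metastore" and "hive_on_tez" are hypothetical, and the sketch assumes that each property appears explicitly in the file; a property that is absent is reported as a problem even though the service default may still apply.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def check_compaction_settings(conf, service):
    """Return a list of problems found for the given service.

    service is a hypothetical label: 'metastore' or 'hive_on_tez'.
    An empty list means the settings match the values in this topic.
    """
    problems = []
    if service == "metastore":
        if conf.get("hive.compactor.initiator.on", "").lower() != "true":
            problems.append("hive.compactor.initiator.on is not true")
    elif service == "hive_on_tez":
        threads = conf.get("hive.compactor.worker.threads", "")
        # isdigit() rejects empty, negative, and non-numeric values.
        if not threads.isdigit() or int(threads) <= 0:
            problems.append("hive.compactor.worker.threads is not greater than 0")
        if conf.get("hive.metastore.runworker.in", "").lower() != "hs2":
            problems.append("hive.metastore.runworker.in is not hs2")
    else:
        raise ValueError("unknown service label: " + repr(service))
    return problems


# Illustrative input only; real files contain many more properties.
SAMPLE_TEZ_CONF = """
<configuration>
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.metastore.runworker.in</name>
    <value>hs2</value>
  </property>
</configuration>
"""
```

Calling check_compaction_settings(read_hadoop_conf(SAMPLE_TEZ_CONF), "hive_on_tez") returns an empty list for the sample above. Cloudera Manager remains the authoritative place to view and change these values.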