Compactor processes

These background processes run inside the metastore and HiveServer2 in Cloudera Data Warehouse (CDW) Public Cloud. They support the data modifications made as a result of ACID transactions.

Compactor process Description

Initiator: This process runs in the metastore, which equates to the Database Catalog construct in the CDW UI, and discovers which tables and partitions are due for compaction. By default, it runs every 5 minutes. To change this interval:

  1. Select the tile of the Virtual Warehouse whose compaction interval you want to change. The associated Database Catalog is highlighted.
  2. In the Database Catalog tile, click the edit icon to launch the Database Catalog details page.
  3. On the Database Catalog details page, make sure the CONFIGURATIONS tab is selected, and then select the Metastore subtab.
  4. On the Metastore subtab, select hive-site from the drop-down list on the left, and search for the hive.compactor.check.interval KEY.
  5. In the associated VALUE field, enter your preferred check interval in seconds.
  6. Click APPLY in the upper right corner of the page to apply your changes. The services are automatically updated with the new configuration.
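
Because the VALUE field expects seconds, an interval you think of in minutes has to be converted before you enter it. The following is a minimal sketch of that arithmetic. Only the property name hive.compactor.check.interval comes from the steps above; the helper names are hypothetical, and the rule that the interval must be positive is an assumption of this sketch, not documented behavior.

```python
# Property name taken from step 4 of the procedure above.
CHECK_INTERVAL_KEY = "hive.compactor.check.interval"


def check_interval_seconds(minutes: float) -> int:
    """Convert an interval in minutes to the whole-second value for the VALUE field.

    Hypothetical helper; rejecting non-positive values is an assumption.
    """
    if minutes <= 0:
        raise ValueError("interval must be positive")
    return round(minutes * 60)


def hive_site_entry(minutes: float) -> dict:
    """Return the KEY/VALUE pair as it would be typed into the hive-site form."""
    return {CHECK_INTERVAL_KEY: str(check_interval_seconds(minutes))}


# The documented default of 5 minutes corresponds to a VALUE of 300.
print(hive_site_entry(5))
```

For example, to have the process check every 10 minutes, you would enter 600 in the VALUE field.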

Worker: This process runs in HiveServer2, which equates to the Hive Virtual Warehouse construct in the CDW UI. The worker process performs the actual compacting work. In CDW, compaction runs an INSERT statement created from the output of a SELECT statement, thereby rewriting the data to new base or delta files.

Cleaner: This process runs in the metastore and deletes delta files after compaction, once it determines that the files are no longer needed. By default, the cleaner runs every 5 seconds (5,000 milliseconds). The check occurs on the visibility ID/transaction ID, which is a global transaction identifier.
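
The division of work between the worker and the cleaner can be pictured with a toy model. This is a sketch for illustration only and is not how Hive implements compaction. The DeltaFile class, the file names, and both functions are hypothetical, and the cleanup rule (old files are obsolete once every open transaction is newer than the compaction's visibility ID) is a simplifying assumption about the transaction ID check described above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DeltaFile:
    """Hypothetical stand-in for one delta file and the rows it holds."""
    name: str
    rows: Tuple[str, ...]


def compact(base_rows: Sequence[str], deltas: Sequence[DeltaFile]) -> List[str]:
    """Worker step: read all current rows (the SELECT) and write them
    out again as one new base (the INSERT)."""
    new_base = list(base_rows)
    for delta in deltas:
        new_base.extend(delta.rows)
    return new_base


def can_clean(visibility_txn_id: int, open_txn_ids: Sequence[int]) -> bool:
    """Cleaner step (assumed rule): old deltas are no longer needed once
    every open transaction is newer than the compaction's visibility ID."""
    return all(txn_id > visibility_txn_id for txn_id in open_txn_ids)


deltas = [DeltaFile("delta_1", ("a", "b")), DeltaFile("delta_2", ("c",))]
new_base = compact(["z"], deltas)
print(new_base)                 # ['z', 'a', 'b', 'c']
print(can_clean(10, [8, 12]))   # False: transaction 8 may still read the old deltas
print(can_clean(10, [11, 12]))  # True: every open transaction is newer
```

The point of the model is the ordering: the worker only writes new files, and the old delta files stay in place until the cleaner's transaction ID check shows that no reader can still depend on them.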