Compactor processes
These background processes run inside the metastore and HiveServer2 in Cloudera Data Warehouse (CDW) Public Cloud. They support the data modifications made by ACID transactions.
Compactor process | Description |
---|---|
Initiator | This process runs in the metastore, which equates to the Database Catalog construct in the CDW UI, and discovers which tables and partitions are due for compaction. By default, it runs every 5 minutes. The interval is controlled by the `hive.compactor.check.interval` property. |
Worker | This process runs in HiveServer2, which equates to the Hive Virtual Warehouse construct in the CDW UI, and performs the compaction itself. In CDW, compaction runs an INSERT statement built from the output of a SELECT statement, which rewrites the data to new base or delta files. |
Cleaner | This process runs in the metastore and deletes the delta files left behind by compaction once it determines that they are no longer needed. By default, the cleaner runs every 5 seconds (5,000 milliseconds). To decide whether files are still needed, it checks the visibility ID (transaction ID), which is a global transaction identifier. |
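The division of labor in the table can be illustrated with a toy, in-memory model. This is a simplified sketch, not how Hive implements compaction: the names `AcidFile`, `Partition`, `initiate`, `work`, `clean`, and `DELTA_THRESHOLD` are hypothetical. The Initiator rule (a fixed delta-file count) is an assumption, since real trigger criteria are more involved. The Worker merges rows with a dictionary instead of running an INSERT ... SELECT. The Cleaner check is reduced to a single comparison between the compaction's visibility transaction ID and the oldest still-open transaction.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the three compactor roles. None of these
# names are Hive APIs; file names only loosely follow Hive's base/delta style.

DELTA_THRESHOLD = 3  # hypothetical trigger, not a Hive default


@dataclass(frozen=True)
class AcidFile:
    name: str
    kind: str          # "base" or "delta"
    max_txn: int       # highest transaction ID whose writes the file holds
    rows: tuple = ()   # (key, value) pairs written by those transactions


@dataclass
class Partition:
    name: str
    files: list = field(default_factory=list)
    obsolete: list = field(default_factory=list)  # (file, visibility_txn) pairs


def initiate(partitions):
    """Initiator: pick partitions with enough delta files to be worth compacting."""
    return [p for p in partitions
            if sum(f.kind == "delta" for f in p.files) >= DELTA_THRESHOLD]


def work(partition, compaction_txn):
    """Worker: read base + deltas in transaction order (the SELECT side) and
    write the merged result to one new base file (the INSERT side)."""
    merged = {}
    for f in sorted(partition.files, key=lambda f: f.max_txn):
        merged.update(f.rows)  # later transactions overwrite earlier values
    high = max(f.max_txn for f in partition.files)
    new_base = AcidFile(f"base_{high:07d}_v{compaction_txn:07d}", "base", high,
                        tuple(sorted(merged.items())))
    # Old files are not deleted here; they wait for the Cleaner.
    partition.obsolete.extend((f, compaction_txn) for f in partition.files)
    partition.files = [new_base]
    return new_base


def clean(partition, min_open_txn):
    """Cleaner: delete obsolete files only when every still-open transaction
    started after the compaction became visible, so no reader needs them."""
    deleted, kept = [], []
    for f, visibility_txn in partition.obsolete:
        if visibility_txn < min_open_txn:
            deleted.append(f.name)
        else:
            kept.append((f, visibility_txn))
    partition.obsolete = kept
    return deleted


# Usage: one base file plus three single-transaction deltas.
p = Partition("ds=2024-01-01", [
    AcidFile("base_0000001", "base", 1, (("a", 1),)),
    AcidFile("delta_0000002_0000002", "delta", 2, (("b", 2),)),
    AcidFile("delta_0000003_0000003", "delta", 3, (("a", 10),)),
    AcidFile("delta_0000004_0000004", "delta", 4, (("c", 3),)),
])
due = initiate([p])
new_base = work(p, compaction_txn=7)
early = clean(p, min_open_txn=5)  # txn 5 is still open and may read old files
late = clean(p, min_open_txn=8)   # every open txn began after txn 7
print(new_base.name, new_base.rows, early, late)
```

In this sketch the Worker never deletes anything. Old files remain readable until the Cleaner confirms that no open transaction predates the compaction, which is why the first `clean` call removes nothing and the second removes all four original files.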