How compaction interacts with Cloudera Base on premises

In Cloudera Base on premises, the initiator and cleaner processes run in the metastore, as they do in Cloudera Data Warehouse on premises. The worker process, however, runs in HiveServer2 as a MapReduce task, so you can view its progress in YARN.

In Cloudera Data Warehouse, the initiator and cleaner processes run in the Database Catalog, which is the Cloudera Data Warehouse UI construct that equates to the metastore. The default Database Catalog, which the system creates when you activate an environment in Cloudera Data Warehouse, maintains a connection with Cloudera Base on premises, and all compaction jobs for the default Database Catalog run on Cloudera Base on premises. Database Catalogs that you create afterward do not maintain a connection to Cloudera Base on premises, and their compaction runs entirely in Cloudera Data Warehouse. The worker process that performs the compaction work in Cloudera Data Warehouse runs in HiveServer2, which equates to a Hive Virtual Warehouse. Unlike in Cloudera Base on premises, compaction performed by the worker process in Hive Virtual Warehouses consists of queries instead of MapReduce tasks.
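
The placement rules described above can be summarized as a small lookup. The following Python sketch is illustrative only: the names `Platform`, `initiator_cleaner_location`, `worker_execution`, and `compaction_jobs_run_on` are hypothetical and are not part of any Cloudera or Hive API. It only restates what this topic says about where each compaction process runs.

```python
from enum import Enum


class Platform(Enum):
    # Hypothetical labels for the two products discussed in this topic.
    BASE = "Cloudera Base on premises"
    CDW = "Cloudera Data Warehouse"


def initiator_cleaner_location(platform: Platform) -> str:
    """Return where the initiator and cleaner processes run."""
    # In Cloudera Data Warehouse, the Database Catalog equates to the metastore.
    return "metastore" if platform is Platform.BASE else "Database Catalog"


def worker_execution(platform: Platform) -> str:
    """Return how the HiveServer2 worker performs compaction on the given platform."""
    # Base: a MapReduce task whose progress can be viewed in YARN.
    # Cloudera Data Warehouse: queries in a Hive Virtual Warehouse.
    return "MapReduce task" if platform is Platform.BASE else "queries"


def compaction_jobs_run_on(platform: Platform, default_catalog: bool = True) -> Platform:
    """Return the platform on which compaction jobs run."""
    # Only the default Database Catalog keeps a connection to Base;
    # Database Catalogs created later compact entirely in Cloudera Data Warehouse.
    if platform is Platform.CDW and not default_catalog:
        return Platform.CDW
    return Platform.BASE
```

For example, `compaction_jobs_run_on(Platform.CDW, default_catalog=False)` returns `Platform.CDW`, which reflects that a Database Catalog created after the default one has no connection to Cloudera Base on premises.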