Changing compactor configuration for Hive Virtual Warehouses on Cloudera Data Warehouse on premises
The compactor is a set of background processes that compact delta files, which are created as a
by-product of data modifications, to improve query performance. When it runs, it places
additional load on the Hive Virtual Warehouse assigned as the compactor in Cloudera Data Warehouse on premises. You can change which Hive Virtual Warehouse performs
compaction to balance this workload as needed.
In Cloudera Data Warehouse on premises, data compaction is performed on HiveServer2, which
corresponds to the Hive Virtual Warehouse construct in the UI. Compaction is therefore
essentially query execution: it runs an INSERT statement built from the output of a SELECT
statement in the Hive Virtual Warehouse assigned as the compactor, rewriting the data. That
Hive Virtual Warehouse supplies the query capacity to do this work. When you size the Hive
Virtual Warehouse that performs compaction, account for the compaction queries in addition
to all the other query workloads that run on it.
Each Database Catalog must have one Hive Virtual Warehouse configured as its compactor. The
exception is the default Database Catalog, whose compaction is performed on Cloudera Base on premises.
The compactor runs all compaction queries for every Virtual Warehouse that uses that
Database Catalog, including Impala Virtual Warehouses. However, an Impala Virtual
Warehouse cannot itself be configured as the compactor for a Database Catalog.
Compaction tasks must be assigned to a Hive Virtual Warehouse. The first Hive Virtual
Warehouse you create against a Database Catalog is automatically set as the compactor. If you
decide you do not want that particular warehouse to take on the compaction workload, you can
set another Hive Virtual Warehouse to perform the compaction workload by following these
steps:
Log in to the Cloudera web interface and navigate to
the Data Warehouse service.
On the Overview page, select the Hive Virtual Warehouse that you
want to set as the compactor, and click .
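The assignment rules described in this topic can be summarized as a small in-memory model. This is a hypothetical sketch for illustration only: the `VirtualWarehouse` and `DatabaseCatalog` classes and their methods are invented names, not part of any Cloudera API. It assumes that the first Hive Virtual Warehouse added to a non-default Database Catalog becomes the compactor, that only Hive Virtual Warehouses can be the compactor, and that the default Database Catalog has no compactor here because its compaction runs on Cloudera Base on premises.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualWarehouse:
    """Hypothetical model of a Virtual Warehouse; engine is "hive" or "impala"."""
    name: str
    engine: str


@dataclass
class DatabaseCatalog:
    """Hypothetical model of a Database Catalog and its compactor assignment."""
    name: str
    is_default: bool = False
    warehouses: List[VirtualWarehouse] = field(default_factory=list)
    compactor: Optional[VirtualWarehouse] = None

    def add_warehouse(self, vw: VirtualWarehouse) -> None:
        self.warehouses.append(vw)
        # The first Hive Virtual Warehouse created against a non-default
        # catalog is automatically set as the compactor.
        if not self.is_default and self.compactor is None and vw.engine == "hive":
            self.compactor = vw

    def set_compactor(self, vw: VirtualWarehouse) -> None:
        # The default catalog is compacted on Cloudera Base on premises.
        if self.is_default:
            raise ValueError("default catalog: compaction runs on Cloudera Base on premises")
        if vw not in self.warehouses:
            raise ValueError(f"{vw.name} does not use catalog {self.name}")
        # Impala Virtual Warehouses cannot be configured as the compactor.
        if vw.engine != "hive":
            raise ValueError("only a Hive Virtual Warehouse can be the compactor")
        self.compactor = vw


if __name__ == "__main__":
    catalog = DatabaseCatalog("sales")
    catalog.add_warehouse(VirtualWarehouse("bi-impala", "impala"))
    catalog.add_warehouse(VirtualWarehouse("etl-hive", "hive"))
    adhoc = VirtualWarehouse("adhoc-hive", "hive")
    catalog.add_warehouse(adhoc)
    print(catalog.compactor.name)  # etl-hive, the first Hive warehouse
    catalog.set_compactor(adhoc)   # move the compaction workload
    print(catalog.compactor.name)  # adhoc-hive
```

In this sketch, reassigning the compactor only changes which warehouse runs the compaction queries; the catalog still has exactly one compactor, which mirrors the one-compactor-per-catalog rule stated above.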