Changing compactor configuration for Hive Virtual Warehouses on CDW Private Cloud
The compactor is a set of background processes that improve performance by compacting
delta files, which are created as a by-product of data modifications. When the compactor runs,
it places additional load on the Hive Virtual Warehouse assigned as the compactor in Cloudera Data
Warehouse (CDW) Private Cloud. You can change which Hive Virtual Warehouse performs compaction to
load-balance this workload as necessary.
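To make the idea of compacting delta files concrete, the following is a minimal toy model in Python. It is a sketch only and does not reflect how Hive stores or compacts data internally: the Delta class and the read_table and compact functions are hypothetical names introduced for illustration, and rows are modeled as a simple dictionary. It shows why compaction helps readers (fewer deltas to replay) and why it costs work (the data is rewritten).

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """Toy delta: rows written and row ids deleted by one modification (hypothetical model)."""
    inserts: dict = field(default_factory=dict)  # row_id -> row value
    deletes: set = field(default_factory=set)    # row ids removed


def read_table(base, deltas):
    """Reconstruct the current rows by replaying every delta on top of the base."""
    rows = dict(base)
    for delta in deltas:
        for row_id in delta.deletes:
            rows.pop(row_id, None)
        rows.update(delta.inserts)
    return rows


def compact(base, deltas):
    """Rewrite base plus deltas into one new base, leaving no deltas to replay."""
    return read_table(base, deltas), []


base = {1: "alice", 2: "bob"}
deltas = [
    Delta(inserts={3: "carol"}),
    Delta(deletes={2}),
    # An update modeled as a delete followed by an insert (an assumption of this toy model).
    Delta(deletes={1}, inserts={4: "alice-updated"}),
]

new_base, new_deltas = compact(base, deltas)
```

In this sketch, reading the table before compaction replays three deltas, while reading it afterwards touches only the rewritten base. The rewrite step is the part that consumes query capacity on the warehouse assigned as the compactor.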
In CDW Private Cloud, data compaction is performed on HiveServer2, which equates to
the Hive Virtual Warehouse construct in the UI. This means that compaction is essentially
query execution. Compaction runs an INSERT statement created from the output of a SELECT
statement in the Hive Virtual Warehouse assigned as the compactor, thereby rewriting
the data. The Hive Virtual Warehouse configured as the compactor provides the query capacity
for this work. Therefore, when you size the Hive Virtual Warehouse that performs compaction,
you must account for the compaction queries in addition to the other query workloads that
run on that Virtual Warehouse.
One of the Hive Virtual Warehouses must be configured as the compactor for the
associated Database Catalog (excluding the default Database Catalog, whose compaction is
performed on CDP Base). This Hive Virtual Warehouse compactor runs all of the compaction
queries for all Virtual Warehouses that use one particular Database Catalog, including Impala
Virtual Warehouses. However, Impala Virtual Warehouses cannot be configured as the compactor
Virtual Warehouse for a Database Catalog. Compaction tasks must be assigned to a Hive Virtual
Warehouse. The first Hive Virtual Warehouse you create against a Database Catalog is
automatically set as the compactor. If you do not want that warehouse to
take on the compaction workload, you can assign another Hive Virtual Warehouse as the
compactor by following these steps:
Log in to the CDP web interface and navigate to the Data Warehouse service.
On the Overview page, select the Hive Virtual Warehouse that you
want to set as the compactor, and click .