Hive VW isolation auto-scaling
The Hive Virtual Warehouse in Cloudera Data Warehouse (CDW) Public Cloud bases its
auto-scaling on the size of data scanned for a query. Queries that scan amounts of data
that exceed the hive.query.isolation.scan.size.threshold
setting use the query
isolation method of auto-scaling.
To run ETL-type queries, you turn on query isolation when you configure the Hive Virtual
Warehouse. Scaling occurs as follows:
- If the query does not exceed the hive.query.isolation.scan.size.threshold, concurrency scaling occurs.
- If the query does exceed the hive.query.isolation.scan.size.threshold, query isolation scaling occurs.
HiveServer generates a preliminary query execution plan.
The Hive Virtual Warehouse adds a new dedicated executor group with the right number of
executors to handle the query instead of using the number associated with the T-shirt size you
configured. This isolated executor group is used only under the following conditions:
- You run a single ETL-type query.
- The query exceeds the hive.query.isolation.scan.size.threshold.
Otherwise, the concurrency method of auto-scaling is used.
The following diagram shows the isolation method of auto-scaling:
The variable-sized, ephemeral executor groups for ETL terminate after the query completes.