Hive VW isolation auto-scaling

The Hive Virtual Warehouse in Cloudera Data Warehouse (CDW) Public Cloud bases its auto-scaling on the size of data scanned for a query. Queries that scan more data than the value of the hive.query.isolation.scan.size.threshold setting use the query isolation method of auto-scaling.

To run ETL-type queries, turn on query isolation when you configure the Hive Virtual Warehouse. Auto-scaling then works as follows:
  • If the query does not exceed the hive.query.isolation.scan.size.threshold, concurrency scaling occurs.
  • If the query does exceed the hive.query.isolation.scan.size.threshold, query isolation scaling occurs.

When you submit a query, HiveServer generates a preliminary query execution plan.

For a query that qualifies for isolation, the Hive Virtual Warehouse adds a new dedicated executor group sized with the number of executors the query needs, rather than the number determined by the T-shirt size you configured. The Hive Virtual Warehouse uses this isolated executor group only when both of the following conditions are met:
  • You run a single ETL-type query.
  • The query exceeds the hive.query.isolation.scan.size.threshold.

Otherwise, the concurrency method of auto-scaling is used.
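The selection rule described above can be sketched as a small decision function. This is a minimal illustration, not the Hive Virtual Warehouse implementation: the function and parameter names (choose_scaling_method, isolation_enabled, is_etl_query, estimated_scan_bytes, threshold_bytes) are hypothetical, and the sketch assumes the scan estimate and the hive.query.isolation.scan.size.threshold value are both expressed in bytes.

```python
from enum import Enum


class ScalingMethod(Enum):
    CONCURRENCY = "concurrency"
    ISOLATION = "isolation"


def choose_scaling_method(
    isolation_enabled: bool,
    is_etl_query: bool,
    estimated_scan_bytes: int,
    threshold_bytes: int,
) -> ScalingMethod:
    """Pick the auto-scaling method for one query (hypothetical sketch).

    Assumes the scan estimate and the threshold use the same unit (bytes).
    """
    # Isolation applies only when it is turned on, the query is an
    # ETL-type query, and its scan size strictly exceeds the threshold.
    if isolation_enabled and is_etl_query and estimated_scan_bytes > threshold_bytes:
        return ScalingMethod.ISOLATION
    # In every other case the concurrency method is used.
    return ScalingMethod.CONCURRENCY


if __name__ == "__main__":
    GiB = 1024 ** 3
    method = choose_scaling_method(
        isolation_enabled=True,
        is_etl_query=True,
        estimated_scan_bytes=500 * GiB,
        threshold_bytes=100 * GiB,
    )
    print(method.value)
```

A query that scans exactly the threshold amount does not exceed it, so the sketch routes it to concurrency scaling.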

The following diagram shows the isolation method of auto-scaling:

Each variable-sized, ephemeral executor group created for an ETL query terminates after that query completes.
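The ephemeral lifecycle of an isolated executor group can be modeled with a Python context manager: the group exists only for the duration of one query and is torn down afterward. This is a minimal sketch under stated assumptions; ExecutorGroup, VirtualWarehouse, isolated_group, and run_query are hypothetical names, not a CDW API, and the sketch also tears the group down when the query fails, which is an assumption beyond what is stated above.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ExecutorGroup:
    """Hypothetical model of a dedicated, per-query executor group."""
    name: str
    num_executors: int
    running: bool = True


@dataclass
class VirtualWarehouse:
    """Hypothetical model that only tracks live isolated executor groups."""
    isolated_groups: List[ExecutorGroup] = field(default_factory=list)

    @contextmanager
    def isolated_group(self, query_id: str, num_executors: int) -> Iterator[ExecutorGroup]:
        # Create a group sized for this one query, not for the T-shirt size.
        group = ExecutorGroup(name=f"etl-{query_id}", num_executors=num_executors)
        self.isolated_groups.append(group)
        try:
            yield group
        finally:
            # Ephemeral: terminate the group once the query ends
            # (assumed here to apply on failure as well as success).
            group.running = False
            self.isolated_groups.remove(group)


def run_query(group: ExecutorGroup, sql: str) -> str:
    # Placeholder for submitting the query to the group's executors.
    return f"ran on {group.num_executors} executors: {sql}"


if __name__ == "__main__":
    vw = VirtualWarehouse()
    with vw.isolated_group("q42", num_executors=12) as group:
        print(run_query(group, "SELECT 1"))
    print(len(vw.isolated_groups))
```

Using try/finally inside the context manager keeps teardown in one place, so no code path can leave an isolated group running after its query.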