You can configure your Hive Virtual Warehouse to add dedicated executors to run
scan-heavy, data-intensive queries, also known as ETL queries. You learn how to enable a
Virtual Warehouse auto-scaling feature and set a query isolation parameter in the Hive
configuration.
You are creating a Hive Virtual Warehouse for running ETL-type queries.
In Cloudera Data Warehouse, you added a Hive Virtual Warehouse, configured the size of
the Hive Virtual Warehouse, and configured auto-suspend as described in previous topics.
You obtained the DWAdmin role.
In this task, first you configure the same auto-scaling properties as described in
the previous topic Configuring Hive VW concurrency auto-scaling,
and then you enable query isolation. Next, you set the
hive.query.isolation.scan.size.threshold configuration
parameter.
In Concurrency Autoscaling properties, limit
auto-scaling by setting the minimum and maximum number of executor nodes that
can be added.
Choose when your cluster auto-scales up based on one of the following choices:
HEADROOM
WAIT TIME
In Concurrency Autoscaling, select Enable
Query Isolation.
Two configuration options appear: Max Concurrent Isolated
Queries and Max Nodes Per Isolated
Query.
In Max Concurrent Isolated Queries, set the maximum
number of isolated queries that can run concurrently in their own dedicated
executor nodes.
Select this number based on the scan size of the data for your average
scan-heavy, data-intensive query.
In Max Nodes Per Isolated Query, set how many executor
nodes can be spawned for each isolated query.
Click Create Virtual Warehouse.
In the Data Warehouse service Overview page, click
Virtual Warehouses, and find your Virtual Warehouse.
Click > Edit > Configurations > Hiveserver2.
Select hive-site from the Configuration
files drop-down menu and set
hive.query.isolation.scan.size.threshold parameter to limit
how much data can be scanned.