You can configure your Hive Virtual Warehouse to add dedicated executors to
run scan-heavy, data-intensive queries, also known as ETL queries. You learn how to enable a
Virtual Warehouse auto-scaling feature and set a query isolation parameter in the Hive
configuration.
You are creating a Hive Virtual Warehouse for running ETL-type queries.
In Cloudera Data Warehouse, you added a Hive Virtual Warehouse, configured the size of
the Hive Virtual Warehouse, and configured auto-suspend as described in previous topics.
You obtained the DWAdmin role.
In this task, first you configure the same auto-scaling properties as described in
the previous topic
"Configuring
Hive VW concurrency auto-scaling", and then you enable query isolation. Next, you set
the hive.query.isolation.scan.size.threshold configuration
parameter.
In Concurrency Auto-scaling properties, in Executors: Min: ,
Max:, limit auto-scaling by setting the minimum and maximum number of
executor nodes that can be added.
Choose when your cluster auto-scales up based on one of the following choices:
HEADROOM
WAIT TIME
In Concurrency Autoscaling, enable Query
Isolation.
For example:
Two configuration options appear.
In Max Concurrent Isolated Queries, set the maximum number of isolated queries
that can run concurrently in their own dedicated executor nodes.
Select this number based on the scan size of the data for your average
scan-heavy, data-intensive query.
In Max Nodes Per Isolated Query, set how many executor nodes can be spawned for
each isolated query.
Click CREATE.
In the Data Warehouse service UI, click
Virtual
Warehouses,
OptionsCONFIGURATIONS > Hiveserver2
.
In Configuration files, select hive-site and set
hive.query.isolation.scan.size.threshold parameter to limit
how much data can be scanned.