Configuring Hive query isolation auto-scaling
You can configure your Hive Virtual Warehouse to add dedicated executors to run scan-heavy, data-intensive queries, also known as ETL queries. You learn how to enable a Virtual Warehouse auto-scaling feature and set a query isolation parameter in the Hive configuration.
- You are familiar with the auto-scaling process.
- You are creating a Hive Virtual Warehouse for running ETL-type queries.
- In Cloudera Data Warehouse, you added a Hive Virtual Warehouse, configured the size of the Hive Virtual Warehouse, and configured auto-suspend as described in previous topics.
- You obtained the DWAdmin role.
In Concurrency Auto-scaling properties, in Executors: Min and Max, limit auto-scaling by setting the minimum and maximum number of executor nodes that can be added.
Choose when your cluster auto-scales up based on one of the following choices:
- WAIT TIME
In Concurrency Auto-scaling, enable Query Isolation.
Two configuration options appear: Max Concurrent Isolated Queries and Max Nodes Per Isolated Query.
In Max Concurrent Isolated Queries, set the maximum number of isolated queries
that can run concurrently in their own dedicated executor nodes.
Select this number based on the scan size of the data for your average scan-heavy, data-intensive query.
- In Max Nodes Per Isolated Query, set how many executor nodes can be spawned for each isolated query.
- Click CREATE.
- In the Data Warehouse service UI, click Virtual Warehouses, and find your Virtual Warehouse. Click Options.
In Configuration files, select hive-site and set the hive.query.isolation.scan.size.threshold parameter to limit how much data can be scanned. For example, set 400GB.
- Click APPLY.
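To sanity-check the values chosen in the steps above, the following is a minimal planning sketch in Python. It is not part of Cloudera Data Warehouse. The helper names (parse_size, exceeds_threshold, peak_isolated_nodes) are hypothetical, and the sketch rests on three assumptions: that a size such as 400GB is read with binary multipliers, that a query becomes a candidate for isolation when its scan size exceeds hive.query.isolation.scan.size.threshold, and that the largest number of dedicated executor nodes in use at once is Max Concurrent Isolated Queries multiplied by Max Nodes Per Isolated Query.

```python
import re

# Binary multipliers for size suffixes (assumption: 1 GB = 1024**3 bytes).
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '400GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def exceeds_threshold(scan_bytes: int, threshold: str) -> bool:
    """Assumed rule: a query qualifies for isolation when its scan size is above the threshold."""
    return scan_bytes > parse_size(threshold)


def peak_isolated_nodes(max_concurrent_isolated_queries: int,
                        max_nodes_per_isolated_query: int) -> int:
    """Upper bound on dedicated executor nodes if every isolated query uses its full allowance."""
    return max_concurrent_isolated_queries * max_nodes_per_isolated_query


# Example values: the 400GB threshold from the step above, illustrative scan sizes and limits.
threshold = "400GB"
print(exceeds_threshold(parse_size("650GB"), threshold))  # True
print(exceeds_threshold(parse_size("120GB"), threshold))  # False
print(peak_isolated_nodes(3, 5))                          # 15
```

Comparing the estimated scan size of your typical ETL query against the threshold, and the node upper bound against your budget, can help you pick the two Query Isolation values before you click CREATE.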