Configuring Hive VW concurrency auto-scaling

Configuring the Hive Virtual Warehouse to use concurrency auto-scaling is critical for controlling cloud expenses.

  • You are familiar with the auto-scaling process.
  • You are creating a Hive Virtual Warehouse for running BI-type queries.
  • In Cloudera Data Warehouse, you added a Hive Virtual Warehouse, configured the size of the Hive Virtual Warehouse, and configured auto-suspend as described in previous topics.
  • You obtained the DWAdmin role.
In this task, first you select the number of executors for your virtual cluster.
  • Minimum executors

    The fewest executors you think you will need to provide sufficient resources for your queries. The number of executors needed for a cloud workload is analogous to the number of nodes needed for an on-premises workload.

  • Maximum number

    The maximum number of executors you will need. This setting limits prevents running an infinite number of executors and runaway costs.

Consider the following factors when configuring the minimum and maximum number of executors:
  • Number of concurrent queries
  • Complexity of queries
  • Amount of data scanned by the queries
  • Number of queries
Next, you configure when your cluster should scale up and down based on one of the following factors:
  • HEADROOM: The number of available coordinators that trigger auto-scaling. For example, if Desired Free Capacity is set to 1 on an XSMALL-sized Virtual Warehouse, which has 2 executors, when there is less than one free coordinator (2 queries are concurrently executing), the warehouse auto-scales up and an additional executor group is added.
  • WAIT TIME: How long queries wait in the queue to execute. A query is queued if it arrives on HiveServer and no coordinator is available. For example, if WaitTime Seconds is set to 10, queries are waiting to execute in the queue for 10 seconds. The warehouse auto-scales up and adds an additional executor group.

Auto-scaling based on wait time is not as predictable as auto-scaling based on headroom. The scaling based on wait time might react to non-scalable factors to spin up clusters. For example, query wait times might increase due to inefficient queries, not query volume.

When headroom or wait time thresholds are exceeded, the Virtual Warehouse adds executor groups until the maximum setting for Executors: Min: , Max: has been reached.

  1. In Concurrency Auto-scaling properties, in Executors: Min: , Max:, limit auto-scaling by setting the minimum and maximum number of executor nodes that can be added.
  2. Choose when your cluster auto-scales up based on one of the following choices:
    • HEADROOM
    • WAIT TIME
  3. Click Apply.