Setting the scratch space limit for spilling Impala queries in AWS environments

Running out of space to store query data on the SSD causes query failure. To prevent these failures, you need to know how to configure scratch space limits for Impala Virtual Warehouse (CDW) service.

Configuring scratch space on EBS volumes incurs additional cost and requires an entitlement. The compute nodes that are used for coordinator and executor Kubernetes pods in Impala Virtual Warehouses are equipped as follows:
  • Attached solid state drive (SSD) storage, using NVMe (non-volatile memory express) protocol.
  • Instance storage 2x300 GB allocated as follows:
    • Data cache (200 GiB default value)
    • Scratch space (200 GiB default value)

      Scratch space requires a base 2 storage increment, such as GibiBytes.

If you have spilling queries that require more scratch space than your compute nodes have, you need to configure additional scratch space for Impala Virtual Warehouses on AWS Elastic Block Store (EBS). You configure scratch space limits when you create an Impala Virtual Warehouse. After the Virtual Warehouse is created, you cannot add scratch space limits by editing the configuration. Configuring scratch space on EBS volumes incurs additional cost. See Amazon EBS Pricing for details.

  1. Create a VW as described in Adding a new Virtual Warehouse, selecting the Impala engine type.
  2. Select a database catalog, or accept the default.
  3. Set User Groups that can access endpoints, keys and values for Tagging the Virtual Warehouse.
  4. Select a size for the Virtual Warehouse.
  5. In Scratch Space, choose to only use the 300 GiB of instance storage for scratch space, or choose additional EBS volumes for spilling queries.
  6. Click CREATE.