Setting scratch space limit for spilling Impala queries in Azure environments
Certain memory-intensive SQL operations write temporary data to disk (known as spilling to disk) when Impala is close to exceeding its memory limit on a particular host. This enables a query to complete successfully instead of failing with an out-of-memory error.
The compute nodes on Azure environments use the Standard_E16ds_v4 memory-optimized Virtual Machines by default that come with temporary SSD storage of 2 * 300 GiB (600 GiB). This space is utilized for both cached data and scratch space. The scratch space available on this temporary storage is insufficient for the spilling Impala queries.
Spilling queries require a scratch space in multiples of several 100 GiBs or TiBs, without which the queries may fail. To solve this problem, you can configure the scratch space between 300 GiB and 16 TiB per node while creating an Impala Virtual Warehouse in the Cloudera Data Warehouse (CDW) service. In addition to the local SSD, CDW allows you to use managed disks (Standard SSD and Premium disks) for extra space.
(SSD + managed disk)
|SSD only||300||968 cached / 384 uncached||0|
|SSD + 3 * P10||684||300 (3 *100)||0.073 (3 * 0.024)|
|SSD + 2 * P15||812||250 (2 * 125)||0.094 (2 * 0.047)|
|SSD + 3 * P15||1068||375 (3 * 125)||0.141 (3 * 0.047)|
|SSD + 2 * P20||1324||300 (2 * 150)||0.182 (2 * 0.091)|
|SSD + 3 * P20||1836||450 (3 * 150)||0.273 (3 * 0.091)|
|SSD + P40||2348||250||0.322|
|SSD + 3 * P30||3372||600 (3 * 200)||0.504 (3 * 0.168)|
|SSD + P50||4396||250||0.617|
|SSD + E60||8492||400||0.841|
|SSD + E70||16684||600||1.683|
You can select the scratch space limit only while creating a new Impala Virtual Warehouse. To set the scratch space limit for Impala queries:
- Log in to Cloudera Data Warehouse.
- In the Data Warehouse service, click Virtual Warehouses in the left navigation panel.
- On the Virtual Warehouses page, click Add New.
- Specify a Name.
- Select Virtual Warehouse type as IMPALA.
- Select the Database Catalog that it queries.
Select a Size.
Additional configuration options are displayed along with Scratch Space Limit per node (in GiBs).
- Select the scratch space from the Scratch Space Limit per node (in GiBs) dropdown.
- Click CREATE.