Setting scratch space limit for spilling Impala queries in Azure environments
Certain memory-intensive SQL operations write temporary data to disk (known as spilling to disk) when Impala is close to exceeding its memory limit on a particular host. Consequently, a query can complete successfully instead of failing with an out-of-memory error. You must set the scratch space limit carefully because you cannot change the limit later.
The compute nodes on Azure environments use the STANDARD_E16DS_v4 or STANDARD_E16DS_V5 memory-optimized Virtual Machines by default. These nodes have a temporary SSD storage of 2 * 300 GiB (600 GiB). This space is utilized for both cached data and scratch space. The scratch space available is insufficient for the spilling Impala queries.
Spilling queries to disk require a scratch space in multiples of several 100 GiBs or TiBs, without which the queries may fail. To prevent failure, you can configure the scratch space between 300 GiB and 16 TiB per executor node while creating an Impala Virtual Warehouse. In addition to the local SSD, you can use managed disks (Standard SSD and Premium disks) for extra space.
(SSD + managed disk)
|SSD only||300||968 cached / 384 uncached||0|
|SSD + 3 * P10||684||300 (3 *100)||0.073 (3 * 0.024)|
|SSD + 2 * P15||812||250 (2 * 125)||0.094 (2 * 0.047)|
|SSD + 3 * P15||1068||375 (3 * 125)||0.141 (3 * 0.047)|
|SSD + 2 * P20||1324||300 (2 * 150)||0.182 (2 * 0.091)|
|SSD + 3 * P20||1836||450 (3 * 150)||0.273 (3 * 0.091)|
|SSD + P40||2348||250||0.322|
|SSD + 3 * P30||3372||600 (3 * 200)||0.504 (3 * 0.168)|
|SSD + P50||4396||250||0.617|
|SSD + E60||8492||400||0.841|
|SSD + E70||16684||600||1.683|
You can select the scratch space limit only while creating a new Impala Virtual Warehouse.
- Obtain the CDW_IMPALA_EBS_SCRATCH_SPACE entitlement.
- Activate the environment in CDW using the E16ds_v4 or E16ds_v5.
- Disable Availability Zones for AKS during the environment activation; otherwise, you cannot set the scratch space limit.
- Log in to Cloudera Data Warehouse.
- In the Data Warehouse service, click Virtual Warehouses in the left navigation panel.
- On the Virtual Warehouses page, click Add New.
- Specify a Name.
- Select Virtual Warehouse type as IMPALA.
- Select the Database Catalog that it queries.
Select a Size.
Additional configuration options are displayed along with Scratch Space Limit per node (in GiBs).
- Select the scratch space from the Scratch Space Limit per node (in GiBs) dropdown.
- Click CREATE.