Configuring an existing Impala Virtual Warehouse to spill to S3

A new Impala Virtual Warehouse requires no configuration to spill to S3. However, if you have an existing Impala Virtual Warehouse that you did not configure to spill to S3 when you created the Virtual Warehouse, configuration is required.

When you configure spill to S3, an Impala Virtual Warehouse in an AWS environment writes temporary data to S3. Enabling this capability in an existing Virtual Warehouse, but not in a new one, requires an entitlement.
If you have an existing Impala Virtual Warehouse, you need to take the following actions:
  • Edit your existing Virtual Warehouse to specify an S3 URI to spill to S3.
    • After editing, you cannot change the S3 URI.
    • After editing, you cannot select additional storage in Scratch Space Limit per node. The default 300 instance storage is used.
  • Ensure that Impala has read/write access to the configured scratch location on the Data Lake bucket using steps in Configuring a policy to spill Impala temporary data to S3.
Alternatively, instead of using the automatic default scratch location or configuring the location as described above, you can run the CDP CLI command create-vw to configure a custom scratch location. Specify the spill location in the spillToS3Uri field of the impala-options option.
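
As an illustration of the CLI route, the following Python sketch assembles a create-vw invocation that passes spillToS3Uri through impala-options. The flag names (--cluster-id, --dbc-id, --vw-type, --name) and all ID and name values are assumptions made for this example and are not taken from this topic; check them against the CDP CLI help for create-vw before use. The script only prints the command and does not run it.

```python
import shlex


def build_create_vw_command(cluster_id, dbc_id, name, spill_uri):
    """Return the argument list for a create-vw call that sets spillToS3Uri.

    The flag names below are assumptions for illustration; confirm them
    against the CDP CLI help output before running the command.
    """
    if not spill_uri.lower().startswith("s3://"):
        raise ValueError("spill location must be an S3 URI, for example s3://mybucket/scratch/path")
    return [
        "cdp", "dw", "create-vw",
        "--cluster-id", cluster_id,      # hypothetical environment/cluster ID
        "--dbc-id", dbc_id,              # hypothetical Database Catalog ID
        "--vw-type", "impala",
        "--name", name,                  # hypothetical Virtual Warehouse name
        "--impala-options", f"spillToS3Uri={spill_uri}",
    ]


# Placeholder values; replace them with the IDs from your own environment.
cmd = build_create_vw_command(
    "env-abc123", "warehouse-xyz789", "impala-spill-vw", "s3://mybucket/scratch/path"
)
print(shlex.join(cmd))
```

Building the command as a list keeps the spill URI in a single argument, which avoids quoting problems if the path contains characters that the shell would otherwise interpret.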

After you create a Virtual Warehouse that is configured to spill to a specific S3 location, you cannot change the S3 URI. The field becomes uneditable.

  • To use an external S3 bucket for spilled data, add an external S3 bucket to CDW.
  • Note the URI of the external S3 bucket you added. For example, S3://mybucket/scratch/path.
  1. From the Management Console or CDP landing page, navigate to Data Warehouses.
  2. Click Virtual Warehouses.
  3. Click Options, and select Edit.
  4. Set the spill to S3 location.
  5. Click Save.
  6. Configure read/write access to the configured scratch location on the Data Lake bucket using steps in Identifying the spill location for Impala temporary data.
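
To make the read/write access in the last step more concrete, the following Python sketch derives an IAM policy document from a spill URI. The set of S3 actions shown (GetObject, PutObject, DeleteObject, ListBucket) is an assumption about what read/write access to a scratch location typically involves; the referenced policy topic remains the authoritative source for the permissions that CDW requires. The function name and the example URI are hypothetical.

```python
import json
from urllib.parse import urlparse


def scratch_policy(spill_uri):
    """Build an illustrative IAM policy granting read/write on a scratch prefix."""
    parsed = urlparse(spill_uri)
    if parsed.scheme.lower() != "s3" or not parsed.netloc:
        raise ValueError("expected an S3 URI such as s3://mybucket/scratch/path")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    # Object-level ARN covers everything under the scratch prefix (or the whole bucket).
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed object-level actions for reading and writing spilled data.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arn,
            },
            {
                # Listing is granted on the bucket itself, not on objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


policy = scratch_policy("s3://mybucket/scratch/path")
print(json.dumps(policy, indent=2))
```

Scoping the object-level statement to the scratch prefix rather than the whole bucket keeps the grant limited to the location that Impala spills to.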