Configuring an existing Impala Virtual Warehouse to spill to S3
A new Impala Virtual Warehouse requires no configuration to spill to S3. However, if you have an existing Impala Virtual Warehouse that you did not configure to spill to S3 when you created the Virtual Warehouse, configuration is required.
If you have an existing Impala Virtual Warehouse, you need to take the following actions:
- Edit your existing Virtual Warehouse to specify an S3 URI to spill to S3.
- After editing, you cannot change the S3 URI.
- After editing, you cannot select additional storage in Scratch Space Limit per node. The default 300 instance storage is used.
- Ensure that Impala has read/write access to the configured scratch location on the Data Lake bucket using steps in Configuring a policy to spill Impala temporary data to S3.
Alternatively, instead of using the automatic default scratch location or configuring the location by editing the Virtual Warehouse, you can run the CDP CLI command create-vw to configure a custom scratch location. Specify the spill location in the spillToS3Uri field of the impala-options option. After you have created the Virtual Warehouse configured to spill to a specific S3 location, you cannot change the S3 URI. The field becomes uneditable.
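The CLI alternative can be sketched as follows. This is a minimal illustration, not a verified command line: the cluster ID, Database Catalog ID, and Virtual Warehouse name are hypothetical placeholders, and the key=value shorthand for impala-options is an assumption about how the CDP CLI accepts structured options. The script only assembles and prints the command so that you can review it before running anything.

```shell
# Hypothetical placeholder values; substitute the IDs from your own environment.
CLUSTER_ID="env-example"
DBC_ID="warehouse-example"
VW_NAME="impala-spill-vw"
SPILL_URI="s3://mybucket/scratch/path"

# Build the argument list first so it can be reviewed before anything is created.
# Assumption: impala-options accepts the shorthand form spillToS3Uri=<URI>.
set -- dw create-vw \
  --cluster-id "$CLUSTER_ID" \
  --dbc-id "$DBC_ID" \
  --vw-type impala \
  --name "$VW_NAME" \
  --impala-options "spillToS3Uri=$SPILL_URI"

CDP_COMMAND="cdp $*"
echo "$CDP_COMMAND"

# Uncomment to run the command once the values are correct.
# cdp "$@"
```

Because the S3 URI cannot be changed after the Virtual Warehouse is created, printing the command first gives you a chance to confirm the spill location before committing to it.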
- To use an external S3 bucket for spilled data, add an external S3 bucket to CDW.
- Note the URI of the external S3 bucket you added. For example, s3://mybucket/scratch/path.
- From the Management Console or CDP landing page, navigate to Data Warehouses.
- Go to the Virtual Warehouses tab.
- Click .
- Set the spill to S3 location.
- Click Save.
- Configure read/write access to the configured scratch location on the Data Lake bucket using steps in Identifying the spill location for Impala temporary data.
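When you write the access policy for the scratch location, you typically need the bucket name and path as Amazon S3 resource ARNs rather than as a URI. The sketch below derives them from the example URI used in the steps above. It assumes the URI contains a path after the bucket name. The variable names are illustrative only, and the actions the policy must allow are covered in the referenced topic, not here.

```shell
# Example URI from the steps above; replace it with the URI you noted.
SPILL_URI="s3://mybucket/scratch/path"

# Strip the scheme, then split at the first slash into bucket and key prefix.
# Assumption: the URI has a path component after the bucket name.
WITHOUT_SCHEME="${SPILL_URI#*://}"
BUCKET="${WITHOUT_SCHEME%%/*}"
PREFIX="${WITHOUT_SCHEME#*/}"

# S3 ARNs take the form arn:aws:s3:::<bucket> and arn:aws:s3:::<bucket>/<key>.
BUCKET_ARN="arn:aws:s3:::${BUCKET}"
OBJECT_ARN="arn:aws:s3:::${BUCKET}/${PREFIX}/*"

echo "Bucket ARN: $BUCKET_ARN"
echo "Object ARN: $OBJECT_ARN"
```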