Configure Impala Virtual Warehouses on AWS environments to spill to S3

You can configure an Impala Virtual Warehouse in an AWS environment to write temporary (spilled) data to S3. This capability requires an entitlement. To enable it, specify an S3 URI when you create the Virtual Warehouse in Cloudera Data Warehouse (CDW).

After you create a Virtual Warehouse that is configured to spill to a specific S3 location, you cannot change the S3 URI; the field becomes read-only.
  • To use an external S3 bucket for spilled data, add the bucket to CDW with read/write permissions. Otherwise, skip this step and use the default S3 bucket that is created automatically for the environment.
  • Get the URI for the S3 bucket to use for spilled data. For example, S3://mybucket/scratch/path.
  1. Click Data Warehouse > Virtual Warehouses > Add New.
  2. In New Virtual Warehouse, specify a Name, its Type (Impala), its Database Catalog, whether to Enable SSO, User Groups that can access endpoints, keys and values for Tagging the Virtual Warehouse, and the Size.
  3. In Spill to S3, specify the S3 URI for the spilled data location.
  4. Set auto-scaling properties.
  5. Click Create.
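
Because the S3 URI cannot be edited after the Virtual Warehouse is created, it can be worth checking the value before you paste it into the Spill to S3 field. The following is a minimal Python sketch of such a check. The function name `check_spill_uri` and the rules it applies (an `s3` scheme and a non-empty bucket name) are assumptions made for illustration; they are not CDW's own validation logic, and the sketch does not verify that the bucket exists or that its permissions are correct.

```python
from typing import Tuple
from urllib.parse import urlparse


def check_spill_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, prefix).

    Hypothetical helper: raises ValueError when the URI does not look like
    s3://<bucket>/<optional/prefix>. It only inspects the string and makes
    no calls to AWS or CDW.
    """
    parsed = urlparse(uri.strip())

    # urlparse lowercases the scheme, so "S3://" and "s3://" both pass.
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got: {uri!r}")

    # The bucket name is the network-location part of the URI.
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in: {uri!r}")

    # The prefix is everything after the bucket, without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    bucket, prefix = check_spill_uri("S3://mybucket/scratch/path")
    print(f"bucket={bucket} prefix={prefix}")
```

A value that passes this check is only well formed; you still need to confirm separately that the bucket has been added to CDW with read/write permissions.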