Enabling Impala to spill to HDFS

When you create a new Impala Virtual Warehouse in Cloudera Data Warehouse Private Cloud, you can configure memory-intensive Impala queries to write the intermediate files they generate during large sorts, joins, aggregations, or analytic function operations to a remote scratch space on HDFS.

Configure the Impala daemon to use the specified locations for writing intermediate files, as described in Configuring Impala daemon to spill to HDFS.

  1. Log in to the Data Warehouse service as a DWAdmin.
  2. Click under Virtual Warehouses on the Overview page to create a new Virtual Warehouse.
  3. Specify a name for the Virtual Warehouse, select IMPALA as the type, and select a Database Catalog and a size from the drop-down menus.
  4. Specify the HDFS URI in the Spill to HDFS field in the following format:
    hdfs://[***HOSTNAME***]:[***PORT***]/[***PATH***]:[***LIMIT***]
    The hostname and port are mandatory in the HDFS URI.
  5. Select the scaling and resource allocation settings, and then click Create.
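The Spill to HDFS value follows the hdfs://[***HOSTNAME***]:[***PORT***]/[***PATH***]:[***LIMIT***] format described in step 4. The following minimal sketch shows one way to pre-check a value before entering it, under stated assumptions: it is not how Cloudera Data Warehouse validates the field, it treats PATH and the trailing :LIMIT as optional (only the hostname and port are stated to be mandatory), and it treats LIMIT as an opaque string without checking its units. The function name parse_spill_uri is hypothetical.

```python
import re
from typing import Optional, TypedDict


class SpillTarget(TypedDict):
    host: str
    port: int
    path: Optional[str]
    limit: Optional[str]


# Pattern for hdfs://HOSTNAME:PORT/PATH:LIMIT. Hostname and port are required.
# PATH and the trailing :LIMIT are treated as optional here; this is an
# assumption for illustration, not the product's own validation rule.
_SPILL_URI = re.compile(
    r"hdfs://(?P<host>[^:/\s]+):(?P<port>\d+)"
    r"(?:(?P<path>/[^:\s]*)(?::(?P<limit>[^:/\s]+))?)?"
)


def parse_spill_uri(uri: str) -> SpillTarget:
    """Split a hypothetical Spill to HDFS value into parts; reject a missing host or port."""
    m = _SPILL_URI.fullmatch(uri.strip())
    if m is None:
        raise ValueError(f"expected hdfs://HOSTNAME:PORT/PATH:LIMIT, got {uri!r}")
    port = int(m["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # LIMIT is returned as-is; its accepted units are not checked here.
    return SpillTarget(host=m["host"], port=port, path=m["path"], limit=m["limit"])
```

For example, with a hypothetical NameNode host and limit, parse_spill_uri("hdfs://namenode.example.com:8020/tmp/impala-scratch:100G") returns {'host': 'namenode.example.com', 'port': 8020, 'path': '/tmp/impala-scratch', 'limit': '100G'}. A value without a port, such as hdfs://namenode.example.com/tmp, raises ValueError.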