Enabling Impala to spill to HDFS
When you create a new Impala Virtual Warehouse in Cloudera Data Warehouse Private Cloud, you can configure heavy Impala queries to write intermediate files during large sorts, joins, aggregations, or analytic function operations to a remote scratch space on HDFS.
Configure the Impala daemon to use the specified locations for writing the intermediate files as described in Configuring Impala daemon to spill to HDFS.
- Log in to the Data Warehouse service as a DWAdmin.
- On the Overview page, click the icon under Virtual Warehouses that creates a new Virtual Warehouse.
- Specify a name for the Virtual Warehouse, select IMPALA as the type, select a Database Catalog, and select a size from the drop-down menu.
- Specify the HDFS URI in the Spill to HDFS field in the following format:
  hdfs://[***HOSTNAME***]:[***PORT***]/[***PATH***]:[***LIMIT***]
  The hostname and port are mandatory in the HDFS URI.
- Select the scaling and resource allocation options, and click Create.
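
The URI format used in the Spill to HDFS field can be sketched as a small helper that assembles the string and rejects a missing hostname or port before you paste the value into the field. This is only an illustration: the function name build_spill_uri, the example host, port, and path, and the "300G" limit value are hypothetical, and treating the path and limit as optional is an assumption drawn from the statement that only the hostname and port are mandatory.

```python
def build_spill_uri(hostname, port, path="", limit=None):
    """Assemble hdfs://HOSTNAME:PORT/PATH:LIMIT for the Spill to HDFS field.

    Hypothetical helper. Hostname and port are mandatory per the
    documentation; path and limit are treated as optional here, which
    is an assumption of this sketch.
    """
    if not isinstance(hostname, str) or not hostname.strip():
        raise ValueError("hostname is mandatory")
    # Reject booleans explicitly because bool is a subclass of int.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("port is mandatory and must be an integer")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")

    # Strip leading slashes so the result has exactly one after the port.
    uri = f"hdfs://{hostname.strip()}:{port}/{path.lstrip('/')}"
    if limit is not None:
        # The limit is passed through unchanged; its unit syntax is not
        # validated because the source does not specify it.
        uri += f":{limit}"
    return uri


# Illustrative values only; substitute your own NameNode host, port, and path.
example_uri = build_spill_uri("namenode.example.com", 8020, "/tmp/impala-scratch", "300G")
```

Building the value programmatically is optional. The point of the sketch is that a missing hostname or port is caught before the Virtual Warehouse is created.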