When you create a new Impala Virtual Warehouse in Cloudera Data Warehouse Private
Cloud, you can configure heavy Impala queries to write the intermediate files generated
during large sorts, joins, aggregations, or analytic function operations to a remote
scratch space on HDFS.
Configure the Impala daemon to write the intermediate files to the specified locations
as described in Configuring Impala daemon to spill to HDFS.
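As an illustration only, daemon-level spill locations are typically expressed through the impalad scratch_dirs startup option, with a local buffer directory listed alongside the remote HDFS directory. The directory paths, hostname, port, and size limits shown here are placeholder values, not a prescribed configuration:
--scratch_dirs=/local/impala-scratch:300GB,hdfs://namenode.example.com:8020/tmp/impala-remote-scratch:1TB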
Note: You must create a new Impala Virtual Warehouse to enable the
option to spill intermediate Impala query execution data to HDFS.
Log in to the Data Warehouse service as a DWAdmin.
Under Virtual Warehouses on the Overview page, click the option to
create a new Virtual Warehouse.
Specify a name for the Virtual Warehouse, select IMPALA
as the type, select a Database Catalog, and choose a size from the drop-down menu.
Specify the HDFS URI in the Spill to HDFS field in the
following format:
hdfs://[***HOSTNAME***]:[***PORT***]/[***PATH***]:[***LIMIT***]
The hostname and port are mandatory arguments in the HDFS URI.
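For example, assuming a NameNode at namenode.example.com listening on port 8020, a scratch directory of /tmp/impala-remote-scratch, and a 1 TB limit (all placeholder values to replace with ones from your environment), the field value would look like this:
hdfs://namenode.example.com:8020/tmp/impala-remote-scratch:1TB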
Note: When the client passes a valid HDFS URI, 300 GB of local storage
is used as a local disk buffer for spilling to HDFS.
Select the scaling and resource allocation options, and then click
Create.