Spill Impala queries to external storage

Cloudera Data Warehouse (CDW) on Private Cloud enables you to write intermediate files during large sorts, joins, aggregations, or analytic function operations to a remote scratch space on HDFS or Ozone.

What is spill to external storage

Spill to external storage is an option in CDW that lets you specify an HDFS or Ozone URI when you create an Impala Virtual Warehouse. Impala writes intermediate files to this location during large sorts, joins, aggregations, or analytic function operations.

How spilling to external storage works

To use this feature, first configure the Impala daemon to write intermediate files to the specified locations, as described in Configuring Impala daemon to spill to HDFS. Then specify the HDFS or Ozone URI in the following format:
  • HDFS:
    hdfs://[***HOSTNAME***]:[***PORT***]/[***PATH***]:[***LIMIT***]
  • Ozone:
    ofs://[***SERVICE-NAME***]/[***PATH***]:[***LIMIT***]
The hostname and port are mandatory in the HDFS URI.
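
The URI formats above can be illustrated with a small helper. The following is a minimal sketch, not part of CDW or Impala: the function names, the host `namenode.example.com`, the port `8020`, the Ozone service name `ozone1`, the paths, and the limit value `300G` are all hypothetical placeholders chosen for illustration. The source does not state the syntax of the limit value, so check the expected form for your release before using a value like the one shown.

```python
def build_hdfs_scratch_uri(hostname: str, port: int, path: str, limit: str) -> str:
    """Return hdfs://HOSTNAME:PORT/PATH:LIMIT (hypothetical helper).

    Hostname and port are mandatory in the HDFS URI, so both are validated.
    """
    if not hostname:
        raise ValueError("hostname is mandatory in the HDFS URI")
    if not 0 < port < 65536:
        raise ValueError("port is mandatory and must be a valid TCP port")
    # Strip slashes so the path is joined with exactly one separator.
    return f"hdfs://{hostname}:{port}/{path.strip('/')}:{limit}"


def build_ozone_scratch_uri(service_name: str, path: str, limit: str) -> str:
    """Return ofs://SERVICE-NAME/PATH:LIMIT (hypothetical helper)."""
    if not service_name:
        raise ValueError("service name is required in the Ozone URI")
    return f"ofs://{service_name}/{path.strip('/')}:{limit}"


if __name__ == "__main__":
    # All values below are illustrative assumptions, not defaults.
    print(build_hdfs_scratch_uri("namenode.example.com", 8020, "/tmp/impala-scratch", "300G"))
    print(build_ozone_scratch_uri("ozone1", "/vol1/bucket1/impala-scratch", "300G"))
```

With these placeholder values, the helpers produce strings of the same shape as the formats listed above, which you would then supply as the spill location when creating the Impala Virtual Warehouse.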