Configure Impala Daemon to spill to Ozone
You can use Ozone as a scratch space for writing intermediate files during large sorts, joins, aggregations, or analytic function operations.
Before you begin
- Identify the Ozone scratch directory where you want your new Impala to write the temporary data.
- Identify the IP address, host name, or service identifier of Ozone.
- Identify the port number of the Ozone Manager (if not-default).
Configuring the Start-up Option in Impala daemon
You can use the Impalad start option scratch_dirs
to specify the locations of the
intermediate files.
--scratch_dirs="ofs://authority/path(:max_bytes), local_buffer_dir (,local_dir…)"
- Where
ofs://authority/path
is the remote directory. authority
may includeip_address
orhostname
andport
, orservice_id
.max_bytes
is optional.
Using the above format:
- You can specify only one remote directory. When you configure a remote directory, you must specify a local buffer directory as the buffer. However you can use multiple local directories with the remote directory. If you specify multiple local directories, the first local directory would be used as the local buffer directory.
- If you configure both remote and local directories, the remote directory is only used when the local directories are fully utilized.
- The size of a remote intermediate file could affect the query performance, and the
value can be set by
--remote_tmp_file_size=size
in the start-up option. The default size of a remote intermediate file is 16MB while the maximum is 512MB.
Examples
- An Ozone scratch dir with one local buffer dir, file size 64MB. The space of Ozone
scratch dir is limited to 300G.
--scratch_dirs=ofs://10.0.0.49:29000/tmp:300G,/local_buffer_dir --remote_tmp_file_size=64M
- An Ozone scratch dir with one local buffer dir limited to 512MB, and one local dir
limited to 10GB. The space of Ozone scratch dir is limited to 300G. The Ozone Manager
uses its default port (9862).
--scratch_dirs=ofs://ozonemgr/tmp:300G,/local_buffer_dir:512M,/local_dir:10G
- An Ozone scratch dir with one local buffer dir, and multiple prioritized local dirs. The
space of Ozone scratch dir is unlimited. The Ozone service identifier is
ozone1
.--scratch_dirs=ofs://ozone1/tmp,/local_buffer_dir,/local_dir_1:5G:1,/local_dir_2:5G:2
Even though max_bytes is optional, it is highly recommended to configure for spilling to Ozone because the Ozone cluster space is limited.