Modifying Impala Startup Options
The configuration options for the Impala-related daemons let you choose which hosts and ports to use for the services that run on a single host, specify directories for logging, control resource usage and security, and specify other aspects of the Impala software.
Configuring Impala Startup Options through Cloudera Manager
If you manage your cluster through Cloudera Manager, configure the settings for all the Impala-related daemons through the Cloudera Manager interface. See the Cloudera Manager documentation for instructions about how to configure Impala through Cloudera Manager.
If the Cloudera Manager interface does not yet have a form field for a newly added option, or if you need to use special options for debugging and troubleshooting, the Advanced option page for each daemon includes one or more fields where you can enter option names directly. In Cloudera Manager 4, these fields are labelled Safety Valve; in Cloudera Manager 5, they are called Advanced Configuration Snippet. There is also a free-form field for query options, on the top-level Impala Daemon options page.
Configuring Impala Startup Options through the Command Line
When you run Impala in a non-Cloudera Manager environment, the Impala server, statestore, and catalog services start up using values provided in a defaults file, /etc/default/impala.
This file includes information about many resources used by Impala. The defaults in this file are suitable for most deployments. For example, typically you would not change the definition of the CLASSPATH variable, but you would always set the address used by the statestore server. Some of the content you might modify includes:
IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=...
IMPALA_STATE_STORE_HOST=...

export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT}"
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
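The export lines in the defaults file use the shell's ${VAR:-default} expansion, which applies the text after ":-" only when the variable is unset or empty. The following is a minimal sketch of that behavior, not part of the Impala scripts: DEMO_ARGS and effective_args are hypothetical names, and the option list is shortened for illustration.

```shell
#!/bin/sh
# Values copied from the defaults file excerpt.
IMPALA_LOG_DIR=/var/log/impala
IMPALA_BACKEND_PORT=22000

# Hypothetical helper: print DEMO_ARGS if it is set and non-empty,
# otherwise print the default that follows ":-".
effective_args() {
  printf '%s\n' "${DEMO_ARGS:--log_dir=${IMPALA_LOG_DIR} -be_port=${IMPALA_BACKEND_PORT}}"
}

# Case 1: DEMO_ARGS is unset, so the default applies.
unset DEMO_ARGS
effective_args

# Case 2: DEMO_ARGS is already set, so the default is ignored.
DEMO_ARGS="-log_dir=/tmp/impala-logs"
effective_args
```

One consequence of this pattern: if a variable such as IMPALA_SERVER_ARGS is already set in the environment that reads the defaults file, edits to the default text inside the braces have no effect.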
To use alternate values, edit the defaults file, then restart all the Impala-related services so that the changes take effect. Restart the Impala server using the following command:
$ sudo service impala-server restart
Stopping Impala Server:     [ OK ]
Starting Impala Server:     [ OK ]
Restart the Impala statestore using the following command:
$ sudo service impala-state-store restart
Stopping Impala State Store Server:     [ OK ]
Starting Impala State Store Server:     [ OK ]
Restart the Impala catalog service using the following command:
$ sudo service impala-catalog restart
Stopping Impala Catalog Server:     [ OK ]
Starting Impala Catalog Server:     [ OK ]
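The three restarts above can be wrapped in a small loop. This is a sketch only: restart_impala_services and the DRY_RUN variable are hypothetical, and the ordering (statestore, catalog, then server) is an assumption rather than something this section specifies. With DRY_RUN left at its default of 1, the function only prints the commands it would run.

```shell
#!/bin/sh
# Service names taken from the restart commands above.
# The order is an assumption, not a documented requirement.
IMPALA_SERVICES="impala-state-store impala-catalog impala-server"

# Print each restart command (DRY_RUN=1, the default) or run it (DRY_RUN=0).
restart_impala_services() {
  for svc in $IMPALA_SERVICES; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sudo service $svc restart"
    else
      # Stop at the first service that fails to restart.
      sudo service "$svc" restart || return 1
    fi
  done
}

restart_impala_services
```

To restart the services for real, call the function with DRY_RUN=0 on a host where the Impala service scripts are installed.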
Some common settings to change include:
-
Statestore address. Cloudera recommends the statestore be on a separate host not running the impalad daemon. In that recommended configuration, the impalad daemon cannot refer to the statestore server using the loopback address. If the statestore is hosted on a machine with an IP address of 192.168.0.27, change:
IMPALA_STATE_STORE_HOST=127.0.0.1
to:
IMPALA_STATE_STORE_HOST=192.168.0.27
-
Catalog server address (including both the hostname and the port number). Update the value of the IMPALA_CATALOG_SERVICE_HOST variable. Cloudera recommends the catalog server be on the same host as the statestore. In that recommended configuration, the impalad daemon cannot refer to the catalog server using the loopback address. If the catalog service is hosted on a machine with an IP address of 192.168.0.27, add the following line:
IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000
The /etc/default/impala defaults file currently does not define an IMPALA_CATALOG_ARGS environment variable, but if you add one, the service startup/shutdown script recognizes it. Add a definition for this variable to /etc/default/impala and add the option -catalog_service_host=hostname. If the port is different from the default 26000, also add the option -catalog_service_port=port.
-
Memory limits. You can limit the amount of memory available to Impala. For example, to allow Impala to use no more than 70% of system memory, change:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT}}
to:
export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}
You can specify the memory limit using absolute notation such as 500m or 2G, or as a percentage of physical memory such as 60%.
-
Core dump enablement. To enable core dumps, change:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}
to:
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}
-
Authorization using the open source Sentry plugin. Specify the -server_name and -authorization_policy_file options as part of the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings to enable the core Impala support for authorization. See Starting the impalad Daemon with Sentry Authorization Enabled for details.
-
Auditing for successful or blocked Impala queries, another aspect of security. Specify the -audit_event_log_dir=directory_path option and optionally the -max_audit_event_log_file_size=number_of_queries and -abort_on_failed_audit_event options as part of the IMPALA_SERVER_ARGS settings, for each Impala node, to enable and customize auditing. See Auditing Impala Operations for details.
-
Password protection for the Impala web UI, which listens on port 25000 by default. This feature involves adding some or all of the --webserver_password_file, --webserver_authentication_domain, and --webserver_certificate_file options to the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS settings. See Security Guidelines for Impala for details.
-
Another setting you might add to IMPALA_SERVER_ARGS is a comma-separated list of query options and values:
-default_query_options='option=value,option=value,...'
These options control the behavior of queries performed by this impalad instance. The option values you specify here override the default values for Impala query options, as shown by the SET statement in impala-shell.
-
Options for resource management, in conjunction with the YARN and Llama components. These options include -enable_rm, -llama_host, -llama_port, -llama_callback_port, and -cgroup_hierarchy_path. Additional options to help fine-tune the resource estimates are -rm_always_use_defaults, -rm_default_memory=size, and -rm_default_cpu_cores. For details about these options, see impalad Startup Options for Resource Management. See Integrated Resource Management with YARN for information about resource management in general, and The Llama Daemon for information about the Llama daemon.
-
During troubleshooting, Cloudera Support might direct you to change other values, particularly for IMPALA_SERVER_ARGS, to work around issues or gather debugging information.
- -enable_rm: Whether to enable resource management or not, either true or false. The default is false. None of the other resource management options have any effect unless -enable_rm is turned on.
- -llama_host: Hostname or IP address of the Llama service that Impala should connect to. The default is 127.0.0.1.
- -llama_port: Port of the Llama service that Impala should connect to. The default is 15000.
- -llama_callback_port: Port that Impala should start its Llama callback service on. Llama reports when resources are granted or preempted through that service.
- -cgroup_hierarchy_path: Path where YARN and Llama will create cgroups for granted resources. Impala assumes that the cgroup for an allocated container is created in the path 'cgroup_hierarchy_path + container_id'.
- -rm_always_use_defaults: If this Boolean option is enabled, Impala ignores computed estimates and always obtains the default memory and CPU allocation from Llama at the start of the query. These default estimates are approximately 2 CPUs and 4 GB of memory, possibly varying slightly depending on cluster size, workload, and so on. Cloudera recommends enabling -rm_always_use_defaults whenever resource management is used, and relying on these default values (that is, leaving out the two following options).
- -rm_default_memory=size: Optionally sets the default estimate for memory usage for each query. You can use suffixes such as M and G for megabytes and gigabytes, the same as with the MEM_LIMIT query option. Only has an effect when -rm_always_use_defaults is also enabled.
- -rm_default_cpu_cores: Optionally sets the default estimate for number of virtual CPU cores for each query. Only has an effect when -rm_always_use_defaults is also enabled.
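Simple KEY=value changes from the list of common settings, such as the statestore address, can be scripted rather than edited by hand. The following sketch assumes a hypothetical helper named set_default and works on a scratch file instead of /etc/default/impala. It does not handle multi-line values such as IMPALA_SERVER_ARGS, or values containing the "|" or "&" characters.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of a single-line KEY=value entry.
# Usage: set_default KEY VALUE FILE
set_default() {
  key=$1 value=$2 file=$3
  # Write to a temporary file first, then move it into place.
  sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.new" && mv "${file}.new" "$file"
}

# Build a scratch file that mimics two lines of the defaults file.
scratch=$(mktemp)
printf '%s\n' 'IMPALA_STATE_STORE_HOST=127.0.0.1' 'IMPALA_STATE_STORE_PORT=24000' > "$scratch"

# Point impalad at a statestore on another host, as in the example address above.
set_default IMPALA_STATE_STORE_HOST 192.168.0.27 "$scratch"
cat "$scratch"
```

After applying a change like this to the real defaults file, the Impala-related services still need a restart for the new value to take effect.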
Checking the Values of Impala Configuration Options
You can check the current runtime value of all these settings through the Impala web interface, available by default at http://impala_hostname:25000/varz for the impalad daemon, http://impala_hostname:25010/varz for the statestored daemon, or http://impala_hostname:25020/varz for the catalogd daemon. In the Cloudera Manager interface, you can see the link to the appropriate service_name Web UI page when you look at the status page for a specific daemon on a specific host.
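The URLs above differ only in the port, so a small helper can build the right one for each daemon. This is an illustrative sketch: varz_url is a hypothetical function, and it assumes the default web UI ports listed above have not been changed.

```shell
#!/bin/sh
# Hypothetical helper: print the /varz URL for a daemon on a given host.
# Usage: varz_url HOST DAEMON   (DAEMON is impalad, statestored, or catalogd)
varz_url() {
  host=$1 daemon=$2
  case "$daemon" in
    impalad)     port=25000 ;;
    statestored) port=25010 ;;
    catalogd)    port=25020 ;;
    *) echo "unknown daemon: $daemon" >&2; return 1 ;;
  esac
  echo "http://${host}:${port}/varz"
}

varz_url impala_hostname catalogd

# On a live cluster, the page can be fetched and filtered for one option,
# assuming curl is installed and the host is reachable:
#   curl -s "$(varz_url impala_hostname impalad)" | grep mem_limit
```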
Startup Options for impalad Daemon
The impalad daemon implements the main Impala service, which performs query processing and reads and writes the data files.
Startup Options for statestored Daemon
The statestored daemon implements the Impala statestore service, which monitors the availability of Impala services across the cluster, and handles situations such as nodes becoming unavailable or becoming available again.
Startup Options for catalogd Daemon
The catalogd daemon implements the Impala catalog service, which broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts data, or performs other kinds of DDL and DML operations.
By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was loaded before accepting any requests, set the catalogd configuration option --load_catalog_in_background=false.
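In a non-Cloudera Manager environment, one way to pass this option is through an IMPALA_CATALOG_ARGS definition in /etc/default/impala, which the service script recognizes if you add it. The fragment below is a sketch under that assumption. It contains only this one option, so any other catalogd options you use would need to be added to the same definition.

```shell
#!/bin/sh
# Assumed addition to /etc/default/impala; only the option from the text is shown.
# The default after ":-" applies only if IMPALA_CATALOG_ARGS is not already set.
IMPALA_CATALOG_ARGS="${IMPALA_CATALOG_ARGS:- \
    --load_catalog_in_background=false}"
export IMPALA_CATALOG_ARGS

# Show the resulting value.
echo "$IMPALA_CATALOG_ARGS"
```

Restart the catalog service after the change so that catalogd picks up the new option.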