impala-shell Configuration Options
You can specify the following options when starting the impala-shell command to change how shell commands are executed. The table shows the format to use when specifying each option on the command line, or through the $HOME/.impalarc configuration file.
Summary of impala-shell Configuration Options
The following table shows the names and allowed arguments for the impala-shell configuration options. You can specify options on the command line, or in a configuration file as described in impala-shell Configuration File.
Command-Line Option | Configuration File Setting | Explanation |
---|---|---|
-B or --delimited | write_delimited=true | Causes all query results to be printed in plain format as delimited text. Useful for producing data files to be used with other Hadoop components. Also useful for avoiding the performance overhead of pretty-printing all output, especially when running benchmark tests using queries that return large result sets. Specify the delimiter character with the --output_delimiter option. To store all query results in a file rather than printing them to the screen, use the -o option. Added in Impala 1.0.1. |
--print_header | print_header=true | |
-o filename or --output_file filename | output_file=filename | Stores all query results in the specified file. Typically used to store the results of a single query issued from the command line with the -q option. Also works for interactive sessions; you see messages such as the number of rows fetched, but not the actual result set. To suppress these incidental messages when combining the -q and -o options, redirect stderr to /dev/null. Added in Impala 1.0.1. |
--output_delimiter=character | output_delimiter=character | Specifies the character to use as a delimiter between fields when query results are printed in plain format by the -B option. Defaults to tab ('\t'). If an output value contains the delimiter character, that field is quoted, escaped by doubling quotation marks, or both. Added in Impala 1.0.1. |
-p or --show_profiles | show_profiles=true | Displays the query execution plan (same output as the EXPLAIN statement) and a more detailed low-level breakdown of execution steps, for every query executed by the shell. |
-h or --help | N/A | Displays help information. |
-i hostname or --impalad=hostname[:portnum] | impalad=hostname[:portnum] | Connects to the impalad daemon on the specified host. The default port of 21000 is assumed unless you provide another value. You can connect to any host in your cluster that is running impalad. If you connect to an instance of impalad that was started with an alternate port specified by the --fe_port flag, provide that alternative port. |
-q query or --query=query | query=query | Passes a query or other impala-shell command from the command line. The impala-shell interpreter exits immediately after processing the statement. It is limited to a single statement, which could be a SELECT, CREATE TABLE, SHOW TABLES, or any other statement recognized in impala-shell. Because you cannot pass a USE statement and another query, fully qualify the names of any tables outside the default database. (Or use the -f option to pass a file with a USE statement followed by other queries.) |
-f query_file or --query_file=query_file | query_file=path_to_query_file | Passes SQL statements from a file. Multiple statements must be semicolon (;) delimited. In CDH 5.5 / Impala 2.3 and higher, you can specify a filename of - to represent standard input. This feature makes it convenient to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools. |
-k or --kerberos | use_kerberos=true | Uses Kerberos authentication when the shell connects to impalad. If Kerberos is not enabled on the instance of impalad to which you are connecting, errors are displayed. See the Impala security documentation for the steps to set up and use Kerberos authentication in Impala. |
-s kerberos_service_name or --kerberos_service_name=name | kerberos_service_name=name | Instructs impala-shell to authenticate to a particular impalad service principal. If a kerberos_service_name is not specified, impala is used by default. If this option is used with a connection on which Kerberos is not supported, errors are returned. |
-V or --verbose | verbose=true | Enables verbose output. |
--quiet | verbose=false | Disables verbose output. |
-v or --version | version=true | Displays version information. |
-c | ignore_query_failure=true | Continues on query failure. |
-r or --refresh_after_connect | refresh_after_connect=true | Updates Impala metadata upon connection. Same as running the INVALIDATE METADATA statement after connecting. (The option name dates from when the REFRESH statement performed the extensive metadata updates now performed by INVALIDATE METADATA.) |
-d default_db or --database=default_db | default_db=default_db | Specifies the database to be used on startup. Same as running the USE statement after connecting. If not specified, a database named DEFAULT is used. |
--ssl | ssl=true | Enables TLS/SSL for impala-shell. |
--ca_cert=path_to_certificate | ca_cert=path_to_certificate | The local pathname pointing to the third-party CA certificate, or to a copy of the server certificate for self-signed server certificates. If --ca_cert is not set, impala-shell enables TLS/SSL, but does not validate the server certificate. This is useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the certificate is not available (such as when debugging customer installations). |
-l | use_ldap=true | Enables LDAP authentication. |
-u | user=user_name | Supplies the user name, when LDAP authentication is enabled by the -l option. (Specify the short user name, not the full LDAP distinguished name.) The shell then prompts interactively for the password. |
--ldap_password_cmd=command | N/A | Specifies a command to run to retrieve the LDAP password, when LDAP authentication is enabled by the -l option. If the command includes space-separated arguments, enclose the command and its arguments in quotation marks. |
--config_file=path_to_config_file | N/A | Specifies the path of the file containing impala-shell configuration settings. The default is $HOME/.impalarc. This setting can only be specified on the command line. |
--live_progress | N/A | Prints a progress bar showing roughly the percentage complete for each query. The information is updated interactively as the query progresses. See LIVE_PROGRESS Query Option (CDH 5.5 or higher only). |
--live_summary | N/A | Prints a detailed report, similar to the SUMMARY command, showing progress details for each phase of query execution. The information is updated interactively as the query progresses. See LIVE_SUMMARY Query Option (CDH 5.5 or higher only). |
--var=variable_name=value | N/A | Defines a substitution variable that can be used within the impala-shell session. The variable can be substituted into statements processed by the -q or -f options, or in an interactive shell session. Within a SQL statement, you substitute the value by using the notation ${var:variable_name}. This feature is available in CDH 5.7 / Impala 2.5 and higher. |
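
As a rough illustration of how several of these options combine, the following Python sketch assembles an impala-shell command line for a non-interactive export to a delimited file. The helper name build_export_command, the host name, the table name, the variable name, and the output path are all hypothetical; only the option names come from the table above. The sketch prints the command rather than running it.

```python
import shlex


def build_export_command(query, output_file, host=None, delimiter=",", variables=None):
    """Assemble an impala-shell argument list for a delimited, non-interactive export.

    Hypothetical helper for illustration; only the option names come from the table.
    """
    argv = ["impala-shell"]
    if host:
        # -i accepts hostname or hostname:port; port 21000 is assumed if omitted.
        argv += ["-i", host]
    # -B switches to plain delimited output; -o sends the result set to a file.
    argv += ["-B", f"--output_delimiter={delimiter}", "-o", output_file]
    for name, value in (variables or {}).items():
        # Each --var defines a substitution variable, referenced as ${var:name} in SQL.
        argv.append(f"--var={name}={value}")
    # -q passes a single statement; the shell exits after processing it.
    argv += ["-q", query]
    return argv


argv = build_export_command(
    query="SELECT COUNT(*) FROM ${var:tbl}",
    output_file="/tmp/row_count.csv",
    host="impala-host.example.com:21000",
    variables={"tbl": "site_stats.web_traffic"},
)
# Print a shell-quoted version of the command instead of executing it.
print(shlex.join(argv))
```

Because -q is limited to a single statement, the sketch passes a fully qualified table name through the substitution variable instead of relying on a USE statement.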
impala-shell Configuration File
You can define a set of default options for your impala-shell environment, stored in the file $HOME/.impalarc. This file consists of key-value pairs, one option per line. Everything after a # character on a line is treated as a comment and ignored.
The configuration file must contain a header label [impala], followed by the options specific to impala-shell. (This standard convention for configuration files lets you use a single file to hold configuration options for multiple applications.)
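
To make the file layout concrete, here is a minimal Python sketch that reads text in this format with the standard configparser module. It only illustrates the [impala] section header, the key-value pairs, and the # comments; it is not a statement about how impala-shell itself parses the file, and the sample settings are illustrative.

```python
import configparser

# Sample text in the layout described above: an [impala] header,
# one key=value pair per line, and # comments.
SAMPLE = """\
[impala]
# Connect to a specific node and start in a given database.
impalad=impala-test-node1.example.com
default_db=site_stats
verbose=true  # trailing comment
"""

parser = configparser.ConfigParser(
    interpolation=None,              # keep characters such as % literal
    inline_comment_prefixes=("#",),  # also drop text after a trailing " #"
)
parser.read_string(SAMPLE)

# Collect the options from the [impala] section into a plain dict.
settings = dict(parser["impala"])
for key, value in settings.items():
    print(f"{key} = {value}")
```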
To specify a different filename or path for the configuration file, specify the argument --config_file=path_to_config_file on the impala-shell command line.
The names of the options in the configuration file are similar (although not necessarily identical) to the long-form command-line arguments to the impala-shell command. For the names to use, see Summary of impala-shell Configuration Options.
Any options you specify on the impala-shell command line override any corresponding options within the configuration file.
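
The precedence rule can be modeled in a few lines of Python. This is only a sketch of the rule as stated, using configuration-file option names as dictionary keys; the function name effective_options and the sample values are hypothetical.

```python
def effective_options(file_settings, cli_settings):
    """Return merged options in which command-line values win over file values."""
    merged = dict(file_settings)   # start from the configuration file
    merged.update(cli_settings)    # command-line values replace matching keys
    return merged


file_settings = {"default_db": "tpc_benchmarking", "verbose": "true"}
cli_settings = {"default_db": "site_stats"}  # for example, from -d site_stats

options = effective_options(file_settings, cli_settings)
print(options)
```

Options that appear only in the configuration file, such as verbose in this sample, remain in effect.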
The following example shows a configuration file that you might use during benchmarking tests. It sets verbose mode, so that the output from each SQL query is followed by timing information. impala-shell starts inside the database containing the tables with the benchmark data, avoiding the need to issue a USE statement or use fully qualified table names.
In this example, the query output is formatted as delimited text rather than enclosed in ASCII art boxes, and is stored in a file rather than printed to the screen. Those options are appropriate for benchmark situations, so that the overhead of impala-shell formatting and printing the result set does not factor into the timing measurements. It also enables the show_profiles option. That option prints detailed performance information after each query, which might be valuable in understanding the performance of benchmark queries.
[impala]
verbose=true
default_db=tpc_benchmarking
write_delimited=true
output_delimiter=,
output_file=/home/tester1/benchmark_results.csv
show_profiles=true
The following example shows a configuration file that connects to a specific remote Impala node, runs a single query within a particular database, then exits. You would typically use this kind of single-purpose configuration file with the impala-shell command-line option --config_file=path_to_config_file, to easily select between many predefined queries that could be run against different databases, hosts, or even different clusters. To run a sequence of statements instead of a single query, specify the configuration option query_file=path_to_query_file instead.
[impala]
impalad=impala-test-node1.example.com
default_db=site_stats
# Issue a predefined query and immediately exit.
query=select count(*) from web_traffic where event_date = trunc(now(),'dd')
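
One way to select between several single-purpose configuration files is a small wrapper that maps a report name to a file and builds the matching command line. The Python sketch below assumes a hypothetical directory of configuration files and hypothetical report names; only the --config_file option comes from this page. It prints the command rather than running it.

```python
import shlex

# Hypothetical mapping of report names to single-purpose configuration files.
REPORT_CONFIGS = {
    "daily_traffic": "/etc/impala-reports/daily_traffic.impalarc",
    "benchmark": "/etc/impala-reports/benchmark.impalarc",
}


def command_for_report(name):
    """Build the impala-shell argument list that runs one predefined report."""
    try:
        config_path = REPORT_CONFIGS[name]
    except KeyError:
        raise ValueError(f"unknown report: {name!r}") from None
    # --config_file can be given only on the command line, not inside the file.
    return ["impala-shell", f"--config_file={config_path}"]


print(shlex.join(command_for_report("daily_traffic")))
```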