Configuring Client Access to Impala
Application developers have a number of options to interface with Impala.
The core development language with Impala is SQL. You can also use Java or other languages to interact with Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For specialized kinds of analysis, you can supplement the Impala built-in functions by writing user-defined functions in C++ or Java.
You can connect and submit requests to the Impala through:
- The impala-shell interactive command interpreter
- The Hue web-based user interface
- JDBC
- ODBC
- Impyla
Impala clients can connect to any Coordinator Impala Daemon
(impalad
) via HiveServer2 over HTTP or over the TCP
binary or via Beeswax. All interfaces support Kerberos and LDAP for
authentication to Impala. See below for the default ports and the Impala
configuration field names to change the ports in Cloudera Manager.
Protocol | Default Port | Cloudera Manager Field to Specify an Alternate Port |
---|---|---|
HiveServer2 HTTP | 28000 | Impala Daemon HiveServer2 HTTP Port |
HiveServer2 binary TCP | 21050 | Impala Daemon HiveServer2 Port |
Beeswax | 21000 | Impala Daemon Beeswax Port |
Impala Startup Options for Client Connections
Use the following flags to control client connections to Impala when starting Impala Daemon coordinator. If a configuration field exists in Cloudera Manager, the field name is shown in parenthesis next to the flag name.
If the Cloudera Manager interface does not yet have a form field for an option, the Advanced category page for each daemon includes one or more Safety Valve fields where you can enter option names directly.
- --accepted_client_cnxn_timeout
- Controls how Impala treats new connection requests if it has run
out of the number of threads configured by
--fe_service_threads
.If
--accepted_client_cnxn_timeout > 0
, new connection requests are rejected if Impala can't get a server thread within the specified (in seconds) timeout.If
--accepted_client_cnxn_timeout=0
, i.e. no timeout, clients wait indefinitely to open the new session until more threads are available.The default timeout is 5 minutes.
The timeout applies only to client facing thrift servers, i.e., HS2 and Beeswax servers.
- --disconnected_session_timeout
- When a HiveServer2 session has had no open connections for longer
than this value, the session will be closed, and any associated
queries will be unregistered.
Specify the value in hours.
The default value is 1 hour.
This flag does not apply to Beeswax clients. When a Beeswax client connection is closed, Impala closes the session associated with that connection.
- --fe_service_threads (Impala Daemon Max Client)
- Specifies the maximum number of concurrent client connections
allowed. The default value is 64 with which 64 queries can run
simultaneously.
If you have more clients trying to connect to Impala than the value of this setting, the later arriving clients have to wait for the duration specified by
--accepted_client_cnxn_timeout
. You can increase this value to allow more client connections. However, a large value means more threads to be maintained even if most of the connections are idle, and it could negatively impact query latency. Client applications should use the connection pool to avoid need for large number of sessions. - --idle_client_poll_time_s
- The value of this setting specifies how frequently Impala polls
to check if a client connection is idle and closes it if the
connection is idle. A client connection is idle if all sessions
associated with the client connection are idle.
By default,
--idle_client_poll_time_s
is set to 30 seconds.If
--idle_client_poll_time_s
is set to 0, idle client connections stay open until explicitly closed by the clients.The connection will only be closed if all the associated sessions are idle or closed. Sessions cannot be idle unless either the flag
--idle_session_timeout
or theIDLE_SESSION_TIMEOUT
query option is set to greater than 0. If idle session timeout is not configured, a session cannot become idle by definition, and therefore its connection stays open until the client explicitly closes it. - --max_cookie_lifetime_s
- Impala uses cookies for authentication
when clients connect via HiveServer2 over HTTP. Use the
--max_cookie_lifetime_s
startup flag to control how long generated cookies are valid for.Specify the value in seconds.
The default value is 1 day.
Setting the flag to
0
disables cookie support.When an unexpired cookie is successfully verified, the user name contained in the cookie is set on the connection.
Each
impalad
uses its own key to generate the signature, so clients that reconnect to a differentimpalad
have to re-authenticate.On a single
impalad
, cookies are valid across sessions and connections. - --beeswax_port (Impala Daemon Beeswax Port)
- Specifies the port for clients to connect to Impala daemon via the
Beeswax protocol.
You can disable the Beeswax end point for clients by setting the flag to 0.
- --hs2_http_port (Impala Daemon HiveServer2 HTTP Port)
- Specifies the port for clients to connect to Impala daemon over
HTTP.
You can disable the HTTP end point for clients by setting the flag to
0
.To enable TLS/SSL for HiveServer2 HTTP endpoint, use
--ssl_server_certificate
and--ssl_private_key
. - --hs2_port (Impala Daemon HiveServer2 Port)
- Specifies the port for clients to connect to Impala daemon via the
HiveServer2 protocol.
You can disable the binary HiveServer2 end point for clients by setting the flag to 0.
- --ping_expose_webserver_url
- Controls whether PingImpalaService, PingImpalaHS2Service RPC calls should expose the
debug web url to the client or not.
By default this flag is set to
true
so that the users will not see an empty string when the debug web UI is not available. If the flag is set to false, the RPC calls will return an empty string instead of the real url signalling that the debug web UI is not available.