Configuring Impala to Work with JDBC

Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and custom software written in Java or other programming languages. The JDBC driver allows you to access Impala from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate with various database products.

Setting up a JDBC connection to Impala involves the following steps:

Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC requests.
Installing the JDBC driver on every system that runs the JDBC-enabled application.
Specifying a connection string for the JDBC application to access one of the servers running the impalad daemon, with the appropriate security settings.

Continue reading:

Configuring the JDBC Port
Choosing the JDBC Driver
Enabling Impala JDBC Support on Client Systems
Establishing JDBC Connections

Configuring the JDBC Port

The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. Impala server accepts JDBC connections through this same port 21050 by default. Make sure this port is available for communication with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC client software connects to a different port, specify that alternative port number with the --hs2_port option when starting impalad. See Starting Impala for details about Impala startup options. See Ports Used by Impala for information about all ports used for communication between Impala and clients or between Impala components.

Choosing the JDBC Driver

In Impala 2.0 and later, you have the choice between the Cloudera JDBC Connector and the Hive 0.13 JDBC driver. Cloudera recommends using the Cloudera JDBC Connector where practical.

If you are already using JDBC applications with an earlier Impala release, you must update your JDBC driver to one of these choices, because the Hive 0.12 driver that was formerly the only choice is not compatible with Impala 2.0 and later.

Both the Cloudera JDBC 2.5 Connector and the Hive JDBC driver provide a substantial speed increase for JDBC applications with Impala 2.0 and higher, for queries that return large result sets.

Enabling Impala JDBC Support on Client Systems

Using the Cloudera JDBC Connector (recommended)

You download and install the Cloudera JDBC 2.5 connector on any Linux, Windows, or Mac system where you intend to run JDBC-enabled applications. From the Cloudera Connectors download page, you choose the appropriate protocol (JDBC or ODBC) and target product (Impala or Hive). The ease of downloading and installing on non-CDH systems makes this connector a convenient choice for organizations with heterogeneous environments.

Using the Hive JDBC Driver

You install the Hive JDBC driver (hive-jdbc package) through the Linux package manager, on hosts within the CDH cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.

To get the JAR files, install the Hive JDBC driver on each CDH-enabled host in the cluster that will run JDBC applications. Follow the instructions for CDH 5.

If you are using JDBC-enabled applications on hosts outside the CDH cluster, you cannot use the CDH install procedure on the non-CDH hosts. Install the JDBC driver on at least one CDH host using the preceding procedure. Then download the JAR files to each client machine that will use JDBC with Impala:

commons-logging-X.X.X.jar
  hadoop-common.jar
  hive-common-X.XX.X-cdhX.X.X.jar
  hive-jdbc-X.XX.X-cdhX.X.X.jar
  hive-metastore-X.XX.X-cdhX.X.X.jar
  hive-service-X.XX.X-cdhX.X.X.jar
  httpclient-X.X.X.jar
  httpcore-X.X.X.jar
  libfb303-X.X.X.jar
  libthrift-X.X.X.jar
  log4j-X.X.XX.jar
  slf4j-api-X.X.X.jar
  slf4j-logXjXX-X.X.X.jar

To enable JDBC support for Impala on the system where you run the JDBC application:

Download the JAR files listed above to each client machine.
Note: For Maven users, see this sample github page for an example of the dependencies you could add to a pom file instead of downloading the individual JARs.
Store the JAR files in a location of your choosing, ideally a directory already referenced in your CLASSPATH setting. For example:
- On Linux, you might use a location such as /opt/jars/.
- On Windows, you might use a subdirectory underneath C:\Program Files.
To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR files. This often means setting the CLASSPATH for the client process to include the JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers, but some examples of how to set CLASSPATH variables include:
- On Linux, if you extracted the JARs to /opt/jars/, you might issue the following command to prepend the JAR files path to an existing classpath:
```
export CLASSPATH=/opt/jars/*.jar:$CLASSPATH
```
- On Windows, use the System Properties control panel item to modify the Environment Variables for your system. Modify the environment variables to include the path to which you extracted the files.
  Note: If the existing CLASSPATH on your client machine refers to some older version of the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files earlier in the listings, or delete the other references to Hive JAR files.

Establishing JDBC Connections

The JDBC driver class depends on which driver you select.

Using the Cloudera JDBC Connector (recommended)

Depending on the level of the JDBC API your application is targeting, you can use the following fully-qualified class names (FQCNs):

com.cloudera.impala.jdbc41.Driver
com.cloudera.impala.jdbc41.DataSource

com.cloudera.impala.jdbc4.Driver
com.cloudera.impala.jdbc4.DataSource

com.cloudera.impala.jdbc3.Driver
com.cloudera.impala.jdbc3.DataSource

The connection string has the following format:

jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...

The port value is typically 21050 for Impala.

For full details about the classes and the connection string (especially the property values available for the connection string), download the appropriate driver documentation for your platform from the Impala JDBC Connector download page.

Using the Hive JDBC Driver

For example, with the Hive JDBC driver, the class name is org.apache.hive.jdbc.HiveDriver. Once you have configured Impala to work with JDBC, you can establish connections between the two. To do so for a cluster that does not use Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;auth=noSasl. For example, you might use:

jdbc:hive2://myhost.example.com:21050/;auth=noSasl

To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the form jdbc:hive2://host:port/;principal=principal_name. The principal must be the same user principal you used when starting Impala. For example, you might use:

jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM

To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form jdbc:hive2://host:port/db_name;user=ldap_userid;password=ldap_password. For example, you might use:

jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123

Configuring Impala to Work with ODBC

Managing Isilon