Impala support for external JDBC tables (Preview)
Impala supports external JDBC tables to read data from remote databases like MySQL, PostgreSQL, or other Impala clusters.
Apache Impala now supports reading from external JDBC data sources. An external JDBC table
represents a table or a view in a remote RDBMS database or another Impala cluster. Using external
JDBC tables, you can connect Impala to a database, such as MySQL, PostgreSQL, or another Impala
cluster and read the data in the remote tables.
Prerequisites
The Impala package does not include JDBC drivers of the remote databases. Ensure that the MySQL, Postgres, and Impala JDBC drivers are uploaded to a location in HDFS, Ozone, or Amazon S3 that can be read by Impala.
Syntax
To connect to a remote database, you create an external JDBC table with the appropriate table
properties, such as the database type, JDBC URL, driver class, driver file location, JDBC
username and password, and name of the remote table to be mapped to the Impala external table.
Syntax:
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type,
....)
STORED BY JDBC
TBLPROPERTIES (
"database.type"="[***VALUE***]",
"jdbc.url"="[***VALUE***]",
"jdbc.driver"="[***VALUE***]",
"driver.url"="[***VALUE***]",
"dbcp.username"="[***VALUE***]",
"dbcp.password"="[***VALUE***]",
"table"="[***VALUE***]");Examples:
- Creating an external JDBC table to map a table in a remote PostgreSQL
database:
CREATE EXTERNAL TABLE student_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://[***IP_ADDRESS***]:[***PORT***] /[***DATABASE NAME***]", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postg resql-jdbc.jar", "dbcp.username"="[***USERNAME***]", "dbcp.password"="[***PASSWORD***]", "table"="student"); - Creating an external JDBC table to map a table in another Impala
cluster:
CREATE EXTERNAL TABLE student_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://[***IP ADDRESS***]:[***PORT***]/[***DATABASE NAME***]", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/I mpalaJDBC42.jar", "dbcp.username"="[***USERNAME***]", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sour ces/test.jceks", "dbcp.password.key"="[***KEY***]", "table"="student");
Table Properties
While creating an external JDBC table, you are required to specify the following table
properties:
- database.type: POSTGRES, MYSQL, IMPALA.
- jdbc.url: JDBC connection string with the required parameters — database type, hostname, port number, and database name. Example: "jdbc:impala://10.96.132.138:21050/sample_db".
- jdbc.driver: Class name of the JDBC driver.
- driver.url: URL to download the JAR file package that is used to access the external database.
- table: Name of the table in the remote database that you want to map in Impala.
Besides the above required properties, you can also specify optional parameters that allow you
to use different authentication methods, allow case sensitive column names in remote tables, or
to specify additional database properties:
- jdbc.auth: Authentication mechanism of the JDBC driver
- dbcp.username: JDBC username
- dbcp.password: JDBC password in clear text.
- dbcp.password.key: Key of the Java keystore.
- dbcp.password.keystore: Location of the keystore file
- jdbc.properties: Additional properties applied to database engines, like Impala Query options. The properties are specified as comma-separated "key-value" pairs
- jdbc.fetch.size: Number of rows to fetch in a batch
- column.mapping: Mapping of column names between external table and Impala JDBC table.
