Syntax to create an external JDBC table
To connect to a remote database, you create an external JDBC table with the appropriate table properties, such as the database type, JDBC URL, driver class, driver file location, JDBC username and password, and name of the remote table to be mapped to the Impala external JDBC table.
Prerequisites
The Impala package does not include JDBC drivers of the remote databases. Ensure that the MySQL, Postgres, and Impala JDBC drivers are uploaded to a location in HDFS, Ozone, or Amazon S3 that can be read by Impala.
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type,
....)
STORED BY JDBC
TBLPROPERTIES (
"database.type"="[***VALUE***]",
"jdbc.url"="[***VALUE***]",
"jdbc.driver"="[***VALUE***]",
"driver.url"="[***VALUE***]",
"dbcp.username"="[***VALUE***]",
"dbcp.password"="[***VALUE***]",
"table"="[***VALUE***]");
Examples
- Creating an external JDBC table to map a table in a remote PostgreSQL database
-
CREATE EXTERNAL TABLE student_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="POSTGRES", "jdbc.url"="jdbc:postgresql://[***IP ADDRESS***]:5432/[***DATABASE NAME***]", "jdbc.driver"="org.postgresql.Driver", "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar", "dbcp.username"="[***USERNAME***]", "dbcp.password"="[***PASSWORD***]", "table"="student"); - Creating an external JDBC table to map a table in another Impala cluster
-
CREATE EXTERNAL TABLE student_jdbc ( id INT, bool_col BOOLEAN, tinyint_col TINYINT, smallint_col SMALLINT, int_col INT, bigint_col BIGINT, float_col FLOAT, double_col DOUBLE, date_col DATE, string_col STRING, timestamp_col TIMESTAMP) STORED BY JDBC TBLPROPERTIES ( "database.type"="IMPALA", "jdbc.url"="jdbc:impala://[***IP ADDRESS***]:21050/[***DATABASE NAME***]", "jdbc.auth"="AuthMech=3", "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000", "jdbc.driver"="com.cloudera.impala.jdbc.Driver", "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar", "dbcp.username"="[***USERNAME***]", "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks", "dbcp.password.key"="[***KEY***]", "table"="student");
Table properties
While creating an external JDBC table, you are required to specify the following table properties:
database.type: POSTGRES, MYSQL, or IMPALAjdbc.url: JDBC connection string with the required parameters — database type, hostname/IP address, port number, and database nameExample:
"jdbc:impala://10.96.132.138:21050/sample_db"jdbc.driver: Class name of the JDBC driverdriver.url: URL to download the JAR file package that is used to access the external databasetable: Name of the table in the remote database that you want to map in Impala
Besides the above required properties, you can also specify optional parameters that allow you to use different authentication methods, allow case sensitive column names in remote tables, or to specify additional database properties:
jdbc.auth: Authentication mechanism of the JDBC driverdbcp.username: JDBC usernamedbcp.password: JDBC password in clear textdbcp.password.key: Key of the Java keystoredbcp.password.keystore: URI of the keystore filejdbc.properties: Additional properties applied to database engines, like Impala Query options. The properties are specified as comma-separated "key-value" pairsjdbc.fetch.size: Number of rows to fetch in a batchcolumn.mapping: Mapping of column names between external table and Impala JDBC table
Supported data types
The following column data types are supported for an Impala external JDBC table:
- Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
- Decimal with scale and precision
- String type: string
- Date
- Timestamp
Limitations
You must be aware of the following limitations while using Impala external JDBC tables:
- Following column data types are not supported: char, varchar, binary, and complex data types, such as struct, map, array, and nested type
- JDBC tables have to be defined one table at a time
- Writing to a JDBC table is not supported
- Only supported binary predicates with operators =, !=, <=, >=, <, > to be pushed to RDBMS
Support for case-sensitive table and column names
The column names of tables in the remote database can be different from the external JDBC
table schema. For example, Postgres allows case-sensitive column names, however, Impala saves
column names in lowercase. In such situations, you can set the column.mapping
table property to map column names between Impala external JDBC tables and the remote
tables.
"column.mapping"="id=id, bool_col=Bool_col, tinyint_col=Tinyint_col, smallint_col=Smallint_col, int_col=Int_col, bigint_col=Bigint_col, float_col=Float_col, double_col=Double_col, date_col=date_col, string_col=String_col, timestamp=Timestamp");clean_dbcp_ds_cache query option
A new query option, CLEAN_DBCP_DS_CACHE is added to save the DBCP SQL
DataSource objects in the cache for a longer period of time. This allows the DBCP connection
pools to be reused across multiple queries. When the value is set to false, the DBCP SQL
DataSource object is not closed when its reference count is 0. The SQL DataSource object is kept
in cache until the object is idle for more than 5 minutes.
Type: BOOLEAN
Default: True (1)
