Syntax to create an external JDBC table

To connect to a remote database, you create an external JDBC table with the appropriate table properties, such as the database type, JDBC URL, driver class, driver file location, JDBC username and password, and name of the remote table to be mapped to the Impala external JDBC table.

Prerequisites

The Impala package does not include JDBC drivers of the remote databases. Ensure that the MySQL, Postgres, and Impala JDBC drivers are uploaded to a location in HDFS, Ozone, or Amazon S3 that can be read by Impala.

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
  (col_name data_type,
   ....)
  STORED BY JDBC
  TBLPROPERTIES (
    "database.type"="[***VALUE***]",
    "jdbc.url"="[***VALUE***]",
    "jdbc.driver"="[***VALUE***]",
    "driver.url"="[***VALUE***]",
    "dbcp.username"="[***VALUE***]",
    "dbcp.password"="[***VALUE***]",
    "table"="[***VALUE***]");

Examples

Creating an external JDBC table to map a table in a remote PostgreSQL database
CREATE EXTERNAL TABLE student_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="POSTGRES",
  "jdbc.url"="jdbc:postgresql://[***IP ADDRESS***]:5432/[***DATABASE NAME***]",
  "jdbc.driver"="org.postgresql.Driver",
  "driver.url"="/test-warehouse/data-sources/jdbc-drivers/postgresql-jdbc.jar",
  "dbcp.username"="[***USERNAME***]",
  "dbcp.password"="[***PASSWORD***]",
  "table"="student");
Creating an external JDBC table to map a table in another Impala cluster
CREATE EXTERNAL TABLE student_jdbc (
  id INT,
  bool_col BOOLEAN,
  tinyint_col TINYINT,
  smallint_col SMALLINT,
  int_col INT,
  bigint_col BIGINT,
  float_col FLOAT,
  double_col DOUBLE,
  date_col DATE,
  string_col STRING,
  timestamp_col TIMESTAMP)
STORED BY JDBC
TBLPROPERTIES (
  "database.type"="IMPALA",
  "jdbc.url"="jdbc:impala://[***IP ADDRESS***]:21050/[***DATABASE NAME***]",
  "jdbc.auth"="AuthMech=3",
  "jdbc.properties"="MEM_LIMIT=1000000000, MAX_ERRORS = 10000",
  "jdbc.driver"="com.cloudera.impala.jdbc.Driver",
  "driver.url"="hdfs://test-warehouse/data-sources/jdbc-drivers/ImpalaJDBC42.jar",
  "dbcp.username"="[***USERNAME***]",
  "dbcp.password.keystore"="jceks://hdfs/test-warehouse/data-sources/test.jceks",
  "dbcp.password.key"="[***KEY***]",
  "table"="student");

Table properties

While creating an external JDBC table, you are required to specify the following table properties:

  • database.type: POSTGRES, MYSQL, or IMPALA
  • jdbc.url: JDBC connection string with the required parameters — database type, hostname/IP address, port number, and database name

    Example: "jdbc:impala://10.96.132.138:21050/sample_db"

  • jdbc.driver: Class name of the JDBC driver
  • driver.url: URL to download the JAR file package that is used to access the external database
  • table: Name of the table in the remote database that you want to map in Impala

Besides the above required properties, you can also specify optional parameters that allow you to use different authentication methods, allow case sensitive column names in remote tables, or to specify additional database properties:

  • jdbc.auth: Authentication mechanism of the JDBC driver
  • dbcp.username: JDBC username
  • dbcp.password: JDBC password in clear text
  • dbcp.password.key: Key of the Java keystore
  • dbcp.password.keystore: URI of the keystore file
  • jdbc.properties: Additional properties applied to database engines, like Impala Query options. The properties are specified as comma-separated "key-value" pairs
  • jdbc.fetch.size: Number of rows to fetch in a batch
  • column.mapping: Mapping of column names between external table and Impala JDBC table

Supported data types

The following column data types are supported for an Impala external JDBC table:

  • Numeric data type: boolean, tinyint, smallint, int, bigint, float, double
  • Decimal with scale and precision
  • String type: string
  • Date
  • Timestamp

Limitations

You must be aware of the following limitations while using Impala external JDBC tables:

  • Following column data types are not supported: char, varchar, binary, and complex data types, such as struct, map, array, and nested type
  • JDBC tables have to be defined one table at a time
  • Writing to a JDBC table is not supported
  • Only supported binary predicates with operators =, !=, <=, >=, <, > to be pushed to RDBMS

Support for case-sensitive table and column names

The column names of tables in the remote database can be different from the external JDBC table schema. For example, Postgres allows case-sensitive column names, however, Impala saves column names in lowercase. In such situations, you can set the column.mapping table property to map column names between Impala external JDBC tables and the remote tables.

Example:
"column.mapping"="id=id, bool_col=Bool_col, tinyint_col=Tinyint_col, smallint_col=Smallint_col, int_col=Int_col, bigint_col=Bigint_col, float_col=Float_col, double_col=Double_col, date_col=date_col, string_col=String_col, timestamp=Timestamp");

clean_dbcp_ds_cache query option

A new query option, CLEAN_DBCP_DS_CACHE is added to save the DBCP SQL DataSource objects in the cache for a longer period of time. This allows the DBCP connection pools to be reused across multiple queries. When the value is set to false, the DBCP SQL DataSource object is not closed when its reference count is 0. The SQL DataSource object is kept in cache until the object is idle for more than 5 minutes.

Type: BOOLEAN

Default: True (1)