Using JdbcStorageHandler to query RDBMS

Using the JdbcStorageHandler, you can connect Apache Hive to a MySQL, PostgreSQL, Oracle, DB2, or Derby data source. You can then create an external table to represent the data, and query the table.

This task assumes you are a Cloudera Private Cloud Base user. You create an external table that uses the JdbcStorageHandler to connect to and read a local JDBC data source.

  1. Load data into a supported SQL database, such as MySQL, on a node in your cluster, or familiarize yourself with existing data in the your database.
  2. Create an external table using the JdbcStorageHandler and table properties that specify the minimum information: database type, driver, database connection string, user name and password for querying hive, table name, and number of active connections to Hive.
    CREATE EXTERNAL TABLE mytable_jdbc(
      col1 string,
      col2 int,
      col3 double
    )
    STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
    TBLPROPERTIES (
      "hive.sql.database.type" = "MYSQL",
      "hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
      "hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
      "hive.sql.dbcp.username" = "hive",
      "hive.sql.dbcp.password" = "hive",
      "hive.sql.table" = "MYTABLE",
      "hive.sql.dbcp.maxActive" = "1"
    );
  3. Query the external table.
    SELECT * FROM mytable_jdbc WHERE col2 = 19;