Using the JdbcStorageHandler, you can connect Hive to a MySQL, PostgreSQL, Oracle, or
Derby data source, create an external table to represent the data, and then query the table.
In this task you create an external table that uses the JdbcStorageHandler to
connect to and read a local JDBC data source.
-
Load data into a supported SQL database, such as MySQL, on a node in your
cluster or familiarize yourself with existing data in the your database.
-
Obtain credentials for accessing the database.
- If you are accessing data on your network and have no security worries, you can use the user name and password that authorizes your to access Hive.
- If you are accessing data on a remote network, create a JCEKS credential
keystore on HDFS, and use credentials you specify in the process.
-
Create an external table using the JdbcStorageHandler and table properties that
specify the minimum information: database type, driver, database connection
string, user name and password for querying hive, table name, and number of
active connections to Hive.
CREATE EXTERNAL TABLE mytable_jdbc(
col1 string,
col2 int,
col3 double
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
"hive.sql.database.type" = "MYSQL",
"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",
"hive.sql.jdbc.url" = "jdbc:mysql://localhost/sample",
"hive.sql.dbcp.username" = "hive",
"hive.sql.dbcp.password" = "hive",
"hive.sql.table" = "MYTABLE",
"hive.sql.dbcp.maxActive" = "1"
);
-
Query the external table.
SELECT * FROM mytable_jdbc WHERE col2 = 19;