Accessing Spark SQL through JDBC
To access Spark SQL through JDBC, you need a JDBC URL connection string to supply connection information to the JDBC data source. Connection strings for the Spark SQL JDBC driver have the following format:
jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>
JDBC Parameter | Description |
---|---|
host | The node hosting the Thrift server |
port | The port number on which the Thrift server listens |
dbName | The name of the Hive database to run the query against |
sessionConfs | Optional configuration parameters for the JDBC or ODBC driver, in the format <key1>=<value1>;<key2>=<value2>...; |
hiveConfs | Optional configuration parameters for Hive on the server, in the format <key1>=<value1>;<key2>=<value2>...; These settings last for the duration of the user session. |
hiveVars | Optional configuration parameters for Hive variables, in the format <key1>=<value1>;<key2>=<value2>...; These settings persist for the duration of the user session. |
Note: The Spark Thrift server is a variant of HiveServer2, so you can use many of the same settings. For more information about JDBC connection strings, including transport and security settings, see Hive JDBC and ODBC Drivers in the HDP Data Access Guide.
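For applications that connect to the Spark Thrift server directly rather than through Beeline, the same URL format applies via the standard Java JDBC API. The following is a minimal sketch, not a complete program for any particular cluster: the host, port, database, and credentials are placeholder assumptions, and it presumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;

public class SparkSqlJdbcConnect {
    public static void main(String[] args) throws Exception {
        // The Spark Thrift server is reached through the Hive JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>
        // Placeholder values: adjust host, port, and database for your cluster.
        String url = "jdbc:hive2://localhost:10015/default";

        try (Connection conn = DriverManager.getConnection(url, "spark", "")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}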
The following connection string accesses Spark SQL through JDBC on a Kerberos-enabled cluster:
beeline> !connect jdbc:hive2://localhost:10002/default;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM
The following connection string accesses Spark SQL through JDBC over HTTP transport on a Kerberos-enabled cluster:
beeline> !connect jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM
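The same principal-bearing connection string can also be used from application code. The following is a hedged sketch only: it reuses the example principal and HTTP transport settings shown above, and it assumes the client JVM already holds a valid Kerberos ticket (for example, obtained with kinit before the program runs); it is not a complete Kerberos client configuration.

import java.sql.Connection;
import java.sql.DriverManager;

public class SparkSqlJdbcKerberos {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HTTP transport plus the Kerberos principal, as in the connection
        // string above. Assumes a valid ticket is already in the ticket cache.
        String url = "jdbc:hive2://localhost:10002/default;"
                   + "transportMode=http;httpPath=/;"
                   + "principal=hive/hdp-team.example.com@EXAMPLE.COM";

        // With Kerberos, identity comes from the ticket, not a user/password.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected over HTTP with Kerberos: " + !conn.isClosed());
        }
    }
}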
To access Spark SQL, complete the following steps:
Connect to the Thrift server over the Beeline JDBC client.
From the SPARK_HOME directory, launch Beeline:
su spark
./bin/beeline
At the Beeline prompt, connect to the Spark SQL Thrift server with the JDBC connection string:
beeline> !connect jdbc:hive2://localhost:10015
The host and port must match the host and port on which the Spark Thrift server is running.
You should see output similar to the following:
beeline> !connect jdbc:hive2://localhost:10015
Connecting to jdbc:hive2://localhost:10015
Enter username for jdbc:hive2://localhost:10015:
Enter password for jdbc:hive2://localhost:10015:
...
Connected to: Spark SQL (version 1.6.3)
Driver: Spark Project Core (version 1.6.3.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10015>
When connected, issue a Spark SQL statement.
The following example executes a SHOW TABLES query:
0: jdbc:hive2://localhost:10015> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| sample_07  | false        |
| sample_08  | false        |
| testtable  | false        |
+------------+--------------+--+
3 rows selected (2.399 seconds)
0: jdbc:hive2://localhost:10015>
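The same statement can be issued programmatically over JDBC. The sketch below reuses the placeholder localhost:10015 endpoint from the earlier example and reads the two columns shown in the Beeline output (tableName, isTemporary).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlShowTables {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10015/default";

        try (Connection conn = DriverManager.getConnection(url, "spark", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            // Columns match the Beeline output above: tableName, isTemporary
            while (rs.next()) {
                System.out.println(rs.getString("tableName") + "\t"
                        + rs.getString("isTemporary"));
            }
        }
    }
}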