Accessing Spark SQL through JDBC or ODBC
Using the Spark Thrift server, you can remotely access Spark SQL over JDBC (using the Beeline JDBC client) or ODBC (using the Simba driver).
The following prerequisites must be met before accessing Spark SQL through JDBC or ODBC:
The Spark Thrift server must be deployed on the cluster.
For an Ambari-managed cluster, deploy and launch the Spark Thrift server using the Ambari web UI (see Installing and Configuring Spark Over Ambari).
For a cluster that is not managed by Ambari, see Starting the Spark Thrift Server in the Non-Ambari Cluster Installation Guide.
Ensure that SPARK_HOME is defined as your Spark directory:
export SPARK_HOME=/usr/hdp/current/spark-client
Before accessing Spark SQL through JDBC or ODBC, note the following caveats:
The Spark Thrift server works in YARN client mode only.
ODBC and JDBC client configurations must match Spark Thrift server configuration parameters.
For example, if the Thrift server is configured to listen in binary mode, the client must send requests in binary mode; if the Thrift server is configured for HTTP mode, the client must use HTTP.
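In practice, the transport mode appears in the Beeline connection string. The following is a minimal sketch of an HTTP-mode connection, assuming the Thrift server listens on port 10016 with an httpPath of cliservice (both values are examples; substitute the values from your Thrift server configuration):
beeline> !connect jdbc:hive2://localhost:10016/default;transportMode=http;httpPath=cliservice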
When using JDBC or ODBC to access Spark SQL in a production environment, note that the Spark Thrift server does not currently support the doAs authorization property, which propagates user identity. Workaround: use programmatic APIs or spark-shell, submitting the job under your identity. All client requests coming to the Spark Thrift server share a SparkContext.
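As a sketch of the workaround, the same query can be run under your own identity from spark-shell instead of through the Thrift server (the user name alice is hypothetical; in Spark 1.6, spark-shell provides a ready-made sqlContext):
su - alice                         # hypothetical user; run the job as yourself
cd /usr/hdp/current/spark-client
./bin/spark-shell
scala> sqlContext.sql("SHOW TABLES").show()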
To list available Thrift server options, run ./sbin/start-thriftserver.sh --help.
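For example, to start the Thrift server on a specific port (10015 here is only an example value; match it to your cluster's configuration):
su spark
./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10015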
To manually stop the Spark Thrift server, run the following commands:
su spark
./sbin/stop-thriftserver.sh
Accessing Spark SQL through JDBC
Connect to the Thrift server using the Beeline JDBC client.
From the SPARK_HOME directory, launch Beeline:
su spark
./bin/beeline
At the Beeline prompt, connect to the Spark SQL Thrift Server:
beeline> !connect jdbc:hive2://localhost:10015
The host and port must match the host and port on which the Spark Thrift server is running.
You should see output similar to the following:
beeline> !connect jdbc:hive2://localhost:10015
Connecting to jdbc:hive2://localhost:10015
Enter username for jdbc:hive2://localhost:10015:
Enter password for jdbc:hive2://localhost:10015:
...
Connected to: Spark SQL (version 1.6.2)
Driver: Spark Project Core (version 1.6.2.2.4.0.0-169)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10015>
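To connect to a Thrift server running on another node, replace localhost with that node's host name (the host name below is hypothetical):
beeline> !connect jdbc:hive2://thrift-host.example.com:10015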
When connected, issue a Spark SQL statement.
The following example executes a SHOW TABLES query:
0: jdbc:hive2://localhost:10015> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| sample_07  | false        |
| sample_08  | false        |
| testtable  | false        |
+------------+--------------+--+
3 rows selected (2.399 seconds)
0: jdbc:hive2://localhost:10015>
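You can also run a statement non-interactively by passing the connection URL and the query on the Beeline command line. A minimal sketch, assuming the Thrift server is listening on localhost:10015:
./bin/beeline -u jdbc:hive2://localhost:10015 -e "SHOW TABLES;"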
Accessing Spark SQL through ODBC
If you want to access Spark SQL through ODBC, first download the Spark ODBC driver for the operating system of your ODBC client. After downloading the driver, refer to the Hortonworks ODBC Driver with SQL Connector for Apache Spark User Guide for installation and configuration instructions.
Drivers and associated documentation are available in the "Hortonworks Data Platform Add-Ons" section of the Hortonworks downloads page (http://hortonworks.com/downloads/) under "Hortonworks ODBC Driver for SparkSQL." If you are running a version of HDP older than the latest release, check the Hortonworks Data Platform Archive area of the add-ons section for the version of the driver that corresponds to your version of HDP.
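After installing the driver, the ODBC client is typically configured through a DSN entry. The following odbc.ini fragment is only a sketch; the DSN name, driver path, and key values are assumptions, so confirm them against the user guide for your driver version:
[SparkSQL]
# Hypothetical install path; use the location where the driver was actually installed
Driver=/usr/lib/spark-odbc/lib/libsparkodbc64.so
HOST=localhost
PORT=10015
# Selects the Spark Thrift server in Simba-based drivers (an assumption; see the user guide)
SparkServerType=3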