Apache Spark Component Guide

Accessing Spark SQL through JDBC

To access Spark SQL through JDBC, you need a JDBC URL connection string to supply connection information to the JDBC data source. Connection strings for the Spark SQL JDBC driver have the following format:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

JDBC Parameter    Description

host              The node hosting the Thrift server.

port              The port number on which the Thrift server listens.

dbName            The name of the Hive database to run the query against.

sessionConfs      Optional configuration parameters for the JDBC or ODBC driver, in the following format: <key1>=<value1>;<key2>=<value2>;...

hiveConfs         Optional configuration parameters for Hive on the server, in the following format: <key1>=<value1>;<key2>=<value2>;... These settings last for the duration of the user session.

hiveVars          Optional configuration parameters for Hive variables, in the following format: <key1>=<value1>;<key2>=<value2>;... These settings persist for the duration of the user session.
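To make the URL format concrete, the sketch below assembles a connection string from the parts in the table above. The helper function `build_jdbc_url` is illustrative, not part of any shipped tool; it follows the documented separators (`;` before sessionConfs, `?` before hiveConfs, `#` before hiveVars, with key=value pairs joined by `;`).

```python
# Illustrative helper: assemble a Spark SQL JDBC URL from its documented parts.
def build_jdbc_url(host, port, db_name,
                   session_confs=None, hive_confs=None, hive_vars=None):
    """Return a jdbc:hive2:// URL; the three conf arguments are dicts."""
    def pairs(confs):
        # Join key=value pairs with ";" as the documented format requires.
        return ";".join(f"{k}={v}" for k, v in confs.items())

    url = f"jdbc:hive2://{host}:{port}/{db_name}"
    if session_confs:
        url += ";" + pairs(session_confs)   # driver settings
    if hive_confs:
        url += "?" + pairs(hive_confs)      # server-side Hive settings
    if hive_vars:
        url += "#" + pairs(hive_vars)       # Hive variables
    return url

print(build_jdbc_url("localhost", 10015, "default",
                     session_confs={"transportMode": "http"}))
# jdbc:hive2://localhost:10015/default;transportMode=http
```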

Note

The Spark Thrift server is a variant of HiveServer2, so you can use many of the same settings. For more information about JDBC connection strings, including transport and security settings, see Hive JDBC and ODBC Drivers in the HDP Data Access Guide.

The following connection string accesses Spark SQL through JDBC on a Kerberos-enabled cluster:

beeline> !connect jdbc:hive2://localhost:10002/default;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM

The following connection string accesses Spark SQL through JDBC over HTTP transport on a Kerberos-enabled cluster:

beeline> !connect jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM
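To show how the second connection string maps onto the URL format, the sketch below rebuilds it from its components; the host, port, database, and Kerberos principal are the example values from the string above, not values you should reuse.

```python
# Rebuild the documented Kerberos-over-HTTP connection string from its parts.
base = "jdbc:hive2://localhost:10002/default"
session_confs = {
    "transportMode": "http",   # use HTTP transport
    "httpPath": "/",           # HTTP endpoint path on the Thrift server
    "principal": "hive/hdp-team.example.com@EXAMPLE.COM",  # Kerberos principal
}
# sessionConfs are appended after the database name, each prefixed with ";".
url = base + "".join(f";{k}={v}" for k, v in session_confs.items())
print(url)
# jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM
```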

To access Spark SQL, complete the following steps:

  1. Connect to the Thrift server using the Beeline JDBC client.

    1. From the SPARK_HOME directory, launch Beeline:

      su spark
      ./bin/beeline
    2. At the Beeline prompt, connect to the Spark SQL Thrift server with the JDBC connection string:

      beeline> !connect jdbc:hive2://localhost:10015

      The host and port must match the host and port on which the Spark Thrift server is running.

      You should see output similar to the following:

      beeline> !connect jdbc:hive2://localhost:10015
      Connecting to jdbc:hive2://localhost:10015
      Enter username for jdbc:hive2://localhost:10015:
      Enter password for jdbc:hive2://localhost:10015:
      ...
      Connected to: Spark SQL (version 1.6.3)
      Driver: Spark Project Core (version 1.6.3.2.4.0.0-169)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      0: jdbc:hive2://localhost:10015>
  2. When connected, issue a Spark SQL statement.

    The following example executes a SHOW TABLES query:

    0: jdbc:hive2://localhost:10015> show tables;
    +------------+--------------+--+
    | tableName  | isTemporary  |
    +------------+--------------+--+
    | sample_07  | false        |
    | sample_08  | false        |
    | testtable  | false        |
    +------------+--------------+--+
    3 rows selected (2.399 seconds)
    0: jdbc:hive2://localhost:10015>
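The interactive steps above can also be scripted: Beeline accepts the JDBC URL with its `-u` option and a statement to execute with `-e`. As a sketch assuming those standard Beeline options, a wrapper could build the argument list and hand it to `subprocess.run()`; the command is not invoked here because it requires a running Spark Thrift server.

```python
# Build a non-interactive Beeline invocation (run from SPARK_HOME in practice).
# In a real script, pass `cmd` to subprocess.run(cmd, check=True).
url = "jdbc:hive2://localhost:10015"                     # Thrift server URL
cmd = ["./bin/beeline", "-u", url, "-e", "SHOW TABLES;"]  # URL + statement
print(cmd)
```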