Spark Guide

Accessing Spark SQL through JDBC

The Spark Thrift Server provides remote access to Spark SQL over JDBC. The following considerations and prerequisites apply to JDBC access.

Considerations:

  • The Spark Thrift Server works in YARN client mode only.

  • JDBC client configurations must match the Spark Thrift Server configuration. For example, if the Thrift Server listens in binary mode, the client must send binary requests; if the Thrift Server is configured for HTTP, the client must connect in HTTP mode (see the connection string example after this list).

  • When using JDBC to access Spark SQL in a production environment, note that the Spark Thrift Server does not currently support the doAs authorization property, which propagates user identity. Workaround: use programmatic APIs or spark-shell, submitting jobs under your own identity.

  • All client requests sent to the Spark Thrift Server share a single SparkContext.
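
For example, when the Thrift Server runs in HTTP mode, the Beeline connection string must carry the transport mode and HTTP path as session variables. The following is a sketch that assumes port 10015 and the default HTTP path cliservice; adjust both to match your Thrift Server configuration:

beeline> !connect jdbc:hive2://localhost:10015/default;transportMode=http;httpPath=cliservice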

Prerequisites:

  • The Spark Thrift Server must be deployed on the cluster.

  • If SPARK_HOME is not already defined, set it to your Spark directory. For example:

    export SPARK_HOME=/usr/hdp/current/spark-client

To list available Thrift Server options, run ./sbin/start-thriftserver.sh --help.
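
For example, the following commands start the Thrift Server as the spark user, in YARN client mode, listening on port 10015. This is a sketch; the port set with hive.server2.thrift.port must match the port your JDBC clients connect to:

su spark

./sbin/start-thriftserver.sh --master yarn-client --hiveconf hive.server2.thrift.port=10015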

To manually stop the Spark Thrift Server:

su spark

./sbin/stop-thriftserver.sh

To access Spark SQL through JDBC:

  1. Connect to the Thrift Server over the Beeline JDBC client.

    1. Launch Beeline from SPARK_HOME:

      su spark

      ./bin/beeline

    2. On the Beeline prompt, connect to the Spark SQL Thrift Server:

      beeline> !connect jdbc:hive2://localhost:10015

      The host and port in the connection string must match the host and port on which the Spark Thrift Server is running.

      You should see output similar to the following:

      beeline> !connect jdbc:hive2://localhost:10015
      Connecting to jdbc:hive2://localhost:10015
      Enter username for jdbc:hive2://localhost:10015:
      Enter password for jdbc:hive2://localhost:10015:
      ...
      Connected to: Spark SQL (version 1.6.0)
      Driver: Spark Project Core (version 1.6.0.2.4.0.0-169)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      0: jdbc:hive2://localhost:10015>
  2. Once connected, issue a Spark SQL statement.

    The following example executes a SHOW TABLES query:

    0: jdbc:hive2://localhost:10015> show tables;
    +------------+--------------+--+
    | tableName  | isTemporary  |
    +------------+--------------+--+
    | sample_07  | false        |
    | sample_08  | false        |
    | testtable  | false        |
    +------------+--------------+--+
    3 rows selected (2.399 seconds)
    0: jdbc:hive2://localhost:10015>
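
When you are finished, exit the Beeline session with the !quit command:

0: jdbc:hive2://localhost:10015> !quit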