JDBC execution mode

You need to understand how JDBC mode interacts with Apache Hive components to write Hive tables from Spark through HWC. Where your queries are executed affects configuration. Understanding execution locations and recommendations help you configure JDBC execution mode for your use case.

Component Interaction

JDBC mode creates only one JDBC connection to HiveServer (HS2) or HiveServer Interactive (HSI), a potential bottleneck in data transfer to Spark. The following diagram shows interaction in JDBC mode with Hive metastore (HMS), the TEZ or LLAP daemons, and HDFS.

HWC writes to an intermediate location from Spark, and then executes a LOAD DATA query to write the data.

Configuration

In JDBC mode, execution takes place in these locations:

  • Driver: Using the Hive JDBC url, connects to Hive and executes the query on the driver side.
  • Cluster: From Spark executors, connects to Hive through JDBC and executes the query.

Authorization occurs on the server.

JDBC mode runs in the client or cluster:
  • Client (Driver)

    In client mode, any failures to connect to HiveServer (HS2) will not be retried.

  • Cluster (Executor)--recommended

    In cluster mode any failures to connect to HS2 will be retried automatically.

JDBC mode is not recommended for production. Where your queries are executed affects the Kerberos configurations for HWC.

In configuration/spark-defaults.conf, or using the --conf option in spark-submit/spark-shell set the following properties:

Name: spark.datasource.hive.warehouse.read.jdbc.mode
Value: client or cluster
Configures the driver location.
Name: spark.sql.hive.hiveserver2.jdbc.url
Value:
The JDBC endpoint for HiveServer. For more information, see the Apache Hive Wiki (link below). For Knox, provide the HiveServer, not Knox, endpoint.
Name: spark.datasource.hive.warehouse.load.staging.dir
Value: Temporary staging location required by HWC. Set the value to a file system location where the HWC user has write permission.
Name: spark.hadoop.hive.zookeeper.quorum

JDBC Mode Limitations

  • If you configured Auto Translate, run JDBC in cluster mode.
  • JDBC mode is not recommended for production because bottlenecks develop in data transfer to Spark.