Using JDBC read mode

In a few steps, you configure Apache Spark to connect to HiveServer2 (HS2). The examples show how to configure JDBC cluster mode while launching the Spark shell.

  • Accept the default spark.datasource.hive.warehouse.load.staging.dir for the temporary staging location required by HWC.
  • In spark-defaults.conf, check that spark.hadoop.hive.zookeeper.quorum is configured.
  • In spark-defaults.conf, set Kerberos configurations for HWC, or for an unsecured cluster, set spark.security.credentials.hiveserver2.enabled=false.
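    The spark-defaults.conf checks above can be sketched as the following fragment. This is a sketch under assumptions: every host name, realm, and keytab path is a placeholder, and you should verify the Kerberos-related property names against the HWC documentation for your release.

    ```properties
    # Placeholder sketch of spark-defaults.conf entries for HWC; the values
    # below are assumptions, not settings from a real cluster.

    # ZooKeeper quorum used for HS2 service discovery (assumed host names).
    spark.hadoop.hive.zookeeper.quorum=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181

    # Kerberos settings for a secured cluster (assumed principal and keytab).
    spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM
    spark.kerberos.principal=user@EXAMPLE.COM
    spark.kerberos.keytab=/etc/security/keytabs/user.keytab

    # On an unsecured cluster, disable HS2 credential handling instead:
    # spark.security.credentials.hiveserver2.enabled=false
    ```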
  1. Find the HiveServer2 (HS2) JDBC URL in /etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml.
    The value of beeline.hs2.jdbc.url.HIVE_ON_TEZ-1 is the HS2 JDBC URL in this sample file:
      <value>jdbc:hive2://<domain name>:2181/;serviceDiscoveryMode=zooKeeper; \
  2. Launch the Spark shell, including the configuration of the JDBC cluster option, and set the spark.sql.hive.hiveserver2.jdbc.url property to the value of the HS2 JDBC URL.
    For example:
    spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
    --master yarn \
    --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
    --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
    --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://;principal=hive/_HOST@ROOT.HWX.SITE;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
  3. Read a Hive table. Build a HiveWarehouseSession first, then query through it:
    scala> val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
    scala> hive.sql("select * from managedTable").show(1, false)
    scala> spark.sql("select * from managedTable").show(1, false)
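Steps 1 and 2 can be combined in a small launcher script. This is a sketch under assumptions: it expects the <name>/<value> pair for beeline.hs2.jdbc.url.HIVE_ON_TEZ-1 to sit on adjacent lines of beeline-site.xml, the jar path is a placeholder you must point at your versioned assembly, and an XML-aware parser would be more robust than grep/sed.

```shell
#!/bin/sh
# Sketch: extract the HS2 JDBC URL from beeline-site.xml and launch the
# Spark shell with it. Assumes the <value> line directly follows the
# matching <name> line, as in the sample file above.
BEELINE_SITE=/etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml
HWC_JAR=./hive-warehouse-connector-assembly.jar  # placeholder: use your versioned jar

# Take the line after the property name and keep what is inside <value>...</value>.
HS2_URL=$(grep -A1 'beeline.hs2.jdbc.url.HIVE_ON_TEZ-1' "$BEELINE_SITE" |
  sed -n 's:.*<value>\(.*\)</value>.*:\1:p')

spark-shell --jars "$HWC_JAR" \
  --master yarn \
  --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
  --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
  --conf spark.sql.hive.hiveserver2.jdbc.url="$HS2_URL"
```

The grep/sed pipeline only has to be right about the layout of beeline-site.xml; everything after it is the same spark-shell invocation shown in step 2.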