Using JDBC read mode

In a few steps, you configure Apache Spark to connect to HiveServer (HS2). Examples show how to configure JDBC Cluster mode while launching the Spark shell.

Accept the default spark.datasource.hive.warehouse.load.staging.dir for the temporary staging location required by HWC.
In spark-defaults.conf, check that spark.hadoop.hive.zookeeper.quorum is configured.
In spark-defaults.conf, set Kerberos configurations for HWC, or for an unsecured cluster, set spark.security.credentials.hiveserver2.enabled=false.

Find the HiveServer (HS2) JDBC URL in /etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml

The value of beeline.hs2.jdbc.url.HIVE_ON_TEZ-1 is the HS2 JDBC URL in this sample file.

...
<configuration>
 <property>
 <name>beeline.hs2.jdbc.url.default</name>
 <value>HIVE_ON_TEZ-1</value>
 </property>
 <property>
 <name>beeline.hs2.jdbc.url.HIVE_ON_TEZ-1</name>
 <value>jdbc:hive2://<domain name>:2181/;serviceDiscoveryMode=zooKeeper; \
    zooKeeperNamespace=hiveserver2;retries=5</value>
 </property>
</configuration>

Launch the Spark shell, including the configuration of the JDBC cluster option, and setting the Spark property to the value of the HS2 JDBC URL.

For example:

spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
--master yarn \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hwc-2.hwc.root.hwx.site:2181/default;principal=hive/_HOST@ROOT.HWX.SITE;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"

--conf spark.datasource.hive.warehouse.read.mode=JDBC_CLUSTER

Read a hive table.

scala> hive.sql("select * from managedTable").show(1, false)
OR
scala> spark.sql("select * from managedTable").show(1, false)