In two steps, you configure Apache Spark to connect to HiveServer (HS2). An example
shows how to configure this mode while launching the Spark shell.
- Accept the default and recommended spark.datasource.hive.warehouse.read.jdbc.mode=cluster for the location of query execution.
- Accept the default spark.datasource.hive.warehouse.load.staging.dir for the temporary staging location required by HWC.
- Check that spark.hadoop.hive.zookeeper.quorum is configured.
- Set Kerberos configurations for HWC, or for an unsecured cluster, set spark.security.credentials.hiveserver2.enabled=false.
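As a rough illustration of passing these settings when launching the Spark shell, the following Python sketch renders a spark-shell command line with one --conf flag per property. It assumes an unsecured cluster and includes only properties whose values are given in the list above; the helper name spark_shell_command is hypothetical, and anything else your launch needs is not shown.

```python
import shlex

# Properties and values taken from the checklist above.
HWC_CONF = {
    "spark.datasource.hive.warehouse.read.jdbc.mode": "cluster",
    # Assumption: unsecured cluster. On a Kerberized cluster, set the
    # Kerberos configurations for HWC instead of this property.
    "spark.security.credentials.hiveserver2.enabled": "false",
}


def spark_shell_command(conf):
    """Render a spark-shell invocation with one --conf flag per property."""
    args = ["spark-shell"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(args)


print(spark_shell_command(HWC_CONF))
```

Building the command as a list and joining it with shlex.join keeps values that contain semicolons, such as a JDBC URL, safely quoted if you add them later.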
- Find the HiveServer (HS2) JDBC URL in /etc/hive/conf.cloudera.HIVE_ON_TEZ-1/beeline-site.xml.
The value of beeline.hs2.jdbc.url.HIVE_ON_TEZ-1 is the HS2 JDBC URL in this sample file.
...
<configuration>
<property>
<name>beeline.hs2.jdbc.url.default</name>
<value>HIVE_ON_TEZ-1</value>
</property>
<property>
<name>beeline.hs2.jdbc.url.HIVE_ON_TEZ-1</name>
<value>jdbc:hive2://nightly7x-unsecure-1.nightly7x-unsecure.root.hwx.site:2181/;serviceDiscoveryMode=zooKeeper; \
zooKeeperNamespace=hiveserver2;retries=5</value>
</property>
</configuration>
- Set the Spark property spark.sql.hive.hiveserver2.jdbc.url to the value of the HS2 JDBC URL.
For example, in /opt/cloudera/parcels/CDH-7.2.1-1.cdh7.2.1.p0.4847773/etc/spark/conf.dist/spark-defaults.conf, add the JDBC URL:
...
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://nightly7x-unsecure-1.nightly7x-unsecure.root.hwx.site:2181/;serviceDiscoveryMode=zooKeeper; \
zooKeeperNamespace=hiveserver2;retries=5
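The two steps can also be scripted. The following is a minimal Python sketch, not a supported tool: it reads beeline-site.xml content, follows beeline.hs2.jdbc.url.default to the named entry, and formats the matching spark-defaults.conf line. The function names and the host hs2-host.example.com are hypothetical, and the sketch assumes that a trailing backslash in the value is only a line wrap, as in the sample above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the beeline-site.xml excerpt above.
SAMPLE = r"""<configuration>
<property>
<name>beeline.hs2.jdbc.url.default</name>
<value>HIVE_ON_TEZ-1</value>
</property>
<property>
<name>beeline.hs2.jdbc.url.HIVE_ON_TEZ-1</name>
<value>jdbc:hive2://hs2-host.example.com:2181/;serviceDiscoveryMode=zooKeeper; \
zooKeeperNamespace=hiveserver2;retries=5</value>
</property>
</configuration>"""


def read_hs2_jdbc_url(xml_text):
    """Return the JDBC URL of the default HS2 entry in beeline-site.xml content."""
    root = ET.fromstring(xml_text)
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    default_name = props["beeline.hs2.jdbc.url.default"]
    url = props[f"beeline.hs2.jdbc.url.{default_name}"]
    # Assumption: a backslash marks a wrapped line, so drop it and rejoin the parts.
    return "".join(part.strip() for part in url.replace("\\", "").splitlines())


def spark_defaults_line(url):
    """Format the spark-defaults.conf entry: property name, whitespace, value."""
    return f"spark.sql.hive.hiveserver2.jdbc.url {url}"


print(spark_defaults_line(read_hs2_jdbc_url(SAMPLE)))
```

To use it against a real cluster, pass the text of your own beeline-site.xml to read_hs2_jdbc_url and append the printed line to spark-defaults.conf.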