Using JDBC read mode
In a few steps, you configure Apache Spark to connect to HiveServer (HS2). Examples show how to configure JDBC Cluster and JDBC Client modes while launching the Spark shell.
- Accept the default spark.datasource.hive.warehouse.load.staging.dir for the temporary staging location required by HWC.
- In spark-defaults.conf, check that spark.hadoop.hive.zookeeper.quorum is configured.
- In spark-defaults.conf, set the Kerberos configurations for HWC or, for an unsecured cluster, set spark.security.credentials.hiveserver2.enabled=false.
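The checks in the steps above can be scripted. The following is a minimal sketch, not part of HWC: the helper name hwc_conf_value is hypothetical, and it assumes spark-defaults.conf holds one property per line with the key and value separated by whitespace or an equals sign.

```shell
# Hypothetical helper: print the value of a property from spark-defaults.conf.
# $1 = path to spark-defaults.conf, $2 = property name.
# Returns non-zero if the property is not set.
hwc_conf_value() {
  awk -v key="$2" '
    # Skip comment lines.
    $0 !~ /^[ \t]*#/ {
      line = $0
      sub(/^[ \t]+/, "", line)           # drop leading whitespace
      n = match(line, /[ \t=]+/)         # first key/value separator
      if (n > 0 && substr(line, 1, n - 1) == key) {
        val = substr(line, n + RLENGTH)  # everything after the separator
        found = 1                        # a later duplicate overrides an earlier one
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$1"
}
```

For example, `hwc_conf_value /path/to/spark-defaults.conf spark.hadoop.hive.zookeeper.quorum` prints the configured quorum, or exits with a non-zero status if the property is missing. The path is a placeholder; use the location of spark-defaults.conf on your cluster.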
Example of JDBC Client Mode
Launch the Spark shell in JDBC client mode.
spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
--master yarn \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hwc-2.hwc.root.hwx.site:2181/default;principal=hive/_HOST@ROOT.HWX.SITE;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
--conf spark.datasource.hive.warehouse.read.mode=JDBC_CLIENT
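The JDBC URL in the launch command packs several settings into one string: the ZooKeeper host and port, the database, the Kerberos principal, and the service discovery options. The sketch below shows how that string is put together. The function name hs2_zk_jdbc_url is hypothetical, and the retries, serviceDiscoveryMode, and zooKeeperNamespace values are copied from the example above and may differ on your cluster.

```shell
# Hypothetical helper: assemble an HS2 JDBC URL that uses ZooKeeper service discovery.
# $1 = ZooKeeper quorum (host:port), $2 = database, $3 = Kerberos principal.
hs2_zk_jdbc_url() {
  quorum="$1"
  database="$2"
  principal="$3"
  # The option values after the principal are taken from the example launch command.
  printf 'jdbc:hive2://%s/%s;principal=%s;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2\n' \
    "$quorum" "$database" "$principal"
}
```

Calling `hs2_zk_jdbc_url hwc-2.hwc.root.hwx.site:2181 default 'hive/_HOST@ROOT.HWX.SITE'` reproduces the URL passed to spark.sql.hive.hiveserver2.jdbc.url in the example.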