Configuring HWC in CDP Public Cloud

You use the Hive Warehouse Connector (HWC) either transparently through spark sql or by using HWC API commands. You configure HWC processing to use LLAP or not, based on your use case.

To read ACID, or other Hive-managed tables, from Spark using low-latency analytical processing (LLAP) is recommended.

Low-latency analytical processing (LLAP) is recommended for reading ACID, or other Hive-managed tables, from Spark. You do not need LLAP to write to ACID, or other managed tables, from Spark. You do not need LLAP to access external tables from Spark.

Configuring the HWC mode for reads

The HWC runs in the following modes for reading Hive-managed tables:

LLAP
- true
- false
JDBC
- cluster
- client

You need to configure the following properties in configuration/spark-defaults.conf. Alternatively, you can set the properties using the spark-submit/spark-shell --conf option.

spark.datasource.hive.warehouse.read.via.llap
Configures LLAP mode on or off. Values: true or false
spark.datasource.hive.warehouse.read.jdbc.mode
Configures JDBC mode. Values: cluster or client
spark.sql.hive.hiveserver2.jdbc.url
The Hive JDBC url in /etc/hive/conf/beeline-site.xml.
spark.datasource.hive.warehouse.metastoreUri
URI of Hive metastore. In Cloudera Manager, click Clusters > Hive-1 > Configuration, search for hive.metastore.uris, and use that value.
spark.datasource.hive.warehouse.load.staging.dir
Temporary staging location required by HWC.
Set the value to a file system location where the HWC user has write permission.

Table 1. Spark Compatibility
Tasks	Use HWC	Recommended HWC Mode
Read Hive managed tables from Spark	Yes	LLAP mode=true
Write Hive managed tables from Spark	Yes	N/A
Read Hive external tables from Spark	Ok, but unnecessary	N/A
Write Hive external tables from Spark	Ok, but unnecessary	N/A