Reading managed tables through HWC

This step-by-step procedure walks you through choosing an execution mode, starting the Apache Spark session, and reading Apache Hive ACID (managed) tables.

  • Configure Spark Direct Reader mode or JDBC execution mode.
  • Set Kerberos for HWC.
  1. Choose a configuration based on your execution mode.
    • Spark Direct Reader mode:
      --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
    • JDBC mode:
      --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions

      Also set a location for running the application in JDBC mode. For example, set the recommended cluster location.
  2. Start the Spark session using the execution mode you chose in the previous step.
    For example, start the Spark session using Spark Direct Reader mode and configure for Kryo serialization (a programmatic equivalent is sketched after these steps):
    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly- \
    --conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
    --conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" 

    For example, start the Spark session using JDBC execution mode (an API-level read through HiveWarehouseSession is sketched after these steps):

    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly- \
    --conf spark.sql.hive.hwc.execution.mode=spark \

    You must set Spark Direct Reader mode before starting the Spark session, so include the configurations in the launch string.
  3. Read Apache Hive managed tables. The result of the read is a standard Spark DataFrame (see the follow-up sketch after these steps).
    For example:
    scala> sql("select * from managedTable").show
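
If you set the Spark Direct Reader configurations in application code rather than on the spark-shell command line, the session must be built with those settings before any reads. The following is a minimal sketch, assuming the extension and registrator classes from step 2, a hypothetical application name, and the table from step 3; the HWC assembly jar still has to be supplied at launch (for example, with --jars).

    // Minimal sketch: Direct Reader configurations set when building the SparkSession.
    import org.apache.spark.sql.SparkSession

    object DirectReaderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hwc-direct-reader-example") // hypothetical application name
          .config("spark.sql.extensions",
            "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")
          .config("spark.kryo.registrator",
            "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")
          // A Kryo registrator only takes effect when the Kryo serializer is enabled.
          .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .enableHiveSupport()
          .getOrCreate()

        // The extension converts reads of Hive ACID tables, so plain Spark SQL works.
        spark.sql("SELECT * FROM managedTable").show()

        spark.stop()
      }
    }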
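
In JDBC execution mode you can also issue reads through the HiveWarehouseSession API instead of plain Spark SQL. The following is a minimal sketch, assuming a spark-shell started with the JDBC-mode configurations from steps 1 and 2 and the table from step 3.

    // Minimal sketch of a JDBC-mode read through the HiveWarehouseSession API.
    // Run inside a spark-shell session started as shown in step 2.
    import com.hortonworks.hwc.HiveWarehouseSession

    // Build an HWC session on top of the existing SparkSession (spark).
    val hive = HiveWarehouseSession.session(spark).build()

    // executeQuery returns an ordinary DataFrame backed by HWC.
    val df = hive.executeQuery("SELECT * FROM managedTable")
    df.show()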
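
The read in step 3 returns a standard Spark DataFrame, so the usual DataFrame operations apply to the result. In the following sketch, the column name and the output path are hypothetical.

    scala> // Sketch only: "id" and the output path are assumptions, not part of the example table.
    scala> val df = sql("SELECT * FROM managedTable")
    scala> df.printSchema()
    scala> df.filter("id > 100").count()
    scala> df.write.mode("overwrite").parquet("/tmp/managedTable_copy")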