This step-by-step procedure walks you through choosing an execution mode, starting the
Apache Spark session, and reading Apache Hive ACID managed tables.
- Configure Spark Direct Reader Mode or JDBC execution mode.
- Set Kerberos for HWC.
- Choose a configuration based on your execution mode.
- Spark Direct Reader mode:
--conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
- JDBC mode:
--conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
--conf spark.datasource.hive.warehouse.read.via.llap=false
In JDBC mode, also set a location for running the application. For example, set the
recommended location, cluster:
spark.datasource.hive.warehouse.read.jdbc.mode=cluster
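If you create the session programmatically instead of through spark-shell, the same properties can go on the SparkSession builder. The following Scala sketch applies the JDBC mode settings above; the application name is hypothetical, and spark.sql.extensions only takes effect if it is set before the session is created:
import org.apache.spark.sql.SparkSession

// Sketch: build a session with the JDBC mode settings from this step.
// The application name is hypothetical. spark.sql.extensions is honored
// only when no SparkSession exists yet, so set it before getOrCreate().
val spark = SparkSession.builder()
  .appName("hwc-jdbc-read") // hypothetical name
  .enableHiveSupport()
  .config("spark.sql.extensions", "com.hortonworks.spark.sql.rule.Extensions")
  .config("spark.datasource.hive.warehouse.read.via.llap", "false")
  .config("spark.datasource.hive.warehouse.read.jdbc.mode", "cluster")
  .getOrCreate()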
- Start the Spark session using the execution mode you chose in the previous step.
For example, start the Spark session using Spark Direct Reader mode and
configure Kryo serialization:
sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
--conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator"
For example, start the Spark session using JDBC execution mode:
sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
--conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
--conf spark.sql.hive.hwc.execution.mode=spark \
--conf spark.datasource.hive.warehouse.read.via.llap=false
You must start the Spark session after setting the execution mode, so
include the configurations in the launch string.
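Optionally, verify inside the shell that the session picked up the settings. For example, after a JDBC mode launch:
scala> spark.conf.get("spark.sql.extensions")
scala> spark.conf.getOption("spark.datasource.hive.warehouse.read.jdbc.mode")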
- Read Apache Hive managed tables.
For example:
scala> sql("select * from managedTable").show
scala> spark.read.table("managedTable").show
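The queries above rely on the execution mode configured at launch. In JDBC mode you can also read through an explicit HiveWarehouseSession. The following is a minimal sketch, assuming the HWC assembly JAR from the previous step is on the classpath and that managedTable exists:
scala> import com.hortonworks.hwc.HiveWarehouseSession
scala> val hive = HiveWarehouseSession.session(spark).build()
scala> hive.executeQuery("select * from managedTable").show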