Integrating Apache Hive with Apache Spark and BI

Reading managed tables through HWC

This procedure walks you through choosing an execution mode, starting the Apache Spark session, and reading Apache Hive ACID managed tables.

Before you begin:
  • Configure Spark Direct Reader mode or JDBC execution mode.
  • Configure Kerberos for HWC.
  1. Choose a configuration based on your execution mode.
    • Spark Direct Reader mode:
      --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
    • JDBC mode:
      --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions
      --conf spark.datasource.hive.warehouse.read.via.llap=false

      Also set a location for running the application in JDBC mode. For example, set the recommended cluster location:
      spark.datasource.hive.warehouse.read.jdbc.mode=cluster
  2. Start the Spark session using the execution mode you chose in the last step.
    For example, start the Spark session using Spark Direct Reader mode and configure Kryo serialization:
    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
    --conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
    --conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" 

    For example, start the Spark session using JDBC execution mode:

    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar \
    --conf spark.sql.hive.hwc.execution.mode=spark \
    --conf spark.datasource.hive.warehouse.read.via.llap=false

    You must set Spark Direct Reader mode before the Spark session starts, so include its configurations in the launch command.
  3. Read Apache Hive managed tables.
    For example:
    scala> sql("select * from managedTable").show
    scala> spark.read.table("managedTable").show
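The procedure above comes down to passing one of two sets of Spark properties when the session is launched. The following Scala sketch collects those properties in one place and assembles the `sudo -u hive spark-shell` argument list from them. The names `ExecutionMode`, `SparkDirectReader`, `JdbcMode`, `HwcLaunch`, `confFor`, and `launchArgs` are hypothetical helpers, not part of HWC. The property values are copied from step 1, and extra properties, such as the Kryo registrator from the Direct Reader example (class name copied verbatim), can be appended through the `extra` parameter.

```scala
// Hypothetical helpers (not part of HWC) that model the procedure above.
sealed trait ExecutionMode
case object SparkDirectReader extends ExecutionMode
case object JdbcMode extends ExecutionMode

object HwcLaunch {
  // HWC assembly jar from the examples; replace <version> before running,
  // because < and > are shell redirection characters.
  val HwcJar: String =
    "/opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-<version>.jar"

  // Spark properties listed in step 1 for each execution mode.
  def confFor(mode: ExecutionMode): Seq[(String, String)] = mode match {
    case SparkDirectReader =>
      Seq("spark.sql.extensions" -> "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")
    case JdbcMode =>
      Seq(
        "spark.sql.extensions" -> "com.hortonworks.spark.sql.rule.Extensions",
        "spark.datasource.hive.warehouse.read.via.llap" -> "false",
        // Recommended location for running the application in JDBC mode.
        "spark.datasource.hive.warehouse.read.jdbc.mode" -> "cluster"
      )
  }

  // Builds the argument vector for `sudo -u hive spark-shell`, passing every
  // property as `--conf key=value` so it is set before the session starts.
  def launchArgs(mode: ExecutionMode, extra: Seq[(String, String)] = Seq.empty): Seq[String] =
    Seq("sudo", "-u", "hive", "spark-shell", "--jars", HwcJar) ++
      (confFor(mode) ++ extra).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

For example, `HwcLaunch.launchArgs(JdbcMode).mkString(" ")` renders a single command line for JDBC mode. Keeping the arguments as a sequence rather than one string lets a launcher such as `scala.sys.process.Process` receive them without extra shell quoting.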