Reading managed tables through HWC

This step-by-step procedure walks you through choosing an execution mode, starting the Apache Spark session, and reading Apache Hive ACID (managed) tables.

  • Configure Spark Direct Reader mode or JDBC execution mode.
  • Set Kerberos for HWC.
  1. Choose a configuration based on your execution mode.
    • Spark Direct Reader mode:
      --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
    • JDBC mode:
      --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions

      Also set a location for running the application in JDBC mode. For example, set the recommended cluster location.
  2. Start the Spark session using the execution mode you chose in the previous step.
    For example, start the Spark session using Spark Direct Reader mode and configure for Kryo serialization (a programmatic equivalent is sketched after these steps):
    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly- \
    --conf "spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension" \
    --conf spark.kryo.registrator="com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" 

    For example, start the Spark session using JDBC execution mode (an API-level read through HiveWarehouseSession is sketched after these steps):

    sudo -u hive spark-shell --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly- \
    --conf spark.sql.hive.hwc.execution.mode=spark \

    You must set Spark Direct Reader mode before starting the Spark session, so include the configurations in the launch string.
  3. Read Apache Hive managed tables. The result of the read is a standard Spark DataFrame (see the follow-up sketch after these steps).
    For example:
    scala> sql("select * from managedTable").show
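
If you set the Spark Direct Reader configurations in application code rather than on the spark-shell command line, the session must be built with those settings before any reads. The following is a minimal sketch, assuming the extension and registrator classes from step 2, a hypothetical application name, and the table from step 3; the HWC assembly jar still has to be supplied at launch (for example, with --jars).

    // Minimal sketch: Direct Reader configurations set when building the SparkSession.
    import org.apache.spark.sql.SparkSession

    object DirectReaderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hwc-direct-reader-example") // hypothetical application name
          .config("spark.sql.extensions",
            "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")
          .config("spark.kryo.registrator",
            "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")
          // A Kryo registrator only takes effect when the Kryo serializer is enabled.
          .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .enableHiveSupport()
          .getOrCreate()

        // The extension converts reads of Hive ACID tables, so plain Spark SQL works.
        spark.sql("SELECT * FROM managedTable").show()

        spark.stop()
      }
    }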
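
In JDBC execution mode you can also issue reads through the HiveWarehouseSession API instead of plain Spark SQL. The following is a minimal sketch, assuming a spark-shell started with the JDBC-mode configurations from steps 1 and 2 and the table from step 3.

    // Minimal sketch of a JDBC-mode read through the HiveWarehouseSession API.
    // Run inside a spark-shell session started as shown in step 2.
    import com.hortonworks.hwc.HiveWarehouseSession

    // Build an HWC session on top of the existing SparkSession (spark).
    val hive = HiveWarehouseSession.session(spark).build()

    // executeQuery returns an ordinary DataFrame backed by HWC.
    val df = hive.executeQuery("SELECT * FROM managedTable")
    df.show()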
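
The read in step 3 returns a standard Spark DataFrame, so the usual DataFrame operations apply to the result. In the following sketch, the column name and the output path are hypothetical.

    scala> // Sketch only: "id" and the output path are assumptions, not part of the example table.
    scala> val df = sql("SELECT * FROM managedTable")
    scala> df.printSchema()
    scala> df.filter("id > 100").count()
    scala> df.write.mode("overwrite").parquet("/tmp/managedTable_copy")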