Example of configuring and reading a Hive managed table
- Choose a read mode.
- Start the Spark session using the following configurations.
For example, start the Spark session using the Direct Reader mode and configure Kryo serialization:
spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
--master yarn \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hwc-2.hwc.root.hwx.site:2181/default;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE \
--conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2
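Once the shell starts, you can confirm that the session picked up the read mode before running any queries. A minimal check in the spark-shell, using the configuration key from the launch string above:
scala> spark.conf.get("spark.datasource.hive.warehouse.read.mode")
res0: String = DIRECT_READER_V2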
For example, start the Spark session using the JDBC_CLUSTER option:
spark-shell --jars ./hive-warehouse-connector-assembly-<version>.jar \
--master yarn \
--conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hwc-2.hwc.root.hwx.site:2181/default;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@ROOT.HWX.SITE \
--conf spark.datasource.hive.warehouse.read.mode=JDBC_CLUSTER
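Instead of repeating these options on the command line every time, you can set them in spark-defaults.conf. A minimal sketch, assuming the same example values as above (the jar path, ZooKeeper hostnames, and Kerberos realm are placeholders you must replace with your own):
spark.jars /path/to/hive-warehouse-connector-assembly-<version>.jar
spark.sql.extensions com.hortonworks.spark.sql.rule.Extensions
spark.kryo.registrator com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://hwc-2.hwc.root.hwx.site:2181/default;retries=5;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
spark.sql.hive.hiveserver2.jdbc.url.principal hive/_HOST@ROOT.HWX.SITE
spark.datasource.hive.warehouse.read.mode JDBC_CLUSTER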
You must start the Spark session after setting the Direct Reader option, so include the configurations in the launch string.
- Read Apache Hive managed tables.
For example:
scala> val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
scala> hive.sql("select * from managedTable").show
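The query result is an ordinary Spark DataFrame, so you can keep a reference to it for further processing instead of printing it immediately. A minimal follow-on sketch, assuming the same managedTable as in the example above:
scala> val df = hive.sql("select * from managedTable")
scala> df.printSchema
scala> df.count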