Reading data through HWC
You can configure one of several Hive Warehouse Connector (HWC) modes to read Apache Hive managed tables from Apache Spark. This topic describes the modes you can configure for querying Hive from Spark and gives examples of how to configure them.
In this release, HWC configuration has been simplified.
You set the following configurations when starting the spark-shell:
- spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions"
- spark.datasource.hive.warehouse.read.mode=<mode>
where <mode> is one of the following:
- DIRECT_READER_V1 or DIRECT_READER_V2
- JDBC_CLUSTER
- JDBC_CLIENT
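As a minimal sketch of the two settings above, the following shell function prints the --conf options for a chosen read mode. The function name hwc_conf_args is hypothetical and used only for illustration. The sketch also assumes that anything else your cluster needs, such as the HWC jar, is supplied separately.

```shell
# Hypothetical helper: print the --conf options for a given HWC read mode.
# Only the four mode names listed above are accepted.
hwc_conf_args() {
  mode="$1"
  case "$mode" in
    DIRECT_READER_V1|DIRECT_READER_V2|JDBC_CLUSTER|JDBC_CLIENT) ;;
    *) echo "unsupported read mode: $mode" >&2; return 1 ;;
  esac
  # Emit one token per line; none of the values contain spaces.
  printf '%s\n' \
    "--conf" "spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions" \
    "--conf" "spark.datasource.hive.warehouse.read.mode=$mode"
}
```

With this helper defined, spark-shell $(hwc_conf_args JDBC_CLUSTER) would start the shell with both configurations set for JDBC_CLUSTER mode.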
After you set the mode, you run queries using spark.sql("<query>"). You can specify the mode in the spark-shell when you run Spark SQL commands to query Apache Hive tables from Apache Spark. You can also specify the mode in configuration/spark-defaults.conf, or by using the --conf option in spark-submit.
spark.datasource.hive.warehouse.read.mode serves the same purpose as the following configurations, all of which are deprecated:
- spark.datasource.hive.warehouse.read.jdbc.mode
- spark.sql.hive.hwc.execution.mode
- spark.datasource.hive.warehouse.read.via.llap
The old configurations are still supported for backward compatibility, but support for them will end in a later release, and spark.datasource.hive.warehouse.read.mode will replace them. When HWC encounters both old and new configurations, it gives precedence to the new ones.
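Because the deprecated keys are still accepted, they can remain in configuration files unnoticed. The following sketch shows one way to list them in a Spark properties file so that they can be replaced with spark.datasource.hive.warehouse.read.mode. The function name find_deprecated_hwc_keys is hypothetical, and the file path you pass to it depends on your installation.

```shell
# Hypothetical helper: print lines in a Spark properties file that set one of
# the three deprecated HWC read-mode keys. Commented-out lines are ignored.
# Returns a non-zero status when no deprecated key is found.
find_deprecated_hwc_keys() {
  conf_file="$1"
  grep -E '^[[:space:]]*(spark\.datasource\.hive\.warehouse\.read\.jdbc\.mode|spark\.sql\.hive\.hwc\.execution\.mode|spark\.datasource\.hive\.warehouse\.read\.via\.llap)([[:space:]=]|$)' "$conf_file"
}
```

For example, find_deprecated_hwc_keys configuration/spark-defaults.conf prints each matching line, or prints nothing if the file uses only the new configuration.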