Reading data through HWC
You can configure one of several HWC modes to read Apache Hive managed tables from Apache Spark. This section describes the modes you can configure for querying Hive from Spark and includes examples of how to configure each mode.
In this release, HWC configuration has been simplified.
You set the following configurations when starting the spark-shell:
- spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions"
- spark.datasource.hive.warehouse.read.mode=<mode>
where <mode> is one of the following:
- DIRECT_READER_V1
- JDBC_CLUSTER
- JDBC_CLIENT
You then run queries using spark.sql("<query>"). For backward compatibility, configuring spark.datasource.hive.warehouse.read.mode is the same as the following configurations:
- --conf spark.datasource.hive.warehouse.read.jdbc.mode //deprecated
- --conf spark.sql.hive.hwc.execution.mode //deprecated
- --conf spark.datasource.hive.warehouse.read.via.llap //deprecated
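For example, the following sketch starts the spark-shell with the new configurations and runs a query. The HWC assembly JAR path and the table name are placeholders for your environment:

  spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
    --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
    --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V1

  scala> // Read a Hive managed table through HWC in DIRECT_READER_V1 mode
  scala> spark.sql("SELECT * FROM sales_db.transactions LIMIT 10").show()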
Now, you can transparently read with HWC in different modes using just spark.sql("<query>"). The old configurations are still supported for backward compatibility, but support for them will end in a later release, and spark.datasource.hive.warehouse.read.mode will replace them. HWC gives precedence to the new configuration when both old and new configurations are encountered.
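For instance, if you launch the shell with both a deprecated property and the new one, the new property determines the read mode. A sketch, assuming the deprecated JDBC property takes a value such as client:

  spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
    --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
    --conf spark.datasource.hive.warehouse.read.jdbc.mode=client \
    --conf spark.datasource.hive.warehouse.read.mode=JDBC_CLUSTER

In this case, reads run in JDBC_CLUSTER mode because spark.datasource.hive.warehouse.read.mode takes precedence over the deprecated configuration.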
You can specify the mode in the spark-shell when you run Spark SQL commands to query Apache Hive tables from Apache Spark. You can also specify the mode in configuration/spark-defaults.conf, or by using the --conf option in spark-submit.
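For example, a sketch of the equivalent settings in spark-defaults.conf (the mode value is an example):

  spark.sql.extensions                      com.hortonworks.spark.sql.rule.Extensions
  spark.datasource.hive.warehouse.read.mode JDBC_CLUSTER

or per job through spark-submit, where the JAR path and application JAR are placeholders:

  spark-submit \
    --jars /path/to/hive-warehouse-connector-assembly.jar \
    --conf spark.sql.extensions="com.hortonworks.spark.sql.rule.Extensions" \
    --conf spark.datasource.hive.warehouse.read.mode=JDBC_CLUSTER \
    your-app.jar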