Direct Reader configuration properties
You need to know the property names and valid values for configuring Direct Reader mode. The advantage of using Direct Reader V2 over Direct Reader V1 is its ability to process ORC data using vectorization, which improves performance.
Options
In configuration/spark-defaults.conf, or using the --conf option in spark-submit/spark-shell, set the following properties:
- Name: spark.sql.extensions
  Value: com.hortonworks.spark.sql.rule.Extensions
- Name: spark.datasource.hive.warehouse.read.mode
  Value: DIRECT_READER_V1 or DIRECT_READER_V2
- Name: spark.kryo.registrator
  Value: com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator
- Name: spark.hadoop.hive.metastore.uris
  Value: thrift://<host>:<port>
- Name: --jars
  Value: HWC jar
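For the spark-defaults.conf route, the same settings might look like the fragment below. This is a sketch under stated assumptions, not a verified configuration: it picks DIRECT_READER_V2 as the read mode, leaves the <host>, <port>, and <version> placeholders unfilled, and assumes that the standard Spark property spark.jars can stand in for the --jars command-line option.

```
# spark-defaults.conf: one property per line, key and value separated by whitespace

# Spark SQL extension and Kryo registrator listed above
spark.sql.extensions                        com.hortonworks.spark.sql.rule.Extensions
spark.kryo.registrator                      com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator

# Read mode: DIRECT_READER_V1 or DIRECT_READER_V2 (V2 chosen here as an example)
spark.datasource.hive.warehouse.read.mode   DIRECT_READER_V2

# Hive metastore URI; replace the placeholders with your own values
spark.hadoop.hive.metastore.uris            thrift://<host>:<port>

# Assumption: spark.jars used in place of the --jars option
spark.jars                                  /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar
```

Properties passed with --conf on the command line take precedence over values in spark-defaults.conf, so a file like this can hold defaults that individual jobs override.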
Example: Launch a spark-shell
spark-shell --jars \
/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions" \
--conf "spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2" \
--conf "spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" \
--conf "spark.hadoop.hive.metastore.uris=<metastore_uri>"