Direct Reader configuration properties
You need to know the property names and valid values for configuring Direct Reader mode. The Direct Reader V2 configuration processes ORC data using vectorization, which improves performance.
Options
In configuration/spark-defaults.conf, or using the --conf option in spark-submit/spark-shell, set the following properties:
- Name: spark.sql.extensions
- Value: com.hortonworks.spark.sql.rule.Extensions
- Name: spark.datasource.hive.warehouse.read.mode
- Value: DIRECT_READER_V1 or DIRECT_READER_V2
- Name: spark.kryo.registrator
- Value: com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator
- Name: spark.hadoop.hive.metastore.uris
- Value: thrift://<host>:<port>
- Name: --jars (command-line option, not a --conf property)
- Value: path to the HWC assembly jar
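The properties listed above can also be persisted in spark-defaults.conf rather than passed with --conf on every launch. The following is a minimal sketch, not a definitive procedure: the file location (SPARK_CONF_DIR, falling back to ./conf) is an assumption about your deployment, DIRECT_READER_V2 is chosen only as one of the two valid read modes, and thrift://<host>:<port> is left as a placeholder to fill in.

```shell
# Assumed location of the Spark defaults file; adjust for your deployment.
SPARK_DEFAULTS="${SPARK_CONF_DIR:-./conf}/spark-defaults.conf"
mkdir -p "$(dirname "$SPARK_DEFAULTS")"

# Append the Direct Reader properties. spark-defaults.conf uses
# whitespace-separated "key value" pairs, one per line.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat >> "$SPARK_DEFAULTS" <<'EOF'
spark.sql.extensions                       com.hortonworks.spark.sql.rule.Extensions
spark.datasource.hive.warehouse.read.mode  DIRECT_READER_V2
spark.kryo.registrator                     com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator
spark.hadoop.hive.metastore.uris           thrift://<host>:<port>
EOF
```

With these defaults in place, the spark-shell or spark-submit command only needs the --jars option pointing at the HWC jar. Any --conf value given on the command line takes precedence over the value in spark-defaults.conf.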
Example: Launch a spark-shell
spark-shell --jars \
/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
--conf "spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions" \
--conf "spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2" \
--conf "spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator" \
--conf "spark.hadoop.hive.metastore.uris=<metastore_uri>"