Integrating Apache Hive with Spark and Kafka

Configuring caching for secure access mode

You can enable or disable caching for the Hive Warehouse Connector (HWC) secure access mode to gain finer control over read queries and to ensure that content updated outside of a Spark session is reflected in reads. Caching is enabled by default because queries generally run faster with caching enabled.

You must ensure that you are using the HWC secure access mode for reads: spark.datasource.hive.warehouse.read.mode=secure_access
To disable caching, set the spark.hadoop.secure.access.cache.disable property to true at the global level, the session level, or the runtime level when running a query. Queries that run with caching disabled tend to run slower.
  • Global-level — Specify the property in the spark-defaults.conf file.
  • Session-level — Specify the property using the --conf option in spark-submit:
    ./bin/spark-submit \
    --conf "spark.hadoop.secure.access.cache.disable=true"
  • Runtime-level — Specify the property just before running your queries:
    scala> spark.conf.set("spark.hadoop.secure.access.cache.disable","true")
    scala> hive.sql("select * from anytable").show
The order of precedence for the configuration, from highest to lowest, is as follows:
  1. Property set at the runtime level
  2. Property passed at the Spark session level
  3. Property set in the spark-defaults.conf file
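
The following is a minimal sketch of how this order can be read: a value set at runtime wins over one passed for the session, which in turn wins over one in spark-defaults.conf. The CacheDisablePrecedence object and its resolve function are hypothetical names used only for illustration; they are not part of Spark or HWC, and the three maps merely stand in for the three configuration levels. The fallback of "false" reflects the statement that caching is enabled by default.

```scala
// Hypothetical model of the precedence order described above.
// Nothing here is a Spark or HWC API; the maps stand in for the three levels.
object CacheDisablePrecedence {
  val Key = "spark.hadoop.secure.access.cache.disable"

  /** Returns the effective value of the property: runtime first, then session,
    * then spark-defaults.conf, else "false" (caching enabled by default). */
  def resolve(
      runtime: Map[String, String],
      session: Map[String, String],
      defaults: Map[String, String]): String =
    runtime.get(Key)
      .orElse(session.get(Key))
      .orElse(defaults.get(Key))
      .getOrElse("false")

  def main(args: Array[String]): Unit = {
    val defaults = Map(Key -> "true")   // spark-defaults.conf disables caching
    val session  = Map(Key -> "false")  // --conf re-enables caching for the session
    val noRuntime = Map.empty[String, String]

    println(resolve(noRuntime, session, defaults))          // prints false
    println(resolve(Map(Key -> "true"), session, defaults)) // prints true
  }
}
```

In the first call the session-level value overrides the one in spark-defaults.conf; in the second, a runtime-level value overrides both.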