Connecting to Hive tables via HWC
Spark cannot read Hive-managed (ACID) tables on its own; the Hive Warehouse Connector (HWC) is required. Configure the connector when building the SparkSession, as shown below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hwc-example")  # placeholder application name
    # Put the HWC assembly on the driver and executor classpaths
    .config("spark.jars", "/opt/spark/optional-lib/hive-warehouse-connector-assembly.jar")
    # Execute reads with Spark executors instead of going through HiveServer2
    .config("spark.sql.hive.hwc.execution.mode", "spark")
    # Automatically convert reads of Hive ACID tables to the Hive ACID datasource
    .config(
        "spark.sql.extensions",
        "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension",
    )
    # Kryo registrator shipped with the connector (the "Kyro" spelling
    # matches the actual class name in the library)
    .config(
        "spark.kryo.registrator",
        "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator",
    )
    # Use the Hive metastore as the Spark session catalog
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    # Grant the application access to the data lake storage location
    .config("spark.yarn.access.hadoopFileSystems", "<DATALAKE_DIRECTORY>")
    .getOrCreate()
)
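
Once the session is configured, Hive-managed tables can be queried with plain Spark SQL; the HiveAcidAutoConvertExtension routes the read through the connector transparently. A minimal sketch, assuming a Hive database named db with a managed table named my_table (both names are placeholders):

# Query a Hive-managed table through the configured session.
# "db.my_table" is a hypothetical database/table name.
df = spark.sql("SELECT * FROM db.my_table")
df.show(10)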