Connecting to Hive tables via HWC

Spark needs the Hive Warehouse Connector (HWC) to access Hive-managed tables. The following SparkSession configuration loads the HWC assembly JAR and enables the Hive ACID extensions:

from pyspark.sql import SparkSession

# Build a SparkSession with HWC enabled so Hive-managed tables are
# readable from Spark. (This snippet runs inside a class method;
# self.app_name holds the application name.)
spark = (
    SparkSession.builder.appName(self.app_name)
    # HWC assembly JAR shipped with the cluster
    .config("spark.jars", "/opt/spark/optional-lib/hive-warehouse-connector-assembly.jar")
    # Run HWC in Spark execution mode
    .config("spark.sql.hive.hwc.execution.mode", "spark")
    # Automatically convert reads of Hive ACID tables
    .config(
        "spark.sql.extensions",
        "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension",
    )
    .config(
        "spark.kryo.registrator",
        "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator",
    )
    # Use the Hive metastore as the Spark catalog
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    # Grant the application access to the data lake storage location
    .config("spark.yarn.access.hadoopFileSystems", "<DATALAKE_DIRECTORY>")
    .getOrCreate()
)
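
Once the session is created, Hive-managed tables can be queried with ordinary Spark SQL; with HiveAcidAutoConvertExtension enabled, reads of ACID tables are converted automatically. A minimal sketch, where sales.orders and its columns are hypothetical names used for illustration:

# Query a Hive-managed table through the HWC-enabled session.
# "sales.orders" is a hypothetical database.table.
df = spark.sql("SELECT order_id, amount FROM sales.orders WHERE amount > 100")
df.show()

# The result is a regular DataFrame, so standard operations apply.
print(df.count())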