Use Direct Reader Mode with PySpark
Make sure to update the following parameters in the code sample below:

- `spark.yarn.access.hadoopFileSystems`: Enter the location where your data is stored.
- `spark.jars`: Update the path to the Hive Warehouse Connector `.jar` file, if necessary.
```python
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("CDW-CML-Spark-Direct")\
    .config("spark.sql.hive.hwc.execution.mode", "spark")\
    .config("spark.sql.extensions", "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")\
    .config("spark.kryo.registrator", "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")\
    .config("spark.yarn.access.hadoopFileSystems", "s3a://demo-aws-2/")\
    .config("spark.jars", "/usr/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.2.2.0-244.jar")\
    .getOrCreate()

# The following commands test the connection
spark.sql("show databases").show()
spark.sql("describe formatted test_managed").show()
spark.sql("select * from test_managed").show()
spark.sql("describe formatted test_external").show()
spark.sql("select * from test_external").show()
```
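If you prefer to keep these settings out of your code, Spark can also pick them up from a `spark-defaults.conf` file. A minimal sketch, assuming the same example bucket (`s3a://demo-aws-2/`) and jar path used above; substitute your own values:

```properties
# Direct reader mode: Spark reads Hive ACID tables without going through HiveServer2
spark.sql.hive.hwc.execution.mode    spark
spark.sql.extensions                 com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension
spark.kryo.registrator               com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator

# Location where your data is stored
spark.yarn.access.hadoopFileSystems  s3a://demo-aws-2/

# Path to the Hive Warehouse Connector jar; adjust the version if necessary
spark.jars                           /usr/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.2.2.0-244.jar
```

With this file in place, the session can be created with just `SparkSession.builder.appName("CDW-CML-Spark-Direct").getOrCreate()`.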