Use Direct Reader Mode with PySpark

Make sure to update the following parameters in the code sample below:

spark.yarn.access.hadoopFileSystems: Enter the location where your data is stored, for example s3a://demo-aws-2/.
spark.jars: Enter the path to the Hive Warehouse Connector .jar file, and update it if your connector version differs.
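
Because the connector .jar file name includes a version number, it can help to look the file up rather than hard-code it. The following is a minimal sketch and not part of the official sample: find_hwc_jar is a hypothetical helper, and the default directory and file-name pattern are assumptions taken from the path used on this page.

```python
import glob
import os


def find_hwc_jar(directory="/usr/lib/hive_warehouse_connector"):
    """Return the path of a Hive Warehouse Connector assembly jar, or None.

    Hypothetical helper: the directory and the file-name pattern are
    assumptions based on the path shown in this page's sample.
    """
    pattern = os.path.join(directory, "hive-warehouse-connector-assembly-*.jar")
    matches = sorted(glob.glob(pattern))
    # If several jars match, take the last one in lexical order.
    # Lexical order is not guaranteed to equal version order.
    return matches[-1] if matches else None
```

The returned path could then be passed as the value of spark.jars. If the function returns None, the connector is not installed at the assumed location and the path has to be set by hand.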

from pyspark.sql import SparkSession

# Build a Spark session configured for Direct Reader mode
spark = SparkSession\
    .builder\
    .appName("CDW-CML-Spark-Direct")\
    .config("spark.sql.hive.hwc.execution.mode", "spark")\
    .config("spark.sql.extensions", "com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension")\
    .config("spark.kryo.registrator", "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")\
    .config("spark.yarn.access.hadoopFileSystems", "s3a://demo-aws-2/")\
    .config("spark.jars", "/usr/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.2.2.0-244.jar")\
    .getOrCreate()

### The following commands test the connection

spark.sql("show databases").show()
spark.sql("describe formatted test_managed").show()
spark.sql("select * from test_managed").show()
spark.sql("describe formatted test_external").show()
spark.sql("select * from test_external").show()
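
When one of the test commands fails, the remaining ones never run, which can hide whether the problem affects managed tables, external tables, or both. As a rough sketch, the same statements could be run through a small wrapper that records failures and keeps going. run_checks and CHECKS are hypothetical names introduced here for illustration; the only Spark calls used are spark.sql and DataFrame.show, as in the sample above.

```python
def run_checks(spark, statements):
    """Run each SQL statement and return (statement, error message) pairs for failures."""
    failures = []
    for stmt in statements:
        try:
            spark.sql(stmt).show()
        except Exception as exc:  # broad on purpose: any failure is worth reporting
            failures.append((stmt, str(exc)))
    return failures


# The same statements as in the sample above; the table names come from that sample.
CHECKS = [
    "show databases",
    "describe formatted test_managed",
    "select * from test_managed",
    "describe formatted test_external",
    "select * from test_external",
]

# Usage, assuming `spark` is the session created earlier:
# for stmt, error in run_checks(spark, CHECKS):
#     print("FAILED:", stmt, "->", error)
```

An empty result means every statement ran without raising an error. It does not by itself confirm that the returned rows are correct.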
