Create an Ozone data connection
CML supports data connections to Ozone file systems.
You can set up a manual connection using the snippet provided below. To connect to Ozone, you must use Spark 3.
Make sure to set the following parameters:
DATALAKE_DIRECTORY
- Valid database and table name in the
describe formatted
SQL command.
from pyspark.sql import SparkSession
# Change to the appropriate Datalake directory location
DATALAKE_DIRECTORY = "s3a://your-aws-demo/"
spark = (
SparkSession.builder.appName("MyApp")
.config("spark.jars", "/opt/ozone-addon/jar/ozone-filesystem-hadoop3.jar")
.config("spark.yarn.access.hadoopFileSystems", DATALAKE_DIRECTORY)
.getOrCreate()
)
spark.sql("show databases").show()
spark.sql("describe formatted <database_name>.<table_name>").show()