Creating an Ozone data connection

Cloudera Machine Learning (CML) supports data connections to Ozone file systems.

You can set up the connection manually using the example snippet below. Connecting to Ozone requires Spark 3.

Set the following parameters:

  • DATALAKE_DIRECTORY: the location of your Datalake directory.
  • A valid database name and table name in place of <database_name> and <table_name> in the describe formatted SQL command.

from pyspark.sql import SparkSession
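# Change to the appropriate Datalake directory location for your environment.
# The value below is only a placeholder; Ozone's Hadoop-compatible file
# systems normally use ofs:// or o3fs:// URIs.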
DATALAKE_DIRECTORY = "s3a://your-aws-demo/"

spark = (
    SparkSession.builder.appName("MyApp")
    # Make the Ozone file system client available to Spark
    .config("spark.jars", "/opt/ozone-addon/jar/ozone-filesystem-hadoop3.jar")
    # Allow Spark to access the Datalake directory
    .config("spark.yarn.access.hadoopFileSystems", DATALAKE_DIRECTORY)
    .getOrCreate()
)

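# List the databases visible to this session
spark.sql("show databases").show()
# Replace <database_name> and <table_name> with valid names before running
spark.sql("describe formatted <database_name>.<table_name>").show()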
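
The describe formatted command above only works once both placeholders are replaced. As a minimal sketch, the statement could be built from variables and checked before it is passed to spark.sql. The helper name describe_statement and the example names default and my_table are hypothetical and not part of CML or PySpark, and the check assumes that names consist only of letters, digits, and underscores.

```python
import re

# Hypothetical helper for illustration; not part of CML or PySpark.
# Assumption: database and table names contain only letters, digits,
# and underscores, and do not start with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def describe_statement(database_name: str, table_name: str) -> str:
    """Build a 'describe formatted' statement for database_name.table_name."""
    for label, value in (("database", database_name), ("table", table_name)):
        # fullmatch rejects empty strings, spaces, and leftover <placeholders>
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"describe formatted {database_name}.{table_name}"


# Example names only; replace them with a database and table you can access.
statement = describe_statement("default", "my_table")
print(statement)
```

With the session created earlier, the resulting string can be run as spark.sql(statement).show(). A placeholder that was left unreplaced, such as <table_name>, raises a ValueError instead of reaching Spark.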