Connecting to Iceberg tables
CML supports data connections to Iceberg data lakes.
You can set up a manual connection using the following snippet as an example. To connect to Iceberg, you must use Spark 3.
Make sure that the DATALAKE_DIRECTORY environment variable is set correctly.
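import os

from pyspark.sql import SparkSession

# Read the data lake location from the DATALAKE_DIRECTORY environment variable.
datalake_directory = os.environ["DATALAKE_DIRECTORY"]

spark = (
    SparkSession
    .builder
    .appName("Iceberg Spark")
    # Iceberg runtime jars.
    .config("spark.jars", "/opt/spark/optional-lib/iceberg-spark-runtime.jar,/opt/spark/optional-lib/iceberg-hive-runtime.jar")
    # Enable the Iceberg SQL extensions.
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Use Iceberg's session catalog, backed by the Hive metastore.
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.hadoop.iceberg.engine.hive.enabled", "true")
    # Give Spark access to the data lake file system.
    .config("spark.yarn.access.hadoopFileSystems", datalake_directory)
    .getOrCreate()
)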
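
A missing or mistyped DATALAKE_DIRECTORY value may only show up as an error once Spark tries to reach the file system, so it can help to check the variable before building the session. The following is a minimal sketch, not part of CML: the helper name resolve_datalake_directory is hypothetical, and the check for a URI scheme (for example s3a:// or abfs://) is an assumption about how the data lake location is written in your environment.

```python
import os
from urllib.parse import urlparse


def resolve_datalake_directory(env=os.environ):
    """Return the DATALAKE_DIRECTORY value from env, or raise ValueError."""
    # Treat an unset variable and a blank one the same way.
    value = env.get("DATALAKE_DIRECTORY", "").strip()
    if not value:
        raise ValueError("DATALAKE_DIRECTORY is not set or is empty")
    # Assumption: the location is a URI with a scheme, e.g. s3a://bucket/path.
    if not urlparse(value).scheme:
        raise ValueError(f"DATALAKE_DIRECTORY has no URI scheme: {value!r}")
    return value
```

The value returned by resolve_datalake_directory() can then be passed as the setting for spark.yarn.access.hadoopFileSystems. If your deployment uses plain paths without a scheme, drop the second check.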