Connecting to Iceberg tables
Cloudera AI supports data connections to Iceberg data lakes.
You can set up the connection manually using the example snippet below. Connecting to Iceberg requires Spark 3.
Make sure the DATALAKE_DIRECTORY environment variable is set correctly.
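One way to catch a missing or blank DATALAKE_DIRECTORY before starting Spark is to check it up front. The following is a minimal sketch, assuming the variable is exposed to the session as an ordinary process environment variable. The helper name get_datalake_directory and the example value are hypothetical and not part of the product.

```python
import os


def get_datalake_directory(env=None):
    """Return the DATALAKE_DIRECTORY value, failing early if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out. (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    value = env.get("DATALAKE_DIRECTORY", "").strip()
    if not value:
        raise RuntimeError(
            "DATALAKE_DIRECTORY is not set; define it before creating the Spark session."
        )
    return value


# Example with an explicit mapping; the bucket path is a made-up placeholder.
example_dir = get_datalake_directory(
    {"DATALAKE_DIRECTORY": "s3a://example-bucket/warehouse"}
)
```

Calling get_datalake_directory() with no argument reads the real environment and raises a RuntimeError if the variable is absent.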
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("MyApp")
    .config("spark.sql.hive.hwc.execution.mode", "spark")
    # Enable the Hive ACID auto-convert and Iceberg SQL extensions.
    .config("spark.sql.extensions","com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Back the default catalog with Iceberg, using the Hive metastore.
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkCatalog")
    # File systems Spark needs Kerberos access to; replace the example hosts with your own.
    .config("spark.kerberos.access.hadoopFileSystems","hdfs://nn1.com:8032,hdfs://nn2.com:8032,webhdfs://nn3.com:50070")
    .config("spark.hadoop.iceberg.engine.hive.enabled", "true")
    .config("spark.executorEnv.HADOOP_CONF_DIR", "/home/cdsw/hadoop_config_dir")
    .config("spark.sql.iceberg.handle-timestamp-without-timezone", "true")
    # Iceberg runtime JARs to add to the session.
    .config("spark.jars","/opt/spark/optional-11b/iceberg-spark-runtime.jar,/opt/spark/optional-11b/iceberg-hive-runtime.jar")
    # Prefer user-supplied JARs over the classes bundled with Spark.
    .config("spark.driver.userClassPathFirst", "true")
    .config("spark.executor.userClassPathFirst", "true")
    .config("spark.yarn.user.classpath.first", "true")
    .getOrCreate()
)
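
Once the session above exists, an Iceberg table can be read like any other table in the configured catalog. The following is a minimal sketch, assuming a table that already exists. The helper name preview_table and the table name my_db.my_table are hypothetical placeholders.

```python
def preview_table(spark, table_name, limit=5):
    """Return a DataFrame holding at most `limit` rows of `table_name`.

    `spark` is expected to be a SparkSession such as the one built above.
    (Hypothetical helper, for illustration only.)
    """
    return spark.table(table_name).limit(limit)


# Usage, assuming `spark` is the session created above and the
# placeholder table my_db.my_table exists:
#   preview_table(spark, "my_db.my_table").show()
```

Keeping the session as a parameter rather than a global makes the helper easy to reuse across sessions and to exercise with a stub object.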