Configuring Catalog
When you query an Iceberg table with Spark SQL, you refer to the table using the following dot notation:
<catalog_name>.<database_name>.<table_name>
The default catalog used by Spark is named spark_catalog. When referring to a table in a database known to spark_catalog, you can omit <catalog_name>.
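The naming rule above can be sketched in a few lines of Python. This is only an illustration of the notation, not Spark's actual resolver: the function name qualify and the current_database parameter are hypothetical, and real Spark also handles quoting, nested namespaces, and catalog names that appear as the first part of a two-part reference.

```python
def qualify(table_ref, default_catalog="spark_catalog", current_database="default"):
    """Expand a dotted table reference to <catalog_name>.<database_name>.<table_name>.

    Simplified illustration only; current_database is an assumed fallback
    for single-part names and is not described in the text above.
    """
    parts = table_ref.split(".")
    if not all(parts) or len(parts) > 3:
        raise ValueError(f"unsupported table reference: {table_ref!r}")
    if len(parts) == 3:
        # Already fully qualified: catalog, database, and table are all given.
        return table_ref
    if len(parts) == 2:
        # <catalog_name> omitted: fall back to the default catalog.
        return f"{default_catalog}.{table_ref}"
    # Only the table name given: fall back to default catalog and database.
    return f"{default_catalog}.{current_database}.{table_ref}"


print(qualify("sales.orders"))                  # spark_catalog.sales.orders
print(qualify("iceberg_catalog.sales.orders"))  # iceberg_catalog.sales.orders
```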
Iceberg provides a SparkCatalog class that understands Iceberg tables, and a SparkSessionCatalog class that understands both Iceberg and non-Iceberg tables. In CDE, the following are configured by default:
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive
This replaces Spark’s default catalog with Iceberg’s SparkSessionCatalog and allows you to use both Iceberg and non-Iceberg tables out of the box.
There is one caveat when using SparkSessionCatalog. Iceberg supports CREATE TABLE … AS SELECT (CTAS) and REPLACE TABLE … AS SELECT (RTAS) as atomic operations when using SparkCatalog. With SparkSessionCatalog, CTAS and RTAS are supported but are not atomic. As a workaround, you can configure another catalog that uses SparkCatalog. For example, to create the catalog named iceberg_catalog, set the following:
spark.sql.catalog.iceberg_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg_catalog.type=hive
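As a minimal sketch, the two properties above could be generated and passed to spark-submit as --conf arguments, and a CTAS statement could then target the new catalog by name. The helper names catalog_conf and submit_args, and the tables db.events and db.events_copy, are hypothetical and used only for illustration; how you supply Spark properties in your own environment may differ.

```python
def catalog_conf(name, implementation, catalog_type="hive"):
    """Build the Spark properties that register a catalog under `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {prefix: implementation, f"{prefix}.type": catalog_type}


def submit_args(conf):
    """Render properties as spark-submit style `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


# Same two properties as listed above, for the catalog named iceberg_catalog.
conf = catalog_conf("iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog")

# Hypothetical CTAS that writes through iceberg_catalog (SparkCatalog) while
# reading from a table known to the default spark_catalog.
ctas = (
    "CREATE TABLE iceberg_catalog.db.events_copy USING iceberg "
    "AS SELECT * FROM spark_catalog.db.events"
)

print(" ".join(submit_args(conf)))
print(ctas)
```

In a running Spark session, the statement would be executed with spark.sql(ctas). Addressing the target table through iceberg_catalog rather than spark_catalog is what routes the CTAS through SparkCatalog.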