Spark Connector configuration in Apache Atlas

Learn how to configure the Spark Atlas Connector so that Spark job submissions succeed even when all Kafka brokers are down. This ensures that your job submissions do not fail.

Before you begin the configuration, you must enable the following properties in Cloudera Manager:

Atlas Service:
  1. Log in to Cloudera Manager.
  2. Select the Spark service.
  3. Select the Configuration tab.
  4. Search for and select the Atlas Service (atlas_service) parameter.
  5. Click Save Changes.
Spark Lineage:
  1. Log in to Cloudera Manager.
  2. Select the Spark service.
  3. Select the Configuration tab.
  4. Search for and select the Spark Lineage (spark.lineage.enabled) parameter.
  5. Click Save Changes.
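
After you save these changes and redeploy the client configuration, the Spark client configuration should contain a property equivalent to the following. This is a sketch of the effective setting; the exact generated configuration files depend on your cluster and Cloudera Manager version:

# spark-defaults.conf (generated by Cloudera Manager)
spark.lineage.enabled=true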
Using the spark.lineage.kafka.fault-tolerant.timeout.ms parameter

When the deploy mode is cluster and no keytab/principal is provided, and Spark Atlas lineage is enabled, Spark must obtain a delegation token from a Kafka broker at submission time. If all Kafka brokers are down, the delegation token cannot be obtained and the job submission fails. This scenario occurs when --deploy-mode cluster is set but --principal PRINCIPAL and --keytab KEYTAB are not. Use the spark.lineage.kafka.fault-tolerant.timeout.ms parameter to resolve this case.

The default timeout value for Kafka delegation token creation is 0, which means that fault-tolerant mode is disabled. If the value is greater than 0, fault-tolerant mode is enabled with the configured timeout value.
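The semantics described above can be sketched in plain Python. This is an illustrative model, not the actual connector code: the function name and structure are assumptions made for clarity.

```python
import concurrent.futures


# Illustrative sketch (not the actual Spark Atlas Connector code) of the
# spark.lineage.kafka.fault-tolerant.timeout.ms semantics.
def obtain_delegation_token(fetch_token, timeout_ms):
    """Return a Kafka delegation token, or None when fault-tolerant
    mode is enabled and the brokers cannot be reached in time.

    timeout_ms == 0 -> fault-tolerant mode disabled: failures propagate
                       and the job submission fails.
    timeout_ms  > 0 -> wait up to timeout_ms; on failure or timeout,
                       give up gracefully so the submission can proceed.
    """
    if timeout_ms == 0:
        # Disabled: any broker failure aborts the job submission.
        return fetch_token()

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_token)
        try:
            return future.result(timeout=timeout_ms / 1000.0)
        except Exception:
            # Brokers down or timed out: continue without a token,
            # which means the job runs but produces no lineage.
            return None
```

With timeout_ms set to 0, a broker failure still aborts the submission; with a positive value, the failure is swallowed and the job proceeds without lineage.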

See the example PySpark code below:
from pyspark.sql import SparkSession
import time

# Use the current timestamp to generate unique table names.
id_string = str(time.time()).split(".")[0]

spark = SparkSession.builder.appName("Python CTAS example").getOrCreate()

# Create a source table, then a CTAS target table from it.
spark.sql("CREATE TABLE parquet_spark1_" + id_string + " (id bigint, data string, category string) USING parquet")
spark.sql("CREATE TABLE parquet_spark2_" + id_string + " USING parquet AS SELECT * FROM parquet_spark1_" + id_string)
Example commands:
spark-submit --conf spark.lineage.kafka.fault-tolerant.timeout.ms=8000 --deploy-mode cluster test.py
spark3-submit --conf spark.lineage.kafka.fault-tolerant.timeout.ms=8000 --deploy-mode cluster test.py
Limitation:

This timeout permits the job submission in cluster mode when all Kafka brokers are down. However, no Spark Atlas Connector lineage is created, because the Kafka delegation token required for lineage creation is not available.