Spark connector configuration in Apache Atlas
Learn how to configure the Spark Atlas Connector (SAC) so that Spark job submissions do not fail even when all Kafka brokers are unavailable.
Before you begin the configuration, enable the following properties in Cloudera Manager:
- Log in to Cloudera Manager.
- Select the Spark service.
- Select the Configurations tab.
- Search for and select the Atlas Service (atlas_service) parameter.
- Search for and select the Spark Lineage (spark.lineage.enabled) parameter.
- Click Save Changes.
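The spark.lineage.enabled switch above is an ordinary Spark configuration key, so outside of the Cloudera Manager UI it could also appear in spark-defaults.conf. A minimal sketch (the file path is deployment-specific and managed by Cloudera Manager in CDP clusters):

```properties
# spark-defaults.conf (illustrative fragment)
# Enable Spark Atlas Connector lineage collection.
spark.lineage.enabled=true
```

In Cloudera Manager deployments, prefer the UI steps above so the setting is managed and redeployed consistently.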
Using the spark.lineage.kafka.fault-tolerant.timeout.ms parameter
If all Kafka brokers are down while a job is submitted in cluster deploy mode without a keytab and principal, and Spark Atlas lineage is enabled, Spark cannot obtain a delegation token from the Kafka broker, so the job submission fails. In other words, the failure occurs when --deploy-mode is set to cluster but --principal PRINCIPAL and --keytab KEYTAB are not set. Use the spark.lineage.kafka.fault-tolerant.timeout.ms parameter to resolve this case. The default timeout value for Kafka delegation token creation is 0, which means that fault-tolerant mode is disabled. If the value is greater than 0, fault-tolerant mode is enabled with the configured timeout value.
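As an illustration, the timeout could be supplied per job on the spark-submit command line. This sketch assumes a placeholder application class and JAR; the 3000 ms value is an example, not a recommendation:

```shell
# Hypothetical submission: cluster deploy mode, no --principal/--keytab,
# lineage enabled. With the fault-tolerant timeout set to 3000 ms, Kafka
# delegation token creation is abandoned after 3 seconds instead of
# failing the job submission.
spark-submit \
  --deploy-mode cluster \
  --conf spark.lineage.enabled=true \
  --conf spark.lineage.kafka.fault-tolerant.timeout.ms=3000 \
  --class com.example.MyApp \
  my-app.jar
```

With fault-tolerant mode enabled, a broker outage delays submission by at most the configured timeout rather than blocking it outright.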