Spark troubleshooting

What do you do if you don't see Atlas metadata from Spark?

Spark runs an Atlas "hook", or plugin, called the Spark Atlas Connector (SAC) on every host where Spark runs. To troubleshoot missing metadata, work through the following questions to narrow down where the problem lies:

  • Are you missing all metadata?

    Make sure that all the services supporting Atlas are configured and running. For CDP, the configuration is done for you; in Cloudera Manager, verify that the Kafka, Solr, and Atlas services are running in the Data Lake.

  • Are you missing all Spark process metadata?

    By default, Spark operations are configured to send metadata to Atlas. To confirm that this setting has not been rolled back, open the Spark On YARN service configuration page in Cloudera Manager and check the Atlas Service property to ensure that Spark is configured to send metadata to Atlas. If it is, next check the Kafka topic queue to confirm that Spark is producing metadata messages and that those messages are reaching the Kafka topic.

  • Are you missing only some Spark metadata?

    Because each instance of Spark collects metadata independently of other instances, it is possible that one instance failed to send metadata to Atlas. To determine if this is the problem, check the Kafka topic queue to see if one of the Spark hosts is not sending metadata.
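Both Kafka checks above come down to asking which hosts are producing Spark messages on the hook topic. The following is a minimal sketch of that analysis, not a Cloudera tool. It assumes you have already dumped the topic to a text file with one JSON message per line (for example with `kafka-console-consumer --bootstrap-server <broker>:<port> --topic ATLAS_HOOK --from-beginning`, where ATLAS_HOOK is assumed to be the topic your hooks write to; a secured Data Lake will also need client security settings). It further assumes that each message is a JSON envelope recording the sending host's address in a `msgSourceIP` field, and that Spark messages can be recognized by the substring `spark_` (as in SAC type names such as `spark_process`). Both are assumptions to verify against real messages from your cluster. The function name `summarize_hook_messages` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_hook_messages(lines: Iterable[str],
                            expected_sources: Iterable[str],
                            marker: str = "spark_") -> dict:
    """Count Spark-related hook messages per sending host (hypothetical helper).

    Assumptions (verify on your cluster):
      * each line is one JSON notification envelope from the hook topic;
      * the sending host's address is in the "msgSourceIP" field;
      * Spark messages contain `marker` (SAC type names such as spark_process).
    """
    per_source = Counter()
    unparsed = 0
    for raw in lines:
        line = raw.strip()
        # Heuristic filter: the topic can also carry messages from other hooks
        # (Hive, HBase, ...), so keep only lines that mention Spark types.
        if not line or marker not in line:
            continue
        try:
            envelope = json.loads(line)
        except json.JSONDecodeError:
            unparsed += 1
            continue
        source = envelope.get("msgSourceIP") if isinstance(envelope, dict) else None
        per_source[source or "<unknown>"] += 1
    # Hosts expected to run Spark that produced no Spark messages at all.
    silent = sorted(set(expected_sources) - set(per_source))
    return {
        "total": sum(per_source.values()),
        "per_source": dict(per_source),
        "silent_sources": silent,
        "unparsed": unparsed,
    }
```

To use it, read the dumped file and pass the IP addresses of the hosts where Spark runs, for example `summarize_hook_messages(open("hook.txt"), ["10.0.0.1", "10.0.0.2"])`. A `total` of zero suggests that no Spark process metadata is reaching Kafka at all; a nonzero `total` with a non-empty `silent_sources` list points to individual Spark hosts that are not sending metadata.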