Prerequisites for HDFS lineage extraction
Before you start the extraction mechanism, review the activities that must be performed in Cloudera Manager.
You must enable the gateway role in Cloudera Manager. The following properties are visible in the Apache Atlas configuration.
atlas.hdfs.lineage.blacklist.paths
atlas.hdfs.lineage.whitelist.paths
For example:
atlas.hdfs.lineage.blacklist.paths=/tmp/blacklist/,/tmp/whitelist/blacklist/
atlas.hdfs.lineage.whitelist.paths=/tmp/whitelist/
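The matching rules that Atlas applies to these two properties are not described here. As a rough illustration only, the sketch below assumes one plausible reading: a path is considered for lineage extraction when it lies under a whitelisted prefix and does not lie under any blacklisted prefix. It also assumes the blacklist value in the example denotes two separate paths. The function names `is_under` and `is_extracted` are hypothetical and are not part of Atlas.

```python
from posixpath import normpath

# Assumption: the example values, with the blacklist read as two separate paths.
WHITELIST = ["/tmp/whitelist/"]
BLACKLIST = ["/tmp/blacklist/", "/tmp/whitelist/blacklist/"]


def is_under(path, prefix):
    """Return True if path equals prefix or lies beneath it (hypothetical helper)."""
    path = normpath(path)
    prefix = normpath(prefix)
    # Compare on a directory boundary so /tmp/whitelistX does not match /tmp/whitelist.
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def is_extracted(path, whitelist=WHITELIST, blacklist=BLACKLIST):
    """Assumed rule: a blacklist match excludes the path; otherwise a whitelist match is required."""
    if any(is_under(path, b) for b in blacklist):
        return False
    return any(is_under(path, w) for w in whitelist)
```

Under this assumed rule, `/tmp/whitelist/data.csv` would be included, while `/tmp/whitelist/blacklist/data.csv` would be excluded because the blacklisted subdirectory carves it out of the whitelisted tree.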
Adding additional configuration
Before you begin the HDFS extraction, add the following properties to your Atlas setup so that the HDFS entities appear at the Atlas endpoint.
Set the following properties manually in atlas-application.properties (not in Cloudera Manager), at the path /etc/atlas/conf/atlas-application.properties:
atlas.jaas.KafkaClient.option.keyTab
atlas.jaas.KafkaClient.option.principal
If you do not add these properties, the HDFS lineage entities do not appear at the Atlas endpoint.
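Because a missing property only shows up later as absent lineage entities, it can help to check the file before starting the extraction. The following is a minimal sketch, not an Atlas tool: it assumes the file uses simple `key=value` lines, and the helper names `parse_properties`, `missing_keys`, and `check_file` are hypothetical.

```python
# The two properties named above; both must be present with a non-empty value.
REQUIRED_KEYS = (
    "atlas.jaas.KafkaClient.option.keyTab",
    "atlas.jaas.KafkaClient.option.principal",
)

# Path given in this section.
DEFAULT_PATH = "/etc/atlas/conf/atlas-application.properties"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


def check_file(path=DEFAULT_PATH):
    """Read the properties file and return the list of missing required keys."""
    with open(path, encoding="utf-8") as handle:
        return missing_keys(handle.read())
```

Calling `check_file()` on the Atlas host returns an empty list when both properties are set, and otherwise lists the ones that still need to be added.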