Prerequisites for HDFS lineage extraction

Before you start the extraction, review the activities that must be performed in Cloudera Manager.

You must enable the gateway role in Cloudera Manager. The following properties are visible in the Atlas configuration.

  • atlas.hdfs.lineage.blacklist.paths
  • atlas.hdfs.lineage.whitelist.paths

For example:

atlas.hdfs.lineage.blacklist.paths=/tmp/blacklist/./tmp/whitelist/blacklist/

atlas.hdfs.lineage.whitelist.paths=/tmp/whitelist/

Adding additional configuration

Before you start the HDFS extraction, you must add the following properties to your Atlas setup so that the HDFS entities appear at the Atlas endpoint.

  • Add the following properties to atlas-application.properties manually (not in Cloudera Manager), at the path /etc/atlas/conf/atlas-application.properties:
    • atlas.jaas.KafkaClient.option.keyTab
    • atlas.jaas.KafkaClient.option.principal

If you do not add these properties, the HDFS lineage entities do not appear at the Atlas endpoint.
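
Because these two properties are added by hand, it can help to confirm that they are present before you run the extraction. The following is a minimal sketch, not part of Atlas or Cloudera Manager. It assumes the file uses simple key=value lines, and the helper names parse_properties and missing_keys are hypothetical. The file path and property names are the ones listed above.

```python
from pathlib import Path

# Path and property names are taken from the prerequisites above.
ATLAS_PROPERTIES = Path("/etc/atlas/conf/atlas-application.properties")
REQUIRED_KEYS = (
    "atlas.jaas.KafkaClient.option.keyTab",
    "atlas.jaas.KafkaClient.option.principal",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments.

    Assumption: only the '=' separator is handled; other separators
    allowed by the Java properties format are not.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    props = parse_properties(text)
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    if not ATLAS_PROPERTIES.exists():
        print(f"{ATLAS_PROPERTIES} not found")
    else:
        missing = missing_keys(ATLAS_PROPERTIES.read_text())
        if missing:
            print("Missing properties: " + ", ".join(missing))
        else:
            print("Both Kafka client JAAS properties are set.")
```

Run the script on the host where the Atlas configuration file resides. It only reads the file and reports which of the two properties, if any, still need to be added.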