How Lineage strategy works

To illustrate the difference between lineage strategies, consider a sample NiFi flow.

With 'Simple Path', Atlas lineage is reported as seen in the following image, when '/tmp/input/A1.csv' is selected.

Because 'Simple Path' maps I/O events to a 'nifi_flow_path', '/tmp/output/B1.csv' is shown in the lineage graph as the file that is written by the 'GetFile, PutFile...' process.

With 'Complete Path', Atlas lineage is reported as seen in the following image. In this use case, 'GetFile, PutFile...' process is not linked to '/tmp/output/B1.csv'.

The 'Complete Path' strategy creates two different 'nifi_flow_path' entities. One for '/tmp/input/A1.csv -> /tmp/output/A1.csv' and another for '/tmp/input/B1.csv -> /tmp/output/B1.csv'.

However, after the data records ingest from A.csv and B.csv into a bigger dataset, 'nifi-test' Kafka topic in the following example (or whatever dataset such as a database table or a concatenated file ... and so on), record level lineage tracking about where it came from can no longer be tracked.

The resulting '/tmp/consumed/B_2..' is shown in the same lineage graph, although the file does not contain any data that came from '/tmp/input/A1.csv'.