Support for On-Demand lineage

The on-demand lineage provides improved end user experience to handle data flow and related entities.

Currently, the lineage works with minimal basic data to return all the connected entities for any single request. The Atlas UI consumes additional time to load when the lineage response is larger than usual and also when the lineage is multi-level, the cost of processing the data could be higher.

To handle these conditions, the concept of on-demand lineage is introduced. Using the on-demand lineage, you can provide more control to the user for depth and height of specific lineage, apart from increasing the number of nodes from three (which is default with three input and output nodes respectively) and finally displaying the maximum number of nodes. On-demand lineage also processes limited lineage data that helps in the optimization of the UI.

In hindsight, the lineage data is broken down into parts and when the user clicks on the specific “expandable” button in the lineage, only then the lineage shows up fully (or as the lineage data is broken) to provide additional information.

You can set the following parameters to use the on-demand lineage:

  • atlas.lineage.on.demand.default.enabled = true
  • atlas.lineage.on.demand.default.node.count = 3 (Default value is 3)
  • atlas.lineage.max.node.count = 5k (Atlas supports upto 9k nodes)

Note the following:

  • If you pass an inputRelationsLimit / outputRelationsLimit in the payload, the entered values are applicable only for that specific GUID, and for other nodes Atlas considers the default value as input/output Relations Limit.

  • If you do not provide any limits or if you have 0 as the limit, Atlas considers default configurations.

  • Order nodes in lineage are based on how the nodes are stored in JanusGraph.

The lineage data is retrieved from JanusGraph (retrieving the edges and vertices). The order in which the data availability is provided solely depends on the order JanusGraph stores and returns vertices. And the order of vertices in lineage is generally consistent all the time. In other words, every time a request is initiated for lineage data, the result is in the same order and the lineage on-demand feature is designed based on the consistent ordering as available in the JanusGraph database.

  • You can update the node count value using the Settings > Nodes On Demand option in Atlas UI. This, apart from updating the same using Cloudera Manager.