Use case: Applying data lineage for data quality

Learn about the power of comprehensive data lineage, with the ability to trace data at the column level, within a system, and across systems.

With the help of comprehensive data lineage, an organization can quickly identify and resolve data issues, improving overall data quality and trust for effective Data Change Management.

A data analyst named Alex at a large financial institution faces a data quality issue.

Alex is tasked with creating a report to analyze customer transactions using a column named transaction_amount from a central data warehouse.

As he starts investigating the planned change, Alex notices some irregularities in the data: negative values appear in the transaction_amount column, which is incorrect in this context. Alex investigates the issue by using the following layers of data lineage:

  1. End-to-end column lineage – Alex starts by investigating the end-to-end lineage of the transaction_amount column. He sees various transformations applied to this data point, and where the data point is used in downstream reports. He discovers that the column is derived from the transaction_type (credit or debit) and transaction_value columns in a source system. A transformation is applied to convert debit transaction values to negative.
  2. Inner-system lineage – Alex then looks at the inner-system lineage within the source system. He notices that the transaction_type column is derived from several fields, including the transaction_code field. A particular transaction code identifies debit transactions, and an error in mapping this code might cause the issue.
  3. Cross-system lineage – Finally, Alex uses the cross-system lineage to find all systems feeding into the transaction_type column. He discovers an upstream system where the transaction_code field originates. A recent system update changed the transaction_code values for debit transactions.
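The mapping error Alex traced can be illustrated with a minimal sketch. The code values, map names, and the derivation function below are purely hypothetical (the source describes the logic only in prose, not the institution's actual pipeline); they assume the warehouse derives transaction_amount by negating debit values, and that the upstream update reassigned a transaction_code value:

```python
# Hypothetical transaction_code values, for illustration only: before the
# upstream update, code "2" identified debit transactions; after the update,
# debits use code "5" and "2" was reassigned to a credit product.
STALE_CODE_MAP = {"1": "credit", "2": "debit"}                  # pre-update mapping
UPDATED_CODE_MAP = {"1": "credit", "2": "credit", "5": "debit"}  # corrected mapping

def derive_transaction_amount(code: str, value: float, code_map: dict) -> float:
    """Mirror the transformation Alex traced: debit values are stored as negatives."""
    transaction_type = code_map.get(code)  # map transaction_code -> transaction_type
    return -value if transaction_type == "debit" else value

rows = [("1", 100.0), ("2", 250.0), ("5", 40.0)]  # (transaction_code, transaction_value)

stale = [derive_transaction_amount(c, v, STALE_CODE_MAP) for c, v in rows]
fixed = [derive_transaction_amount(c, v, UPDATED_CODE_MAP) for c, v in rows]
# stale -> [100.0, -250.0, 40.0]: the code-"2" credit is wrongly negated,
# and the code-"5" debit wrongly stays positive.
# fixed -> [100.0, 250.0, -40.0]
```

With the stale mapping, some credits are classified as debits and negated, which is one way unexpected negative values can surface in transaction_amount downstream; lineage lets Alex walk this derivation back to the stale code map.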

Armed with this information, Alex collaborates with the data engineering team to correct the error in mapping the new transaction_code field and ensures that the transformation logic applied to the transaction_amount column is accurate. As a result, the data quality issue is resolved, and Alex can confidently proceed with his report, trusting the data he is using.