Use case: Applying data lineage for data quality
Learn about the power of comprehensive data lineage with the ability to trace data at column level, within a system, and across systems.
With comprehensive data lineage, an organization can quickly identify and resolve data issues, improving overall data quality and trust for effective Data Change Management.
A Data Analyst named Alex in a large financial institution faces a data quality issue.
Alex is tasked with creating a report to analyze customer transactions using a column named transaction_amount from a central data warehouse.
As he starts investigating the planned change, Alex notices some irregularities in the data.
Negative values occur in the transaction_amount column, which is incorrect for this context.
Alex begins investigating using the following layers of data lineage:
- End-to-end column lineage – Alex starts by investigating the end-to-end lineage of the transaction_amount column. He sees the various transformations applied to this data point and where the data point is used in downstream reports. He discovers that the column is derived from the transaction_type (credit or debit) and transaction_value columns in a source system. A transformation is applied to convert debit transaction values to negative.
- Inner-system lineage – Alex then looks at the inner-system lineage within the source system. He notices that the transaction_type column is derived from several fields, including the transaction_code field. A particular transaction code identifies debit transactions, and an error in mapping this code might cause the issue.
- Cross-system lineage – Finally, Alex uses the cross-system lineage to find all systems feeding into the transaction_type column. He discovers an upstream system where the transaction_code field originates. A recent system update changed the transaction_code values for debit transactions.
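The three layers above can be pictured as one walk over a lineage graph. The following is a minimal sketch, not the product's API: the `LINEAGE` dictionary, the system prefixes (`warehouse`, `source`, `upstream`), and both function names are assumptions made for illustration. Only the column names come from the scenario.

```python
from collections import deque

# Hypothetical lineage edges: each column maps to the columns it is derived from.
# The "system.column" prefixes are illustrative assumptions, not real system names.
LINEAGE = {
    "warehouse.transaction_amount": [
        "source.transaction_type",
        "source.transaction_value",
    ],
    "source.transaction_type": ["source.transaction_code"],
    "source.transaction_code": ["upstream.transaction_code"],
}


def trace_upstream(column, lineage):
    """Return every column upstream of `column`, in breadth-first order."""
    seen = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


def crosses_system(child, parent):
    """True when an edge links columns in two different systems."""
    return child.split(".")[0] != parent.split(".")[0]


upstream = trace_upstream("warehouse.transaction_amount", LINEAGE)
cross_system_edges = [
    (child, parent)
    for child, parents in LINEAGE.items()
    for parent in parents
    if crosses_system(child, parent)
]
```

In this sketch, edges that stay inside one prefix correspond to inner-system lineage, while the edges collected in `cross_system_edges` correspond to cross-system lineage.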
Armed with this information, Alex collaborates with the data engineering team to correct the
error in mapping the new transaction_code values and ensures that the
transformation logic applied to the transaction_amount column is accurate. As
a result, the data quality issue is resolved, and Alex can confidently proceed with his
report, trusting the data he is using.
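A check of the kind the team might add after such a fix could look like the sketch below. It assumes a simple lookup from transaction_code to transaction_type; the code values (`C01`, `D01`, `D02`), the mapping, and the function names are hypothetical, since the scenario does not state the real codes or the real transformation logic.

```python
# Hypothetical code values; the scenario does not state the real ones.
CODE_TO_TYPE = {"C01": "credit", "D01": "debit"}


def derive_transaction_type(transaction_code, code_to_type):
    """Look up the transaction type; unmapped codes are reported as 'unknown'."""
    return code_to_type.get(transaction_code, "unknown")


def derive_transaction_amount(transaction_type, transaction_value):
    """Apply the sign convention described above: debit values become negative."""
    if transaction_type == "debit":
        return -abs(transaction_value)
    if transaction_type == "credit":
        return abs(transaction_value)
    raise ValueError(f"cannot derive amount for type {transaction_type!r}")


def unmapped_codes(rows, code_to_type):
    """List codes present in the data that the mapping does not cover."""
    return sorted({row["transaction_code"] for row in rows} - set(code_to_type))


# Example rows; "D02" stands in for a code introduced by an upstream update.
rows = [
    {"transaction_code": "C01", "transaction_value": 120},
    {"transaction_code": "D01", "transaction_value": 50},
    {"transaction_code": "D02", "transaction_value": 75},
]
missing = unmapped_codes(rows, CODE_TO_TYPE)
```

Running a check like `unmapped_codes` whenever an upstream system changes is one way to surface a stale mapping before it reaches downstream reports.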
