Use case: Applying data lineage for tracing a data propagation error
Learn how a small error can have a large-scale impact in complex, interconnected systems, and how comprehensive data lineage can act as both a detective and a roadmap for resolving such issues.
Scenario overview
Consider a hypothetical large healthcare organization.
The organization operates a core Electronic Health Records (EHR) system that stores critical patient information. This system feeds multiple downstream platforms, including billing, patient portals, insurance claims, and other operational tools.
During a routine upgrade, a coding error introduces a transformation issue in the EHR system. The patient_ID column, which uniquely identifies patients, is incorrectly mapped to the caregiver_ID column, which identifies healthcare providers.
The domino effect
As the EHR system feeds data to downstream systems, the mapping error has the following impacts:
- In the billing system, patients are billed for services provided to their healthcare providers.
- The patient portals show the healthcare providers' details instead of patient-specific information, leading to privacy breaches.
- Insurance claims get denied due to incorrect patient information.
- Patients, healthcare providers, insurance companies, and even the healthcare organization's customer service and IT teams are heavily impacted.
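The way an error spreads from the EHR system can be illustrated with a small lineage graph. The following is a minimal sketch, not the platform's implementation: the `LINEAGE` dictionary, the `downstream_systems` function, and the `finance_reports` node are hypothetical names used only for illustration, and the edges are assumptions based on the scenario above.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the systems listed as its value.
# The first-level systems follow the scenario; "finance_reports" is an
# invented second-level consumer added to show multi-hop propagation.
LINEAGE = {
    "ehr": ["billing", "patient_portal", "insurance_claims", "operational_tools"],
    "billing": ["finance_reports"],
    "patient_portal": [],
    "insurance_claims": [],
    "operational_tools": [],
    "finance_reports": [],
}


def downstream_systems(graph, source):
    """Return every system reachable from source, in breadth-first order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Every system listed here receives data derived from the faulty EHR mapping.
print(downstream_systems(LINEAGE, "ehr"))
```

Walking the graph from the system where the error originated gives the list of platforms that need to be checked and corrected.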
Prevention using an integrated platform
An integrated Cloudera Octopai Data Lineage platform that encompasses data discovery, data lineage, and a data catalog can significantly improve the ability to prevent such errors by providing greater visibility into data flows and transformations. This visibility improves detection and prevention of potential issues, enhancing overall data quality and reliability.
Different roles use the integrated platform to achieve these improvements by performing the following actions:
- Data Engineers trace the data lineage during system development and maintenance using the catalog as a reference to check transformation logic and mapping to avoid incorrect data flow. They follow robust testing protocols before and after deploying any changes in data processing or transformation logic.
- Data Stewards perform metadata management by ensuring that the platform metadata is up to date and accurate. They oversee data quality and leverage the lineage tools to verify proper data flows. They establish and monitor data quality rules aligned with the metadata and lineage information.
- Data Analysts understand the origin and transformations of data they use for reporting and analytics. This helps them spot potential issues and validate the data they use.
- IT Security and Compliance understand where sensitive data is stored, transformed, and used. This enables them to enforce access control and monitoring to prevent unauthorized changes, promptly detect and respond to any suspicious activities, and ensure adherence to regulatory requirements.
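The mapping check that Data Engineers perform against the catalog can be sketched as a simple pre-deployment test. This example is an illustration only: `CATALOG_MAPPING`, `DEPLOYED_MAPPING`, and `find_mapping_errors` are hypothetical names, and the sketch assumes that a mapping is recorded as a source column paired with the target column it is written to.

```python
# Hypothetical catalog reference: source column -> expected target column.
CATALOG_MAPPING = {
    "patient_ID": "patient_ID",
    "caregiver_ID": "caregiver_ID",
}

# Hypothetical mapping extracted from the upgraded transformation logic.
# It reproduces the scenario: patient_ID is written to caregiver_ID.
DEPLOYED_MAPPING = {
    "patient_ID": "caregiver_ID",
    "caregiver_ID": "caregiver_ID",
}


def find_mapping_errors(catalog, deployed):
    """Return (source, expected_target, actual_target) for each mismatch."""
    errors = []
    for source, expected in catalog.items():
        actual = deployed.get(source)  # None when the column is not mapped at all
        if actual != expected:
            errors.append((source, expected, actual))
    return errors


for source, expected, actual in find_mapping_errors(CATALOG_MAPPING, DEPLOYED_MAPPING):
    print(f"{source}: expected target {expected}, found {actual}")
```

Running a comparison of this kind before and after each deployment would flag the incorrect patient_ID mapping before it reaches downstream systems.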
This proactive approach is governed by a rigorous change management process, during which any system change requires an impact analysis supported by data lineage. Automated data quality checks are also used to identify out-of-range values or unusual data distributions that can indicate issues.
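An automated out-of-range check of the kind described above can be sketched as follows. The ID ranges, the sample batches, the 1% threshold, and the function names `out_of_range_rate` and `check_batch` are all assumptions made for illustration; real rules depend on how the organization assigns identifiers.

```python
def out_of_range_rate(values, low, high):
    """Return the fraction of values outside the inclusive range [low, high]."""
    if not values:
        return 0.0
    bad = sum(1 for v in values if not (low <= v <= high))
    return bad / len(values)


def check_batch(values, low, high, max_rate=0.01):
    """Return True when at most max_rate of the values are out of range."""
    return out_of_range_rate(values, low, high) <= max_rate


# Assumed ranges: patient IDs are 100000-999999, caregiver IDs are 1000-9999.
PATIENT_ID_RANGE = (100000, 999999)

healthy_batch = [100234, 204871, 315009, 877120]
faulty_batch = [1042, 2210, 1042, 3377]  # caregiver IDs written to patient_ID

print(check_batch(healthy_batch, *PATIENT_ID_RANGE))  # True
print(check_batch(faulty_batch, *PATIENT_ID_RANGE))   # False
```

In the scenario, a check like this on the patient_ID column would fail as soon as caregiver identifiers started to appear in it, which signals that the upstream transformation needs to be reviewed.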
