Change management - Best practices for using Cloudera Octopai for CI/CD DataOps

Learn how data developers and data governance teams can use the Cloudera Octopai Data Lineage automated data lineage solution to manage changes to data flows within a CI/CD DataOps process. The best practices cover environment setup, impact analysis, automated refreshes, pre-production testing, and troubleshooting, and help ensure data integrity and governance.

Your environments

For data developers, establishing multiple environments is crucial to manage changes efficiently. This setup allows you to rigorously test changes before they reach production, while also aligning with data governance protocols. The best practice is to effectively use the following environments within Cloudera Octopai:
  • Development Environment – In this environment, data developers introduce and iterate on changes, ensuring that the changes meet initial requirements.
  • QA Environment – In this environment, data governance teams can validate changes against governance policies and standards by executing test plans to ensure compliance and data integrity.
  • Staging Environment – In this environment, data developers and governance teams can mimic production and perform final validation.
  • Production Environment – In this live environment, end-users and business processes actively use data with strict governance oversight.

Applying impact analysis and risk assessment

Understanding the potential impact of changes on data flows is critical for both data developers and governance teams. To assess the impact, perform the following actions:

  1. Trigger impact analysis.
    • Use Cloudera Octopai to identify upstream and downstream dependencies that might be affected by the change.
    • Ensure that the impact analysis aligns with governance policies, documenting any risks or compliance issues.
  2. Conduct risk assessment.
    • Evaluate the technical risks associated with the change, such as potential disruptions to dependent systems.
    • Assess the risks from a compliance perspective, ensuring that all regulatory requirements are met.
Figure 1. Upstream impact analysis
Figure 2. Comparison of upstream impact analysis between QA and production
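
The dependency analysis described above can be pictured as a graph traversal. The following is a minimal sketch, not the Cloudera Octopai API: it assumes the lineage of each environment is available as a list of (source, target) pairs, and all object names are hypothetical. It collects the objects upstream and downstream of a changed table and lists the upstream objects that exist in QA but not in production.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Return (downstream, upstream) adjacency maps from (source, target) pairs."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream

def reachable(adjacency, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen and neighbour != start:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Hypothetical lineage edges for two environments (illustrative names only).
qa_edges = [
    ("crm.customers", "stg.customers"),
    ("erp.orders", "stg.orders"),
    ("stg.customers", "dwh.sales_fact"),
    ("stg.orders", "dwh.sales_fact"),
    ("dwh.sales_fact", "report.revenue"),
]
prod_edges = [
    ("crm.customers", "stg.customers"),
    ("stg.customers", "dwh.sales_fact"),
    ("dwh.sales_fact", "report.revenue"),
]

qa_down, qa_up = build_graph(qa_edges)
prod_down, prod_up = build_graph(prod_edges)

changed = "dwh.sales_fact"
qa_upstream = reachable(qa_up, changed)
prod_upstream = reachable(prod_up, changed)
qa_downstream = reachable(qa_down, changed)

# Objects that feed the changed table in QA but not yet in production.
only_in_qa = qa_upstream - prod_upstream

print("Downstream in QA:", sorted(qa_downstream))
print("Upstream only in QA:", sorted(only_in_qa))
```

In this sample, the difference between the two environments is the new orders feed, which is the kind of discrepancy a QA-to-production comparison is meant to surface before release.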

Automating data refresh for testing

Automating the data refresh process in Cloudera Octopai ensures that all environments reflect the most recent changes, which is essential for both effective development and governance. To automate data refresh, consider the following recommendations:
  • Use Jenkins. For teams using Jenkins in their CI/CD pipeline, Cloudera Octopai can be integrated to automate the data refresh process. This gives data developers the latest data lineage and lets governance teams ensure continuous compliance.
  • Use the built-in Cloudera Octopai Client feature for scheduling automatic data refreshes. This keeps the lineage up to date, facilitating both development and governance efforts.
  • Apply Jenkins in environments with established CI/CD pipelines where rigorous testing and governance are required. Otherwise, the Cloudera Octopai scheduling features can serve as a simpler alternative.
  • Apply the Cloudera Octopai automatic refresh solution at the connection level, that is, at the metadata source level.
For more information, see https://docs.cloudera.com/octopai/latest/getting-started/topics/oct-admin-user-octopai-client.html.
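
Because refreshes apply per connection, a pipeline step might check that each connection was refreshed recently before the lineage is relied on for testing. The following is a minimal sketch under stated assumptions: the connection names, the timestamps, and the 24-hour threshold are all hypothetical, and how the last refresh time is obtained is not covered here.

```python
from datetime import datetime, timedelta, timezone

def stale_connections(last_refreshed, max_age, now):
    """Return names of connections whose last refresh is older than max_age."""
    return sorted(
        name
        for name, refreshed_at in last_refreshed.items()
        if now - refreshed_at > max_age
    )

# Hypothetical values; a real pipeline step would supply its own.
now = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)
last_refreshed = {
    "sqlserver_dwh": datetime(2024, 1, 10, 1, 0, tzinfo=timezone.utc),
    "ssis_etl": datetime(2024, 1, 8, 1, 0, tzinfo=timezone.utc),
    "powerbi_reports": datetime(2024, 1, 9, 5, 0, tzinfo=timezone.utc),
}

overdue = stale_connections(last_refreshed, timedelta(hours=24), now)
print(overdue)
```

A CI job could fail or trigger a refresh when the returned list is not empty, so that tests never run against outdated lineage.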

Simulating changes in pre-production

Simulating changes in a pre-production environment is essential to avoid unintended consequences in production, particularly when governance standards must be met. To simulate changes, perform the following actions:
  • Simulate impact.
    • Use Cloudera Octopai to simulate the change and understand its technical implications across environments.
    • Ensure that the simulated impact is analyzed for compliance risks, validating that the change adheres to governance policies.
  • Create a regression test plan.
    • Develop a test plan that covers all critical data flows impacted by the change.
    • Validate that the test plan includes checks for compliance and data integrity.

Use the Export of Cloudera Octopai E2E Column Lineage capability as a foundation for your test plan.

Figure 3. Export of Cloudera Octopai E2E Column Lineage
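
As an illustration of how a column lineage export could seed a regression test plan, the following sketch follows column-level mappings downstream from a changed column and produces one check per impacted mapping. The CSV layout and the column names `source_object`, `source_column`, `target_object`, and `target_column` are assumptions made for this example; the actual export format may differ.

```python
import csv
import io

# Hypothetical export content; the real export's column names may differ.
SAMPLE_EXPORT = """source_object,source_column,target_object,target_column
stg.orders,amount,dwh.sales_fact,net_amount
stg.orders,order_id,dwh.sales_fact,order_id
dwh.sales_fact,net_amount,report.revenue,total_revenue
stg.customers,email,dwh.customer_dim,email
"""

def load_mappings(text):
    """Parse the export text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def impacted_mappings(mappings, changed_object, changed_column):
    """Follow column-level mappings downstream from the changed column."""
    frontier = [(changed_object, changed_column)]
    visited = set(frontier)
    impacted = []
    while frontier:
        obj, col = frontier.pop()
        for row in mappings:
            if (row["source_object"], row["source_column"]) == (obj, col):
                impacted.append(row)
                target = (row["target_object"], row["target_column"])
                if target not in visited:
                    visited.add(target)
                    frontier.append(target)
    return impacted

def build_test_plan(impacted):
    """Turn each impacted mapping into one human-readable check."""
    return [
        f"Verify {r['target_object']}.{r['target_column']} after change to "
        f"{r['source_object']}.{r['source_column']}"
        for r in impacted
    ]

mappings = load_mappings(SAMPLE_EXPORT)
plan = build_test_plan(impacted_mappings(mappings, "stg.orders", "amount"))
for step in plan:
    print(step)
```

Each generated line is only a starting point; the compliance and data integrity checks for every impacted column still need to be written by the team that owns the flow.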

Troubleshooting in production

When a change leads to an issue in production, Cloudera Octopai helps both data developers and governance teams quickly identify and resolve the problem.

Use lineage information for troubleshooting by performing the following actions:
  • Trace the issue back to its source within the data flow, identifying the root cause.
  • Ensure that the resolution process aligns with governance standards, updating documentation as needed.
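
Tracing an issue back to its source amounts to walking the lineage upstream from the failing object. The following is a minimal sketch, assuming the lineage is available as (source, target) pairs with hypothetical object names; it is not the Cloudera Octopai API. It lists every path from a root source to the failing report, which gives the candidate locations for the root cause.

```python
from collections import defaultdict

def upstream_paths(edges, failing):
    """Return every path from a root source to the failing object."""
    parents = defaultdict(list)
    for source, target in edges:
        parents[target].append(source)

    paths = []

    def walk(node, trail):
        trail = [node] + trail
        if not parents.get(node):  # no parents: this node is a root source
            paths.append(trail)
            return
        for parent in parents[node]:
            if parent not in trail:  # guard against cycles
                walk(parent, trail)

    walk(failing, [])
    return paths

# Hypothetical lineage edges (illustrative names only).
edges = [
    ("crm.customers", "stg.customers"),
    ("erp.orders", "stg.orders"),
    ("stg.customers", "dwh.sales_fact"),
    ("stg.orders", "dwh.sales_fact"),
    ("dwh.sales_fact", "report.revenue"),
]

paths = upstream_paths(edges, "report.revenue")
for path in paths:
    print(" -> ".join(path))
```

Checking each object along a path, starting from the one changed most recently, narrows the search to the step where the data first goes wrong.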
Document the solution by performing the following actions:
  • Document the technical steps taken to resolve the issue.
  • Update governance documentation to reflect the resolution and any changes to compliance processes.