Storage container reconciliation overview

Learn how storage container reconciliation identifies corrupted or diverged container replicas and repairs them so that all replicas contain identical data.

Overview

Under normal conditions, all replicas of a container contain the same data. Container reconciliation is a repair mechanism that allows administrators to detect container replicas that are corrupted or have otherwise diverged from each other, and to repair these replicas so that they are identical again. The data within each replica is summarized as a single data checksum. When the data within the replicas diverges, the replicas have different data checksums. After reconciliation, the replicas contain the same data and their data checksums match. The data checksum is designed to remain constant even as data is deleted from the container in the background.
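The idea of comparing one data checksum per replica can be illustrated with a minimal Python sketch. This is not how Ozone computes its data checksum: the CRC32 summary, the function names, and the datanode names (dn1, dn2, dn3) are all assumptions made for illustration. It only shows that replicas with identical data produce identical checksums, and that a diverged replica ends up in a different checksum group.

```python
import zlib

def data_checksum(chunks):
    """Summarize a replica's data as one CRC32 value (illustrative stand-in only)."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # fold each chunk into the running CRC
    return crc

def group_by_checksum(replicas):
    """Map each distinct checksum to the names of the replicas that report it."""
    groups = {}
    for name, chunks in replicas.items():
        groups.setdefault(data_checksum(chunks), []).append(name)
    return groups

def has_diverged(replicas):
    """Replicas have diverged when they report more than one distinct checksum."""
    return len(group_by_checksum(replicas)) > 1

# Hypothetical replicas of one container; dn3 holds a corrupted block.
replicas = {
    "dn1": [b"block-1", b"block-2"],
    "dn2": [b"block-1", b"block-2"],
    "dn3": [b"block-1", b"block-X"],
}
print(has_diverged(replicas))

# Simulate the result of a repair: dn3 now holds the same data as its peers.
replicas["dn3"] = list(replicas["dn1"])
print(has_diverged(replicas))
```

In this sketch, divergence is simply "more than one distinct checksum". Deciding which replica holds the correct data, and repairing the others, is the part that reconciliation itself performs.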

Restrictions and limitations

  • Reconciliation is not supported for containers that are in the OPEN, DELETING, or DELETED states. Reconciling containers in these states fails immediately and makes no modifications. Reconciliation can be used to repair data in UNHEALTHY replicas, where it updates the replicas' data and corresponding checksums, but not their state.
  • Reconciliation is currently supported for Ratis-replicated containers only. Reconciling an EC container fails immediately and makes no modifications.
  • Containers with a replication factor of 1 cannot be reconciled because there are no peer replicas to compare against. Similarly, containers with a replication factor of 3 that have zero or one replica available cannot be reconciled.
  • After you run ozone admin container reconcile on a CLOSED RATIS/THREE container, the replicas' checksums can match, but the replica state does not change, even if it was UNHEALTHY. If the container is corrupted again later and the checksums diverge once more, you can run the ozone admin container reconcile command again, as many times as needed, until the container is repaired.

    Storage Container Manager (SCM) can only use replication to automatically recover from failures. SCM will not automatically schedule reconciliation. Since replication requires at least one source replica not in the UNHEALTHY state, reconciliation will need to be run manually if all replicas are UNHEALTHY.
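The restrictions above can be summarized as a pre-check that an operator script might run before requesting reconciliation. The following Python sketch is hypothetical: the ContainerSummary type, its field names, and reconcile_precheck are not part of Ozone, and Ozone enforces these rules itself when the reconcile command is issued. The sketch only encodes the rules listed in this section.

```python
from dataclasses import dataclass

# States in which reconciliation fails immediately, per the list above.
UNSUPPORTED_STATES = {"OPEN", "DELETING", "DELETED"}

@dataclass
class ContainerSummary:
    """Hypothetical summary of a container; not an Ozone API type."""
    state: str               # for example "CLOSED" or "OPEN"
    replication_type: str    # "RATIS" or "EC"
    replication_factor: int  # 1 or 3 for Ratis-replicated containers
    available_replicas: int  # replicas currently reachable

def reconcile_precheck(container):
    """Return (eligible, reason) according to the documented restrictions."""
    if container.state in UNSUPPORTED_STATES:
        return False, f"containers in state {container.state} cannot be reconciled"
    if container.replication_type != "RATIS":
        return False, "only Ratis-replicated containers are supported"
    if container.replication_factor < 2:
        return False, "replication factor 1 has no peer replicas to compare against"
    if container.available_replicas < 2:
        return False, "at least two available replicas are needed for comparison"
    return True, "eligible for reconciliation"

# Example: a CLOSED RATIS/THREE container with only one replica available.
print(reconcile_precheck(ContainerSummary("CLOSED", "RATIS", 3, 1)))
```

A replica's UNHEALTHY state is deliberately not part of this check, because reconciliation can be used to repair data in UNHEALTHY replicas.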