Notes about replication

You must consider a few things before using HBase replication.

  • The timestamps of the replicated HLog entries are kept intact. In case of a collision (two entries identical as to row key, column family, column qualifier, and timestamp) only the entry arriving later write will be read.
  • Increment Column Values (ICVs) are treated as simple puts when they are replicated. In the case where each side of replication is active (new data originates from both sources, which then replicate each other), this may be undesirable, creating identical counters that overwrite one another. Therefore Cloudera recommends that different cluster do not write increments to the same coordinates. For more information about this issue, see https://issues.apache.org/jira/browse/HBase-2804.
  • Make sure the source and destination clusters are time-synchronized with each other. Cloudera recommends you use Network Time Protocol (NTP).
  • Some changes are not replicated and must be propagated through other means, such as Snapshots or CopyTable.

    • Data that existed in the active cluster before replication was enabled.

    • Operations that bypass the WAL, such as when using BulkLoad or API calls such as writeToWal(false).

    • Table schema modifications.