Verify that replication works

Confirm data has been replicated from a source cluster to a remote destination cluster.

  1. Install and configure YARN on the source cluster.

    If YARN cannot be used in the source cluster, configure YARN on the destination cluster to verify replication.

    If neither the source nor the destination clusters can have YARN installed, you can configure the tool to use local mode; however, performance and consistency could be negatively impacted.

  2. Ensure that you have the required permissions:
    • You have sudo permissions to run commands as the hbase user, or a user with admin permissions on both clusters.
    • You are an hbase user configured for submitting jobs with YARN.
  3. Run the VerifyReplication command:
    src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication peer1 table1
    ...
            org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
                    BADROWS=2
                    CONTENT_DIFFERENT_ROWS=1
                    GOODROWS=1
                    ONLY_IN_PEER_TABLE_ROWS=1
            File Input Format Counters
                    Bytes Read=0
            File Output Format Counters
                    Bytes Written=0
    The following table describes the VerifyReplication counters:
    Table 1. VerifyReplication Counters
    Counter Description
    GOODROWS Number of rows. On both clusters, and all values are the same.
    CONTENT_DIFFERENT_ROWS The key is the same on both source and destination clusters for a row, but the value differs.
    ONLY_IN_SOURCE_TABLE_ROWS Rows that are only present in the source cluster but not in the destination cluster.
    ONLY_IN_PEER_TABLE_ROWS Rows that are only present in the destination cluster but not in the source cluster.
    BADROWS Total number of rows that differ from the source and destination clusters; the sum of CONTENT_DIFFERENT_ROWS + ONLY_IN_SOURCE_TABLE_ROWS + ONLY_IN_PEER_TABLE_ROWS

    By default, VerifyReplication compares the entire content of table1 on the source cluster against table1 on the destination cluster that is configured to use the replication peer peer1.

    Use the following options to define the period of time, versions, or column families
    Table 2. VerifyReplication Counters
    Option Description
    --starttime=<timestamp> Beginning of the time range, in milliseconds. Time range is forever if no end time is defined.
    --endtime=<timestamp> End of the time range, in milliseconds.
    --versions=<versions> Number of cell versions to verify.
    --families=<cf1,cf2,..> Families to copy; separated by commas.
    The following example, verifies replication only for rows with a timestamp range of one day:
    src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1472499077000 --endtime=1472585477000 --families=c1  peer1 table1