Verify that replication works
Confirm data has been replicated from a source cluster to a remote destination cluster.
-
Install and configure YARN on the source cluster.
If YARN cannot be used on the source cluster, configure YARN on the destination cluster to verify replication.
If YARN cannot be installed on either the source or the destination cluster, you can configure the tool to run in local mode; however, performance and consistency could be negatively impacted.
-
Ensure that you have the required permissions:
- You have sudo permissions to run commands as the hbase user, or as a user with admin permissions on both clusters.
- You are running as an hbase user that is configured to submit jobs to YARN.
-
Run the VerifyReplication command:
src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication peer1 table1
...
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
        BADROWS=2
        CONTENT_DIFFERENT_ROWS=1
        GOODROWS=1
        ONLY_IN_PEER_TABLE_ROWS=1
File Input Format Counters
        Bytes Read=0
File Output Format Counters
        Bytes Written=0
The following table describes the VerifyReplication counters:
Table 1. VerifyReplication Counters
Counter                      Description
GOODROWS                     Number of rows that are present on both clusters and whose values are all the same.
CONTENT_DIFFERENT_ROWS       Number of rows whose key is the same on both the source and destination clusters but whose values differ.
ONLY_IN_SOURCE_TABLE_ROWS    Number of rows that are present only in the source cluster, not in the destination cluster.
ONLY_IN_PEER_TABLE_ROWS      Number of rows that are present only in the destination cluster, not in the source cluster.
BADROWS                      Total number of rows that differ between the source and destination clusters; the sum of CONTENT_DIFFERENT_ROWS + ONLY_IN_SOURCE_TABLE_ROWS + ONLY_IN_PEER_TABLE_ROWS.
By default, VerifyReplication compares the entire content of table1 on the source cluster against table1 on the destination cluster that is configured as replication peer peer1.
Use the following options to limit verification to a time range, a number of cell versions, or specific column families:
Table 2. VerifyReplication Options
Option                       Description
--starttime=<timestamp>      Beginning of the time range, in milliseconds. If no end time is defined, the time range is open-ended.
--endtime=<timestamp>        End of the time range, in milliseconds.
--versions=<versions>        Number of cell versions to verify.
--families=<cf1,cf2,..>      Comma-separated list of column families to verify.
The following example verifies replication only for column family c1 and only for rows with timestamps within a one-day range:
src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1472499077000 --endtime=1472585477000 --families=c1 peer1 table1
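The --starttime and --endtime values are timestamps in milliseconds since the Unix epoch. The two values in the example are exactly 86,400,000 ms (one day) apart. The following minimal shell sketch derives such a one-day window from a chosen end time. It prints the resulting command as a dry run rather than executing it. The variable names END_MS, DAY_MS, and START_MS are illustrative only.

```shell
#!/bin/sh
# End of the window, in milliseconds since the Unix epoch.
# Here it reuses the example's --endtime value; to use the current time instead:
#   END_MS=$(( $(date +%s) * 1000 ))
END_MS=1472585477000

# One day expressed in milliseconds: 24 h * 60 min * 60 s * 1000 ms.
DAY_MS=$((24 * 60 * 60 * 1000))

# Start of the window is exactly one day before the end.
START_MS=$((END_MS - DAY_MS))

# Dry run: print the VerifyReplication command instead of executing it.
echo "sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=$START_MS --endtime=$END_MS --families=c1 peer1 table1"
```

With the hard-coded end time, START_MS works out to 1472499077000, which matches the --starttime value in the example.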
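According to Table 1, BADROWS should equal CONTENT_DIFFERENT_ROWS + ONLY_IN_SOURCE_TABLE_ROWS + ONLY_IN_PEER_TABLE_ROWS. The following shell sketch is a minimal example that checks this relationship. It assumes the counter lines have been copied from the job's console output; here, the sample output from the VerifyReplication run is hard-coded in COUNTERS. The sketch extracts each counter with sed and treats any counter that is not listed as 0. In the sample, ONLY_IN_SOURCE_TABLE_ROWS is not listed, so BADROWS = 1 + 0 + 1 = 2. The sketch then reports whether any bad rows were found. The counter helper function is hypothetical and is not part of HBase.

```shell
#!/bin/sh
# Counter lines copied from the sample VerifyReplication output (assumption:
# in practice you would paste or capture these from your own job's output).
COUNTERS='        BADROWS=2
        CONTENT_DIFFERENT_ROWS=1
        GOODROWS=1
        ONLY_IN_PEER_TABLE_ROWS=1'

# Hypothetical helper: print the value of counter $1, or 0 if it is not listed.
counter() {
  value=$(printf '%s\n' "$COUNTERS" | sed -n "s/^[[:space:]]*$1=\([0-9]*\)\$/\1/p")
  echo "${value:-0}"
}

BAD=$(counter BADROWS)
SUM=$(( $(counter CONTENT_DIFFERENT_ROWS) + $(counter ONLY_IN_SOURCE_TABLE_ROWS) + $(counter ONLY_IN_PEER_TABLE_ROWS) ))

# Sanity check: BADROWS should be the sum of the three mismatch counters.
if [ "$BAD" -eq "$SUM" ]; then
  echo "BADROWS ($BAD) matches the sum of the mismatch counters"
else
  echo "BADROWS ($BAD) does not match the sum ($SUM)"
fi

# Replication is consistent for the verified range only if there are no bad rows.
if [ "$BAD" -eq 0 ]; then
  echo "Replication verified: no bad rows"
else
  echo "Replication mismatch: $BAD bad rows"
fi
```

When adapting this sketch to a live run, note that the counters are typically written through the job's logging. You may therefore need to capture both standard output and standard error, for example with 2>&1.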
