Verify that replication works
Confirm data has been replicated from a source cluster to a remote destination cluster.
-
Install and configure YARN on the source cluster.
If YARN cannot be used in the source cluster, configure YARN on the destination cluster to verify replication.
If neither the source nor the destination clusters can have YARN installed, you can configure the tool to use local mode; however, performance and consistency could be negatively impacted.
-
Ensure that you have the required permissions:
- You have sudo permissions to run commands as the hbase user, or a user with admin permissions on both clusters.
- You are an hbase user configured for submitting jobs with YARN.
-
Run the
VerifyReplication
command:src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication peer1 table1 ... org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters BADROWS=2 CONTENT_DIFFERENT_ROWS=1 GOODROWS=1 ONLY_IN_PEER_TABLE_ROWS=1 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=0
The following table describes theVerifyReplication
counters:Table 1. VerifyReplication Counters Counter Description GOODROWS
Number of rows. On both clusters, and all values are the same. CONTENT_DIFFERENT_ROWS
The key is the same on both source and destination clusters for a row, but the value differs. ONLY_IN_SOURCE_TABLE_ROWS
Rows that are only present in the source cluster but not in the destination cluster. ONLY_IN_PEER_TABLE_ROWS
Rows that are only present in the destination cluster but not in the source cluster. BADROWS
Total number of rows that differ from the source and destination clusters; the sum of CONTENT_DIFFERENT_ROWS
+ONLY_IN_SOURCE_TABLE_ROWS
+ONLY_IN_PEER_TABLE_ROWS
By default,
VerifyReplication
compares the entire content oftable1
on the source cluster againsttable1
on the destination cluster that is configured to use the replication peerpeer1
.Use the following options to define the period of time, versions, or column familiesTable 2. VerifyReplication Counters Option Description --starttime=<timestamp>
Beginning of the time range, in milliseconds. Time range is forever if no end time is defined. --endtime=<timestamp>
End of the time range, in milliseconds. --versions=<versions>
Number of cell versions to verify. --families=<cf1,cf2,..>
Families to copy; separated by commas. The following example, verifies replication only for rows with a timestamp range of one day:src-node$ sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1472499077000 --endtime=1472585477000 --families=c1 peer1 table1