Verifying Replicated HBase Data
The VerifyReplication MapReduce job, which is included in HBase,
performs a systematic comparison of replicated data between two different clusters. Run the
VerifyReplication job on the master cluster, supplying it with the peer ID and table name to
use for validation. You can limit the verification further by specifying a time range or
specific column families. The job short name is verifyrep. To run the job, use a command
like the following:
$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" verifyrep --starttime=<timestamp> --stoptime=<timestamp> --families=<myFam> <ID> <tableName>
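As an illustration of filling in the placeholders, the sketch below builds the argument string for a verification window covering the last N hours. It assumes that --starttime and --stoptime take epoch milliseconds, matching HBase cell timestamps. The helper name verifyrep_args and the values used (peer ID 1, table myTable, column family cf1) are hypothetical and not part of HBase.

```shell
# Build verifyrep arguments for a window ending at a given time.
# Arguments: now (epoch seconds), window length (hours), families, peer ID, table.
# Assumption: --starttime/--stoptime are epoch milliseconds, like HBase cell timestamps.
verifyrep_args() {
  stop_ms=$(( $1 * 1000 ))
  start_ms=$(( stop_ms - $2 * 3600 * 1000 ))
  echo "--starttime=${start_ms} --stoptime=${stop_ms} --families=$3 $4 $5"
}

# Hypothetical values: last 24 hours, family cf1, peer ID 1, table myTable.
args="$(verifyrep_args "$(date +%s)" 24 cf1 1 myTable)"

# Print the arguments that would follow the jar path in the command above.
echo "verifyrep ${args}"
```

The printed string can be appended after the jar path in the command shown above. Keeping the window calculation in a function makes it straightforward to rerun the check on a schedule with a fixed look-back period.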
The VerifyReplication command prints GOODROWS and BADROWS counters, which report the number
of rows that did and did not replicate correctly, respectively.
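To act on these counters automatically, the job output can be saved to a file and scanned afterwards. The following is a minimal sketch, assuming the counters appear in the output as NAME=VALUE lines (the form in which Hadoop's job client usually prints counters) and assuming that a missing BADROWS line means no bad rows were counted. The function names counter_value and check_verifyrep_log are hypothetical.

```shell
# Read job output on stdin and print the value of the named counter.
# Assumption: counters are printed one per line as NAME=VALUE, possibly indented.
counter_value() {
  sed -n "s/^[[:space:]]*$1=\([0-9][0-9]*\)[[:space:]]*$/\1/p" | tail -n 1
}

# Return non-zero if the saved job output in file $1 reports any BADROWS.
# Assumption: an absent BADROWS line is treated as zero bad rows.
check_verifyrep_log() {
  bad="$(counter_value BADROWS < "$1")"
  if [ "${bad:-0}" -gt 0 ]; then
    echo "replication mismatch: BADROWS=${bad}"
    return 1
  fi
  echo "no BADROWS reported"
}
```

For example, after redirecting the job's output to a file such as verifyrep.log (a hypothetical name), running check_verifyrep_log verifyrep.log returns a non-zero status when bad rows were reported, which a wrapper script or scheduler can use to raise an alert. Because a missing counter is treated as zero, the exit status of the job itself should also be checked.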