Verifying Replicated HBase Data
The VerifyReplication MapReduce job, which is included in HBase,
performs a systematic comparison of replicated data between two
different clusters. Run the VerifyReplication job on the master
cluster, supplying it with the peer ID and table name to use for
validation. You can limit the verification further by specifying a
time range or specific column families. The job short name is
verifyrep. To run the job, use a command like the following:
$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" verifyrep --starttime=<timestamp> --stoptime=<timestamp> --families=<myFam> <ID> <tableName>
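As a rough illustration of filling in the placeholders above, the following sketch builds the argument list for a verification window covering the last N hours. It assumes the table uses HBase's default cell timestamps (milliseconds since the Unix epoch). The function names to_millis and verifyrep_args, the family cf1, the peer ID 1, and the table my_table are hypothetical and not part of HBase.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and example values are illustrative assumptions.

# to_millis SECONDS
# Convert epoch seconds to epoch milliseconds, the unit assumed here
# for --starttime and --stoptime.
to_millis() {
  echo $(( $1 * 1000 ))
}

# verifyrep_args NOW_SECONDS HOURS FAMILY PEER_ID TABLE
# Print verifyrep arguments covering the HOURS hours before NOW_SECONDS.
verifyrep_args() {
  local now_s=$1 hours=$2 family=$3 peer_id=$4 table=$5
  local start_s=$(( now_s - hours * 3600 ))
  echo "--starttime=$(to_millis "$start_s") --stoptime=$(to_millis "$now_s") --families=${family} ${peer_id} ${table}"
}

# Example: arguments for the last 24 hours (hypothetical family, peer, table).
verifyrep_args "$(date +%s)" 24 cf1 1 my_table
```

The printed string can be substituted for the options and positional arguments that follow verifyrep in the command above.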
The VerifyReplication command prints out GOODROWS and BADROWS
counters to indicate rows that did and did not replicate correctly.
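For scripted checks, the counters can be pulled out of the captured job output. The sketch below assumes the output (stdout and stderr) was saved to a file and that each counter appears on a line in the form NAME=VALUE, as Hadoop job summaries typically print them. A counter that never appears is treated as 0, on the assumption that a counter which was never incremented may not be printed. The function names counter_value and check_verifyrep and the file name verifyrep.log are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes counter lines of the form NAME=VALUE in the saved output.

# counter_value NAME FILE
# Print the last reported value of counter NAME in FILE, or 0 if absent.
counter_value() {
  local name=$1 file=$2 value
  value=$(grep -Eo "${name}=[0-9]+" "$file" | tail -n 1 | cut -d= -f2)
  echo "${value:-0}"
}

# check_verifyrep FILE
# Print both counters; return success only when BADROWS is 0.
check_verifyrep() {
  local file=$1 good bad
  good=$(counter_value GOODROWS "$file")
  bad=$(counter_value BADROWS "$file")
  echo "GOODROWS=${good} BADROWS=${bad}"
  [ "$bad" -eq 0 ]
}
```

A possible use is to redirect the job output to verifyrep.log and then run check_verifyrep verifyrep.log, acting on a non-zero exit status when mismatched rows were reported.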