HashTable/SyncTable tool configuration
You can configure the HashTable/SyncTable tool for your specific needs.
Using the batchsize option
The batchsize option, which sets the batchsize property, defines how much cell data of a given region is hashed together into a single hash value. The size of this property directly affects synchronization efficiency: the larger the batch size, the larger the chunks that are hashed.
If only a few differences are expected between the two tables, a slightly larger batch size can be beneficial, because the SyncTable mapper tasks execute fewer scans.
However, if relatively frequent differences are expected between the tables, using a large batch size can cause frequent mismatches of hash values, as the probability of finding at least one mismatch in a batch is increased.
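The trade-off described above can be illustrated with a rough estimate. The following sketch assumes, purely for illustration, that each cell differs between the two tables independently with probability p and that a batch holds n cells, so the chance that a batch contains at least one mismatch is 1 - (1 - p)^n. The function name mismatch_probability and both inputs are hypothetical; n stands for the number of cells that happen to fall into one batch, not for the batchsize value itself, and real differences are rarely independent.

```shell
# Rough model (an assumption, not part of HashTable/SyncTable):
# probability that a batch of n cells contains at least one differing cell,
# when each cell differs independently with probability p.
mismatch_probability() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

# Same per-cell difference rate, two hypothetical batch sizes (in cells).
mismatch_probability 0.001 100
mismatch_probability 0.001 1000
```

Under this model, a tenfold larger batch makes a hash mismatch far more likely at the same difference rate, which is why large batches suit only tables that are expected to be nearly identical.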
$ hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=32000 --numhashfiles=50 --starttime=1265875194289 --endtime=1265878794289 --families=cf2,cf3 TestTableA /hashes/testTable
Creating a read-only report
You can use the dryrun option in the second step, SyncTable, to create a read-only report. It produces only COUNTERS indicating the differences between the two tables and does not make any actual changes. It can be used as an alternative to the VerifyReplication tool.
$ hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true --sourcezkcluster=zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase hdfs://nn:8020/hashes/testTable testTableA testTableB
Handling missing cells
By default, SyncTable changes the target table to make it an exact copy of the source table, at least for the specified startrow-stoprow or starttime-endtime.
Setting doDeletes to false modifies the default behaviour so that target cells missing from the source table are not deleted. Similarly, setting doPuts to false modifies the default behaviour so that missing cells are not added to the target table. If you set both doDeletes and doPuts to false, the result is the same as setting dryrun to true.
In the case of two-way replication or other scenarios where both source and target clusters can have data ingested, Cloudera recommends setting doDeletes to false. Otherwise, any additional cells inserted on the SyncTable target cluster and not yet replicated to the source cluster would be deleted and potentially lost permanently.
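As a minimal sketch of that recommendation, the following snippet assembles a SyncTable invocation with doDeletes disabled. The ZooKeeper quorum, hash directory, and table names are reused from the examples above and are placeholders for your own values; the variable names SOURCE_ZK, HASH_DIR, and SYNC_CMD are hypothetical. The snippet only prints the command so that it can be reviewed before it is run.

```shell
# Placeholder values taken from the earlier examples; replace with your own.
SOURCE_ZK="zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"
HASH_DIR="hdfs://nn:8020/hashes/testTable"

# --doDeletes=false keeps target cells that are missing on the source table.
SYNC_CMD="hbase org.apache.hadoop.hbase.mapreduce.SyncTable --doDeletes=false --sourcezkcluster=${SOURCE_ZK} ${HASH_DIR} testTableA testTableB"

# Print the command for review; paste it into a shell on a cluster node to run it.
echo "${SYNC_CMD}"
```

Adding --dryrun=true to the same command first gives a read-only report of the differences before any cells are written to the target table.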
