Deploy HBase replication
Once you have source and destination clusters, you can enable and configure HBase replication to use it as your data import method.
- Configure and start the source and destination clusters.
- Create tables with the same names and column families on both the source and destination clusters, so that the destination cluster knows where to store data it receives. All hosts in the source and destination clusters have to be reachable to each other. see Create empty table on the destination cluster.
- 
                On the source cluster, enable replication in Cloudera Manager, or by setting
                        hbase.replicationtotruein hbase-site.xml.
- 
                Obtain Kerberos credentials as the HBase principal. Substitute your
                        fully.qualified.domain.nameand realm in the following command:kinit -k -t /etc/hbase/conf/hbase.keytab hbase/fully.qualified.domain.name@YOUR-REALM.COM
- 
                On the source cluster, in HBase Shell, add the destination cluster as a peer,
                    using the add_peercommand.add_peer 'ID', 'CLUSTER_KEY'To compose the CLUSTER_KEY, use the following template:hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parentFor example:host1.com,host2.com,host3.com:2181:/hbase
- 
                On the source cluster, configure each column family to be replicated by setting
                    its REPLICATION_SCOPEto 1, using commands such as the following in HBase Shell:hbase> disable 'example_table' hbase> alter 'example_table', {NAME => 'example_family', REPLICATION_SCOPE => '1'} hbase> enable 'example_table'
- 
                Verify that replication is occurring by examining the logs on the source cluster for messages such as the following.
                Considering 1 rs, with ratio 0.1 Getting 1 rs from peer cluster # 0 Choosing peer 192.0.2.49:62020
- 
                To verify the validity of replicated data, use the included
                        VerifyReplicationMapReduce job on the source cluster, providing it with the ID of the replication peer and table name to verify. Other options are available, such as a time range or specific families to verify.The command has the following form:hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication [--starttime=timestamp1] [--stoptime=timestamp] [--families=comma separated list of families] <peerId> <tablename>TheVerifyReplicationcommand prints GOODROWS and BADROWS counters to indicate rows that did and did not replicate correctly.
