HBase Replication

HBase replication provides a means of copying the data from one HBase cluster to another (typically distant) HBase cluster. HBase replication is designed for data recovery rather than failover.

The cluster receiving the data from user applications is called the master cluster, and the cluster receiving the replicated data from the master is called the slave cluster.

Types of Replication

You can implement any of the following replication models:

In all cases, the principle of replication is similar to that of MySQL master-slave replication in which each transaction on the master cluster is replayed on the slave cluster. In the case of HBase, the Write-Ahead Log (WAL) or HLog records all the transactions (Put/Delete) and the master cluster RegionServers ship the edits to the slave cluster RegionServers. This is done asynchronously, so having the slave cluster in a distant data center does not cause high latency at the master cluster.

Master-Slave Replication

This is the basic replication model, in which transactions on the master cluster are replayed on the slave cluster, as described above. For instructions on configuring master-slave replication, see Deploying HBase Replication.

Master-Master Replication

In this case, the slave cluster in one relationship can act as the master in a second relationship, and the slave in the second relationship can act as master in a third relationship, and so on.

Cyclic Replication

In the cyclic replication model, the slave cluster acts as the master cluster for the original master. This sort of replication is useful when both clusters are receiving data from different sources and you want each cluster to have the same data.

Points to Note about Replication

  • You make the configuration changes on the master cluster side.
  • In the case of master-master replication, you make the changes on both sides.
  • Replication works at the table-column-family level. Each column family scoped for replication must exist on all the slave clusters. (You can have additional, non-replicating families on both sides; see the example after this list.)
  • The timestamps of the replicated HLog entries are kept intact. In the case of a collision (two entries identical as to row key, column family, column qualifier, and timestamp), only the entry that arrives later will be read.
  • Increment Column Values (ICVs) are treated as simple puts when they are replicated. In the master-master case, this may be undesirable, creating identical counters that overwrite one another. (See https://issues.apache.org/jira/browse/HBASE-2804.)
  • Make sure the master and slave clusters are time-synchronized with each other. Cloudera recommends you use Network Time Protocol (NTP).
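
For example, you could create a table that mixes a replicated family with a local-only one. A minimal sketch in the HBase shell, using hypothetical names t1, f1, and f2 (REPLICATION_SCOPE defaults to 0, so f2 is not replicated):

create 't1', {NAME => 'f1', REPLICATION_SCOPE => 1}, {NAME => 'f2'}

On the slave cluster, t1 must also exist and contain f1; f2 does not need to be present there.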

Requirements

Before configuring replication, make sure your environment meets the following requirements:

  • You must manage ZooKeeper yourself. It must not be managed by HBase, and must be available throughout the deployment.
  • Each host in both clusters must be able to reach every other host, including those in the ZooKeeper cluster.
  • Both clusters must be running the same major version of CDH; for example CDH 4 or CDH 5.
  • Every table that contains families that are scoped for replication must exist on each cluster and have exactly the same name.
  • HBase version 0.92 or greater is required for multiple slaves, master-master, or cyclic replication.

Deploying HBase Replication

Follow these steps to enable replication from one cluster to another.

  1. Edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters and add the following:
    <property>
       <name>hbase.replication</name>
       <value>true</value>
    </property>
  2. Push hbase-site.xml to all nodes.
  3. Restart HBase if it was running.
  4. Run the following command in the HBase master's shell while it's running:
    add_peer

    This will show you the help for setting up the replication stream between the clusters. The command takes the form:

    add_peer '<n>', "slave.zookeeper.quorum:zookeeper.clientport:zookeeper.znode.parent"

    where <n> is the peer ID; it should not be more than two characters (longer IDs may work, but have not been tested).

    Example:

    hbase> add_peer '1', "zk.server.com:2181:/hbase" 
  5. Once you have a peer, enable replication on your column families. One way to do this is to alter the table and set the scope like this:
    disable 'your_table'
    alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
    enable 'your_table'

    Currently, a scope of 0 (the default) means that the data will not be replicated, and a scope of 1 means that it will. This could change in the future. You can confirm the setting with the describe command, as shown after these steps.

  6. To list all configured peers, run the following command in the master's shell:
    list_peers
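
To confirm that the scope took effect, you can inspect the table from the shell (assuming the table name used above); the family's attributes should include REPLICATION_SCOPE => '1':

describe 'your_table'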

You can confirm that your setup works by looking at any Region Server's log on the master cluster; look for lines such as the following:

Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 170.22.64.15:62020

This indicates that one Region Server from the slave cluster was chosen for replication.

Deploying Master-Master or Cyclic Replication

For master-master or cyclic replication, repeat the above steps on each master cluster: add the hbase.replication property and set it to true, push the resulting hbase-site.xml to all nodes of this master cluster, use add_peer to add the slave cluster, and enable replication on the column families.
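
For example, with two clusters A and B replicating to each other, you would point each cluster at the other's ZooKeeper quorum (the host names and peer IDs below are placeholders):

  • In cluster A's shell:
    hbase> add_peer '1', "zk-b.example.com:2181:/hbase"
  • In cluster B's shell:
    hbase> add_peer '1', "zk-a.example.com:2181:/hbase"

Remember that hbase.replication must be set to true, and the relevant column families must be scoped for replication, on both clusters.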

Guidelines for Replication across Three or More Clusters

When configuring replication among three or more clusters, follow these guidelines:
  1. The replication relationships should have either:
    • No loops (as in A->B->C, or A->B->C->D, etc. ), or
    • If a loop exists, it should be a complete cycle (as in A->B->C->A, or A->B->C->D->A, etc.)
  2. Cloudera recommends you enable KEEP_DELETED_CELLS on column families in the slave cluster, where REPLICATION_SCOPE=1 in the master cluster; for example:
    • On the master:
      create 't1', {NAME=>'f1', REPLICATION_SCOPE=>1}
    • On the slave:
      create 't1', {NAME=>'f1', KEEP_DELETED_CELLS=>'true'}

Disabling Replication at the Peer Level

Use the command disable_peer("<peerID>") to disable replication for a specific peer. This will stop replication to the peer, but the logs will be kept for future reference.

To re-enable the peer, use the command enable_peer("<peerID>"). Replication resumes where it was stopped.

Examples:

  • To disable peer 1:
disable_peer("1")
  • To re-enable peer 1:
enable_peer("1")

Stopping Replication in an Emergency

If replication is causing serious problems, you can stop it while the clusters are running.

To stop replication in an emergency:

Open the shell on the master cluster and use the stop_replication command. For example:

hbase(main):001:0> stop_replication

Already queued edits will be replicated after you use the stop_replication command, but new entries will not.

To start replication again, use the start_replication command.
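
For example, from the same shell:

hbase(main):002:0> start_replication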

Initiating Replication of Pre-existing Data

You may need to start replication from some point in the past. For example, suppose you have a primary HBase cluster in one location and are setting up a disaster-recovery (DR) cluster in another. To initialize the DR cluster, you need to copy over the existing data from the primary to the DR cluster, so that when you need to switch to the DR cluster you have a full copy of the data generated by the primary cluster. Once that is done, replication of new data can proceed as normal.

To start replication from an earlier point in time, run a copyTable command (defining the start and end timestamps) while replication is enabled. Proceed as follows (a sample copyTable invocation appears after these steps):

  1. Start replication and note the timestamp.
  2. Run the copyTable command with an end timestamp equal to the timestamp you noted in the previous step.
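
A sketch of step 2, with a hypothetical table name, timestamp, and slave cluster address (CopyTable's --peer.adr option takes the same quorum:port:znode form used with add_peer):

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --endtime=1407500000000 --peer.adr=zk.server.com:2181:/hbase your_table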

Replicating Pre-existing Data in a Master-Master Deployment

In the case of master-master replication, run the copyTable job before starting the replication. (If you start the job after enabling replication, the second master will re-send the data to the first master, because copyTable does not edit the clusterId in the mutation objects.) Proceed as follows:

  1. Run the copyTable job and note the start timestamp of the job.
  2. Start replication.
  3. Run the copyTable job again with a start time equal to the start time you noted in step 1 (see the sketch following these steps).

This results in some data being pushed back and forth between the two clusters; but it minimizes the amount of data.
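
A sketch of step 3 under the same assumptions (hypothetical timestamp, slave cluster address, and table name):

hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1407500000000 --peer.adr=zk.server.com:2181:/hbase your_table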

Verifying Replicated Data

If you are looking only at a few rows, you can verify the replicated data in the shell.

For a systematic comparison on a larger scale, use the VerifyReplication MapReduce job. Run it on the master cluster and provide it with the peer ID (the one you provided when establishing the replication stream), a table name, and a column family. Other options allow you to specify a time range and specific families. This job's short name is verifyrep; provide that name when pointing hadoop jar to the HBase JAR file.

The command has the following form:

hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication [--starttime=timestamp1] [--stoptime=timestamp2] [--families=comma-separated list of families] <peerId> <tablename>

The command prints out GOODROWS and BADROWS counters; these correspond to replicated and non-replicated rows respectively.
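
For example, to compare the contents of family f1 of table t1 against peer 1 over a hypothetical time range:

hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1407500000000 --stoptime=1407586400000 --families=f1 1 t1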

Configuring Secure HBase Replication

If you want to make HBase Replication secure, follow the instructions under HBase Authentication.

Caveats

  • Two variables govern replication: hbase.replication as described above under Deploying HBase Replication, and a replication znode. Stopping replication (using stop_replication as above) sets the znode to false. Two problems can result:
    • If you add a new RegionServer to the master cluster while replication is stopped, its current log will not be added to the replication queue, because the replication znode is still set to false. If you restart replication at this point (using start_replication), entries in the log will not be replicated.
    • Similarly, if a logs rolls on an existing RegionServer on the master cluster while replication is stopped, the new log will not be replicated, because the replication znode was set to false when the new log was created.
  • Loop detection is not guaranteed in all cases if you use cyclic replication among more than two clusters. Follow the guidelines under Guidelines for Replication across Three or More Clusters, above.
  • In the case of a long-running, write-intensive workload, the slave cluster may become unresponsive if its meta-handlers are blocked while performing the replication. CDH 5 provides three properties to deal with this problem (a sample configuration follows this list):
    • hbase.regionserver.replication.handler.count - the number of replication handlers in the slave cluster (default is 3). Replication is now handled by separate handlers in the slave cluster to avoid the sluggishness described above. Increase it to a higher value if the ratio of master to slave RegionServers is high.
    • replication.sink.client.retries.number - the number of times the HBase replication client at the sink cluster should retry writing the WAL entries (default is 1).
    • replication.sink.client.ops.timeout - the timeout for the HBase replication client at the sink cluster (default is 20 seconds).
  • For namespaces, tables, column families, or cells with associated ACLs, the ACLs themselves are not replicated. You must re-create the ACLs manually on the target table. Because of this, the ACLs can differ between the source and destination clusters.
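
For example, to raise the number of replication handlers on the slave cluster, you could add something like the following to the slave's hbase-site.xml and restart its RegionServers (10 is an illustrative value; tune it to your master-to-slave RegionServer ratio):

<property>
   <name>hbase.regionserver.replication.handler.count</name>
   <value>10</value>
</property>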