Hadoop High Availability

Managing and Configuring HBase Cluster Replication

Process Overview

  1. Configure and start the source and destination clusters. Create tables with the same names and column families on both the source and destination clusters, so that the destination cluster knows where to store the data that it receives.

  2. Make sure that all hosts in the source and destination clusters can reach each other over the network.

  3. If both clusters use the same ZooKeeper cluster, you must set a different zookeeper.znode.parent for each HBase cluster, because the two clusters cannot write to the same znode.

  4. Verify that replication has not been disabled. The hbase.replication setting defaults to true.

  5. On the source cluster, in the HBase shell, add the destination cluster as a peer by using the add_peer command.

  6. On the source cluster, in the HBase shell, enable replication for the table by using the enable_table_replication command.

  7. Check the logs to see whether replication is taking place. If it is, the Replication Source logs messages produced by a statement like the following:

    LOG.info("Replicating "+ClusterId + " -> " + peerClusterId);
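
The log check in step 7 can be automated. The following is a minimal sketch in Python, assuming the logged text has the form "Replicating <clusterId> -> <peerClusterId>", as the LOG.info statement above suggests. The function name and the sample log lines are hypothetical and exist only for illustration; real log lines carry additional prefixes that depend on your logging configuration.

```python
import re

# Pattern derived from the LOG.info statement shown above:
#   "Replicating " + clusterId + " -> " + peerClusterId
REPLICATING = re.compile(r"Replicating (\S+) -> (\S+)")


def find_replication_pairs(log_lines):
    """Return a (cluster_id, peer_cluster_id) tuple for each matching line."""
    pairs = []
    for line in log_lines:
        match = REPLICATING.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical log excerpt; the prefixes and IDs are made up.
    sample = [
        "INFO Replicating source-cluster-id -> peer-cluster-id",
        "INFO some unrelated message",
    ]
    for cluster_id, peer_id in find_replication_pairs(sample):
        print(f"replication active: {cluster_id} -> {peer_id}")
```

An empty result does not prove that replication is broken. It only means that no matching message appeared in the lines that were scanned.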

Table 4.1. HBase Cluster Management Commands

Command

Description

add_peer <ID> <CLUSTER_KEY>

Adds a replication relationship between two clusters:

  • ID: A unique string, which must not contain a hyphen.

  • CLUSTER_KEY: Composed using the following format:

    hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent

list_peers

Lists all replication relationships known by the cluster.

enable_peer <ID>

Enables a previously-disabled replication relationship.

disable_peer <ID>

Disables a replication relationship. After the peer is disabled, HBase no longer sends edits to that peer cluster, but it continues to track the new WALs that it needs to resume replication if the peer is re-enabled.

remove_peer <ID>

Disables and removes a replication relationship. After removal, HBase no longer sends edits to that peer cluster nor does it track WALs.

enable_table_replication <TABLE_NAME>

Enables the table replication switch for all of the column families associated with that table. If the table is not found in the destination cluster, one is created with the same name and column families.

disable_table_replication <TABLE_NAME>

Disables the table replication switch for all of the column families associated with that table.
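
As an illustration of the CLUSTER_KEY format and the ID rule in the table above, the following minimal Python sketch composes a cluster key and renders the matching shell command lines. The helper names, host names, port, znode parent, and table name are all hypothetical. The positional argument form of add_peer follows the syntax shown in the table; some HBase releases expect a different form, so check the output of help 'add_peer' in your HBase shell before relying on it.

```python
def make_cluster_key(quorum_hosts, client_port, znode_parent):
    """Join the three settings as quorum:clientPort:znodeParent.

    quorum_hosts corresponds to hbase.zookeeper.quorum (a list of hosts),
    client_port to hbase.zookeeper.property.clientPort, and
    znode_parent to zookeeper.znode.parent.
    """
    if not quorum_hosts:
        raise ValueError("at least one ZooKeeper quorum host is required")
    return f"{','.join(quorum_hosts)}:{client_port}:{znode_parent}"


def add_peer_command(peer_id, cluster_key):
    """Render an add_peer line; the ID must not contain a hyphen."""
    if "-" in peer_id:
        raise ValueError(f"peer ID {peer_id!r} must not contain a hyphen")
    return f"add_peer '{peer_id}', '{cluster_key}'"


def enable_table_replication_command(table_name):
    """Render an enable_table_replication line for one table."""
    return f"enable_table_replication '{table_name}'"


if __name__ == "__main__":
    # Hypothetical destination cluster settings.
    key = make_cluster_key(["zk1.example.com", "zk2.example.com"], 2181, "/hbase")
    print(add_peer_command("1", key))
    print(enable_table_replication_command("my_table"))
```

The sketch only builds text to paste into the HBase shell on the source cluster. It does not contact either cluster.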