Migrating HBase from CDH or HDP

Before you migrate your data, you must have an Apache HBase cluster created on CDP Private Cloud Base. Your CDH or HDP cluster is your source cluster, and your CDP Private Cloud Base cluster is your destination cluster.

  1. Deploy HBase replication on both the source and the destination cluster.
    For instructions, see Deploy HBase replication.
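  2. Enable replication on both the source and destination clusters by running the following commands in the HBase Shell. In this example, t1 is the table and f1 is its column family.
    On the source cluster, set REPLICATION_SCOPE to 1 so that edits to the column family are replicated:
    create 't1', {NAME=>'f1', REPLICATION_SCOPE=>1}

    On the destination cluster:

    create 't1', {NAME=>'f1', KEEP_DELETED_CELLS=>'true'}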
  3. Run the add_peer <ID>, <DESTINATION_CLUSTER_KEY> command in the HBase Shell on the source cluster to add the destination cluster as a replication peer.

    The cluster key has the form <ZooKeeper quorum>:<ZooKeeper client port>:<ZooKeeper znode parent>. You can get the DESTINATION_CLUSTER_KEY value from the HBase Master web UI of the destination cluster, which you can access from Cloudera Manager.

  4. Monitor replication status and verify whether the queued writes on the source cluster are flowing to the destination.
    hbase shell
    hbase> status 'replication'
    hbase01.home:
    SOURCE: PeerID=1
    Normal Queue: 1
    AgeOfLastShippedOp=0, TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 BST 2020, SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, Replication Lag=0
    SINK: TimeStampStarted=1591983663458, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Fri Jun 12 18:57:18 BST 2020

    A Replication Lag value of 0 indicates that the source cluster has no edits waiting to be shipped, that is, the data has been replicated to the destination cluster.

    For more information, see Monitoring Replication Status.

  5. Run the disable_peer <ID> command in the HBase Shell on the source cluster to disable the peer.
    This stops replication to the peer, but the source cluster retains the logs so that the queued edits can be replicated when you re-enable the peer.
  6. Take a snapshot in Cloudera Manager.
    1. Select the HBase service.
    2. Click the Table Browser tab.
    3. Click a table.
    4. Click Take Snapshot.
    5. Specify the name of the snapshot, and click Take Snapshot.
    If you are migrating from HDP, run the following sample command in the HBase Shell to take a table snapshot instead.
    hbase shell
    hbase> snapshot 'myTable', 'myTableSnapshot-122112'

    You must run this command for each table that you want to migrate.

  7. Run the ExportSnapshot tool from the command line on the source cluster to export the snapshot to the destination cluster. You must run ExportSnapshot as the hbase user or as the user that owns the snapshot files.

    ExportSnapshot runs a MapReduce job, similar to distcp, that copies the snapshot files to the other cluster. Because it works at the file-system level, the HBase cluster does not need to be online.

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot name> -copy-to hdfs://destination:hdfs_port/hbase -mappers 16
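    In this command, hdfs://destination:hdfs_port/hbase is the HBase root directory on the destination CDP Private Cloud Base cluster. Replace destination and hdfs_port with the NameNode host and port of your destination cluster, replace <snapshot name> with the snapshot to export, and adjust -mappers 16, which sets the number of map tasks, as needed.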
  8. Run the following commands in the HBase Shell on the destination cluster to restore the table. If the table already exists on the destination cluster, use restore_snapshot to restore it to the state recorded in the snapshot. The table must be disabled before the restore and enabled again afterwards.
    disable '<tablename>'
    restore_snapshot '<snapshotname>'
    enable '<tablename>'

    If the table does not exist on the destination cluster, use the clone_snapshot command to create it from the snapshot.

    clone_snapshot '<snapshotname>', '<tablename>'
  9. Run the enable_peer <ID> command in the HBase Shell on the source cluster to re-enable replication to the destination cluster. The source cluster then ships the edits that were queued while the peer was disabled.
  10. Run the HashTable command on the source cluster and the SyncTable command on the destination cluster to synchronize the table data between your source and destination clusters.
    On the source cluster:
    hbase org.apache.hadoop.hbase.mapreduce.HashTable [options] <tablename> <outputpath>

    On the destination cluster:

    hbase org.apache.hadoop.hbase.mapreduce.SyncTable [options] <sourcehashdir> <sourcetable> <targettable>

    For more information and examples about using HashTable and SyncTable, see Verifying and validating if your data is migrated.
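You can also check replication progress (step 4) from a script instead of reading the output by eye. The following shell function is a minimal sketch of that check. It assumes the status 'replication' output contains Replication Lag=<n> fields, as the sample output in step 4 does. Other HBase versions may print different fields, and the function name replication_caught_up is hypothetical.

```shell
# Sketch: decide from `status 'replication'` output whether every
# replication source has caught up.
# Assumption: the output contains "Replication Lag=<n>" fields, as in the
# sample output in step 4; other HBase versions may print different fields.

# Reads the status text on stdin and prints one word:
#   caught-up  (returns 0): every Replication Lag value is 0
#   lagging    (returns 1): at least one value is not 0
#   unknown    (returns 2): no Replication Lag field was found
replication_caught_up() {
  local lags lag
  # Keep only the number after each "Replication Lag=".
  lags=$(grep -o 'Replication Lag=[0-9][0-9]*' | cut -d= -f2)
  if [ -z "$lags" ]; then
    echo "unknown"
    return 2
  fi
  for lag in $lags; do
    if [ "$lag" -ne 0 ]; then
      echo "lagging"
      return 1
    fi
  done
  echo "caught-up"
}
```

For example, on the source cluster you might pipe echo "status 'replication'" into hbase shell and then into replication_caught_up before you disable the peer in step 5. Treating a missing lag field as unknown, rather than as caught up, avoids reporting success when the output format differs.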
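When you migrate many tables, steps 6 and 7 repeat the same two commands for each table. The following sketch only prints those commands, so you can review them before you run them. Several names in it are hypothetical: the snapshot naming scheme (<table>-migration, with ':' replaced by '_' for namespaced tables) and the DEST_HBASE_ROOT and MAPPERS variables. DEST_HBASE_ROOT defaults to the placeholder path used in step 7, which you must replace.

```shell
# Sketch: print the step 6 and step 7 commands for a list of tables.
# Hypothetical conventions: snapshots are named "<table>-migration", and
# DEST_HBASE_ROOT / MAPPERS can be overridden from the environment.
DEST_HBASE_ROOT="${DEST_HBASE_ROOT:-hdfs://destination:hdfs_port/hbase}"
MAPPERS="${MAPPERS:-16}"

# Derive a snapshot name; ':' is not allowed in snapshot names, so a
# namespaced table such as ns:t1 becomes ns_t1-migration.
snap_name() {
  printf '%s-migration' "$(printf '%s' "$1" | tr ':' '_')"
}

# HBase Shell commands for the source cluster (step 6).
snapshot_cmds() {
  local table
  for table in "$@"; do
    printf "snapshot '%s', '%s'\n" "$table" "$(snap_name "$table")"
  done
}

# Command-line invocations for the source cluster, run as the hbase user (step 7).
export_cmds() {
  local table
  for table in "$@"; do
    printf '%s -snapshot %s -copy-to %s -mappers %s\n' \
      'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot' \
      "$(snap_name "$table")" "$DEST_HBASE_ROOT" "$MAPPERS"
  done
}
```

For example, piping snapshot_cmds t1 t2 into hbase shell takes both snapshots on the source cluster. Running export_cmds t1 t2 prints one ExportSnapshot command per table, which you then run as the hbase user. Printing the commands instead of executing them keeps the sketch a dry run by default.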
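Step 8 depends on whether the table already exists on the destination cluster. The following sketch prints the matching HBase Shell commands for one table. The function name restore_cmds and its yes/no third argument are hypothetical. You can find out whether a table exists with the exists command in the HBase Shell.

```shell
# Sketch: print the step 8 HBase Shell commands for one table.
# Hypothetical interface: $1 = table, $2 = snapshot,
# $3 = "yes" if the table already exists on the destination cluster.
restore_cmds() {
  if [ "$3" = "yes" ]; then
    # restore_snapshot needs a disabled table, which stays disabled afterwards.
    printf "disable '%s'\n" "$1"
    printf "restore_snapshot '%s'\n" "$2"
    printf "enable '%s'\n" "$1"
  else
    # No table yet: create it from the snapshot.
    printf "clone_snapshot '%s', '%s'\n" "$2" "$1"
  fi
}
```

For example, restore_cmds t1 t1-migration yes prints the disable, restore_snapshot, and enable sequence, which you can pipe into hbase shell on the destination cluster.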
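For step 10, the following sketch prints the HashTable and SyncTable command lines for one table. It makes three assumptions. First, the destination cluster can read the HashTable output directory. Second, the source and target tables have the same name. Third, you pass the source cluster key, in the same form as for add_peer, to SyncTable through its --sourcezkcluster option. The function names are hypothetical.

```shell
# Sketch: print the step 10 command lines for one table.
# Assumptions: the HashTable output directory is readable from the
# destination cluster, and the source and target table names are the same.

# Run on the source cluster. $1 = table, $2 = output directory for the hashes.
hashtable_cmd() {
  printf 'hbase org.apache.hadoop.hbase.mapreduce.HashTable %s %s\n' "$1" "$2"
}

# Run on the destination cluster. $1 = source cluster key,
# $2 = HashTable output directory, $3 = table,
# $4 = dry run flag (defaults to true: report differences, write nothing).
synctable_cmd() {
  printf 'hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=%s --sourcezkcluster=%s %s %s %s\n' \
    "${4:-true}" "$1" "$2" "$3" "$3"
}
```

Defaulting to --dryrun=true means the first SyncTable run only reports differences. Pass false as the fourth argument once you have reviewed the report and want SyncTable to write to the target table.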