Migrating HBase from CDH or HDP
Before you migrate your data, you must have an Apache HBase cluster created on CDP Private Cloud Base. Your CDH or HDP cluster is your source cluster, and your CDP Private Cloud Base cluster is your destination cluster.
-
Deploy HBase replication on both the source and destination clusters.
For instructions, see Deploy HBase replication.
-
Enable replication on both the source and destination clusters by running the following commands in the HBase Shell. The table t1 and column family f1 are examples; use the same table and column family names on both clusters.
On the source cluster:
create 't1',{NAME=>'f1', REPLICATION_SCOPE=>1}
On the destination cluster:
create 't1',{NAME=>'f1', KEEP_DELETED_CELLS=>'true'}
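When you migrate many tables, you can generate the two create statements instead of typing them. The following Python sketch uses hypothetical helper names (family_spec, create_statements) that are not part of HBase. It only builds HBase Shell text for one table and its column families: REPLICATION_SCOPE=>1 on the source and KEEP_DELETED_CELLS=>'true' on the destination, as in the commands above. You then paste that text into the HBase Shell on each cluster.

```python
from typing import Dict, List, Tuple


def family_spec(family: str, attrs: Dict[str, str]) -> str:
    """Hypothetical helper: render {NAME=>'f1', KEY=>VALUE} in HBase Shell syntax."""
    parts = [f"NAME=>'{family}'"] + [f"{key}=>{value}" for key, value in attrs.items()]
    return "{" + ", ".join(parts) + "}"


def create_statements(table: str, families: List[str]) -> Tuple[str, str]:
    """Hypothetical helper: return (source, destination) create commands."""
    # Source cluster: REPLICATION_SCOPE=>1 marks each family for replication.
    src = [family_spec(f, {"REPLICATION_SCOPE": "1"}) for f in families]
    # Destination cluster: same table and families, with KEEP_DELETED_CELLS=>'true'.
    dst = [family_spec(f, {"KEEP_DELETED_CELLS": "'true'"}) for f in families]
    return f"create '{table}'," + ",".join(src), f"create '{table}'," + ",".join(dst)


if __name__ == "__main__":
    source_cmd, dest_cmd = create_statements("t1", ["f1"])
    print(source_cmd)  # create 't1',{NAME=>'f1', REPLICATION_SCOPE=>1}
    print(dest_cmd)    # create 't1',{NAME=>'f1', KEEP_DELETED_CELLS=>'true'}
```

A table with several column families gets one {NAME=>...} specification per family, separated by commas. The HBase Shell accepts that form in a single create command.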
-
Run the add_peer <ID>, <DESTINATION_CLUSTER_KEY> command in the HBase Shell on the source cluster to add the destination cluster as a peer. You can get the DESTINATION_CLUSTER_KEY value from the HBase Master user interface, which you can access from Cloudera Manager.
-
Monitor the replication status and verify that the queued writes on the source cluster are flowing to the destination.
hbase shell
hbase> status 'replication'
hbase01.home:
   SOURCE: PeerID=1
   Normal Queue: 1
   AgeOfLastShippedOp=0, TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 BST 2020, SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, Replication Lag=0
   SINK: TimeStampStarted=1591983663458, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Fri Jun 12 18:57:18 BST 2020
The Replication Lag=0 value indicates that the data has been replicated to the destination cluster successfully. For more information, see Monitoring Replication Status.
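The cluster key that add_peer expects has the form <ZooKeeper quorum>:<ZooKeeper client port>:<znode parent>, for example zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase. The host names in that example are hypothetical. The Python sketch below covers the two previous steps. It builds that key, and it reads the text printed by status 'replication' to check whether every source reports a Replication Lag of 0. The function names are illustrative only. The parser assumes the PeerID= and Replication Lag= field names shown in the sample output. Other HBase versions might format the output differently.

```python
import re
from typing import List, Tuple

# Each SOURCE section starts with PeerID=<id>; capture it and the first
# "Replication Lag=<n>" after it. Assumes the field names in the sample output.
_SOURCE_RE = re.compile(r"PeerID=([^\s,]+).*?Replication Lag=(\d+)", re.DOTALL)


def cluster_key(quorum_hosts: List[str], client_port: int, znode_parent: str) -> str:
    """Hypothetical helper: join ZooKeeper settings into <quorum>:<port>:<znode parent>."""
    return f"{','.join(quorum_hosts)}:{client_port}:{znode_parent}"


def replication_lags(status_text: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: (peer_id, lag) pairs from `status 'replication'` output."""
    return [(peer, int(lag)) for peer, lag in _SOURCE_RE.findall(status_text)]


def caught_up(status_text: str) -> bool:
    """True only if at least one source was parsed and every Replication Lag is 0."""
    lags = replication_lags(status_text)
    return bool(lags) and all(lag == 0 for _, lag in lags)


if __name__ == "__main__":
    # Hypothetical ZooKeeper hosts of the destination cluster.
    print(cluster_key(["zk1.example.com", "zk2.example.com", "zk3.example.com"], 2181, "/hbase"))
    sample = ("hbase01.home: SOURCE: PeerID=1 Normal Queue: 1 AgeOfLastShippedOp=0, "
              "SizeOfLogQueue=1, Replication Lag=0 SINK: AgeOfLastAppliedOp=0")
    print(replication_lags(sample))  # [('1', 0)]
    print(caught_up(sample))         # True
```

caught_up returns False when it finds no sources, so empty or unexpected output is not treated as "no lag".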
-
Run the disable_peer <ID> command in the HBase Shell on the source cluster to disable the peer, where <ID> is the peer ID that you used with add_peer. This stops replication to the peer. The source cluster keeps the write-ahead logs so that the queued edits can be replicated when you enable the peer again.
-
To migrate HBase data from CDH, take a snapshot in Cloudera Manager:
- Select the HBase service.
- Click the Table Browser tab.
- Click a table.
- Click Take Snapshot.
- Specify the name of the snapshot, and click Take Snapshot.
To migrate HBase data from HDP, run the following sample command in the HBase Shell to generate an HBase table snapshot:
hbase shell
hbase> snapshot 'myTable', 'myTableSnapshot-122112'
You must run this command for each table that you want to migrate.
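You must run the snapshot command once per table, so you might prefer to generate the commands. The Python sketch below does that. Its naming convention, <table>Snapshot-<suffix>, imitates the example name myTableSnapshot-122112 and is an assumption, so adjust it to your own. A namespaced table such as ns1:orders is mapped to ns1_orders in the snapshot name, because ':' is not a legal character in HBase snapshot names. The helper names are hypothetical.

```python
from typing import Iterable, List


def snapshot_name(table: str, suffix: str) -> str:
    """Hypothetical naming convention modeled on 'myTableSnapshot-122112'."""
    # ':' (namespace separator) is not allowed in snapshot names.
    return f"{table.replace(':', '_')}Snapshot-{suffix}"


def snapshot_commands(tables: Iterable[str], suffix: str) -> List[str]:
    """One HBase Shell `snapshot` command per table that you want to migrate."""
    return [f"snapshot '{t}', '{snapshot_name(t, suffix)}'" for t in tables]


if __name__ == "__main__":
    for cmd in snapshot_commands(["myTable", "ns1:orders"], "122112"):
        print(cmd)
    # snapshot 'myTable', 'myTableSnapshot-122112'
    # snapshot 'ns1:orders', 'ns1_ordersSnapshot-122112'
```

You can paste the printed lines into the HBase Shell on the source cluster. You can also save them to a file that ends with exit and run hbase shell <file>.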
-
Run the ExportSnapshot tool from the command line on the source cluster (not from the HBase Shell) to export a snapshot from the source to the destination cluster. You must run the ExportSnapshot command as the hbase user or as the user that owns the files. The ExportSnapshot tool runs a MapReduce job, similar to distcp, to copy the files to the other cluster. ExportSnapshot works at the file-system level, so the HBase cluster can be offline.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot name> -copy-to hdfs://destination:hdfs_port/hbase -mappers 16
Here, hdfs://destination:hdfs_port/hbase is the HBase root directory on the destination CDP Private Cloud Base cluster. Replace destination, hdfs_port, and /hbase with the NameNode host, the NameNode port, and the HBase root directory of your destination cluster.
-
Run the following commands in the HBase Shell on the destination cluster to restore the table. If the table already exists on the destination cluster, use restore_snapshot to restore it to the state recorded in the snapshot. The table must be disabled before the restore. Enable it again afterward:
disable <tablename>
restore_snapshot <snapshotname>
enable <tablename>
If the table does not exist on the destination cluster, use the clone_snapshot command to create it from the snapshot:
clone_snapshot <snapshotname>, <tablename>
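The export and restore steps repeat for every snapshot. The Python sketch below builds the ExportSnapshot command line as an argument list, which you could pass to subprocess.run(argv, check=True) on a source-cluster host where the hbase command is on the PATH. It also chooses between restore_snapshot and clone_snapshot for the destination. The function names, the placeholder URI hdfs://destination:8020/hbase, and the table_exists flag are illustrative assumptions. You could set table_exists from the result of the exists command in the HBase Shell. The sketch does not contact either cluster.

```python
import shlex
from typing import List

# Fully qualified class name of the ExportSnapshot tool.
EXPORT_SNAPSHOT = "org.apache.hadoop.hbase.snapshot.ExportSnapshot"


def export_snapshot_argv(snapshot: str, copy_to: str, mappers: int = 16) -> List[str]:
    """Hypothetical helper: ExportSnapshot arguments; run on the source as the hbase user."""
    return ["hbase", EXPORT_SNAPSHOT, "-snapshot", snapshot,
            "-copy-to", copy_to, "-mappers", str(mappers)]


def destination_commands(table: str, snapshot: str, table_exists: bool) -> List[str]:
    """Hypothetical helper: HBase Shell commands to run on the destination cluster."""
    if table_exists:
        # restore_snapshot needs a disabled table, and the table stays
        # disabled after the restore, so enable it again at the end.
        return [f"disable '{table}'", f"restore_snapshot '{snapshot}'", f"enable '{table}'"]
    # No table yet: clone_snapshot creates the table from the snapshot.
    return [f"clone_snapshot '{snapshot}', '{table}'"]


if __name__ == "__main__":
    # Placeholder URI: replace the host, port, and HBase root directory.
    argv = export_snapshot_argv("myTableSnapshot-122112", "hdfs://destination:8020/hbase")
    print(shlex.join(argv))
    for cmd in destination_commands("myTable", "myTableSnapshot-122112", table_exists=True):
        print(cmd)
```

An argument list avoids shell quoting problems when you pass it to subprocess. shlex.join (Python 3.8 or later) is used only to print a copy-and-paste version.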
-
Run the enable_peer <ID> command in the HBase Shell on the source cluster to enable the peer again. Replication to the destination cluster resumes, starting with the edits that were queued while the peer was disabled.
-
Run the HashTable command on the source cluster and the SyncTable command on the destination cluster to synchronize the table data between your source and destination clusters.
On the source cluster:
HashTable [options] <tablename> <outputpath>
On the destination cluster:
SyncTable [options] <sourcehashdir> <sourcetable> <targettable>
For more information and examples about using HashTable and SyncTable, see Verifying and validating if your data is migrated.
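HashTable and SyncTable are launched through the hbase command with their fully qualified class names in the org.apache.hadoop.hbase.mapreduce package. The Python sketch below builds both command lines. It uses a few options: --numhashfiles for HashTable, and --sourcezkcluster and --dryrun for SyncTable. SyncTable runs on the destination, so the hash directory is passed as a fully qualified HDFS URI on the source cluster, and the source ZooKeeper cluster key is given explicitly. The helper names, paths, and host names are hypothetical. Check the option list against the HBase version on your clusters before you rely on it.

```python
import shlex
from typing import List, Optional

# Fully qualified class names of the HBase MapReduce tools.
HASHTABLE = "org.apache.hadoop.hbase.mapreduce.HashTable"
SYNCTABLE = "org.apache.hadoop.hbase.mapreduce.SyncTable"


def hashtable_argv(table: str, output_path: str,
                   num_hash_files: Optional[int] = None) -> List[str]:
    """Hypothetical helper: HashTable [options] <tablename> <outputpath> (source cluster)."""
    argv = ["hbase", HASHTABLE]
    if num_hash_files is not None:
        argv.append(f"--numhashfiles={num_hash_files}")
    return argv + [table, output_path]


def synctable_argv(source_hash_dir: str, source_table: str, target_table: str,
                   source_zk_cluster: str, dry_run: bool = True) -> List[str]:
    """Hypothetical helper: SyncTable [options] <sourcehashdir> <sourcetable> <targettable>."""
    argv = ["hbase", SYNCTABLE, f"--sourcezkcluster={source_zk_cluster}"]
    if dry_run:
        # Report differences only; do not write to the target table.
        argv.append("--dryrun=true")
    return argv + [source_hash_dir, source_table, target_table]


if __name__ == "__main__":
    print(shlex.join(hashtable_argv("t1", "/tmp/hashes/t1", num_hash_files=16)))
    print(shlex.join(synctable_argv(
        "hdfs://source-nn:8020/tmp/hashes/t1", "t1", "t1",
        "zk1.source.example.com,zk2.source.example.com:2181:/hbase")))
```

The sketch defaults to a dry run so that you can review the reported differences before a second run with dry_run=False writes changes to the destination table.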