Migrate Data from CDH or HDP to CDP Private Cloud Base
Before you migrate your data, you must have an Apache HBase cluster created on CDP Private Cloud Base. Your CDH or HDP cluster is the source cluster, and your CDP Private Cloud Base cluster is the destination cluster.
-
Deploy HBase replication on both the source and the destination cluster.
For instructions, see Deploy HBase replication.
-
Enable replication on both the source and destination clusters by running the
following commands in the HBase Shell.
On the source cluster
create 't1',{NAME=>'f1', REPLICATION_SCOPE=>1}
On the destination cluster
create 't1',{NAME=>'f1', KEEP_DELETED_CELLS=>'true'}
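In a migration the source table often exists already, in which case altering it may be more practical than creating it. The following is a minimal sketch under that assumption, reusing the table `t1` and column family `f1` from the example above. The helper name `replication_scope_stmt` is hypothetical and not part of HBase; the script only prints the HBase Shell statement so that you can review it first.

```shell
#!/bin/sh
# Print an HBase Shell 'alter' statement that sets REPLICATION_SCOPE => 1
# on one column family of an existing table.
# replication_scope_stmt is a hypothetical helper, not part of HBase.
replication_scope_stmt() {
  table="$1"
  family="$2"
  printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => 1}\n" "$table" "$family"
}

# Assumed names: table t1, column family f1 (taken from the example above).
replication_scope_stmt t1 f1
```

After reviewing the printed statement, you could pipe it into `hbase shell -n` on the source cluster to apply it non-interactively.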
-
Run the add_peer command in the HBase Shell on the source cluster to add the destination cluster as a peer.
add_peer 'ID', 'DESTINATION_CLUSTER_KEY'
You can get the DESTINATION_CLUSTER_KEY value from the HBase Master user interface, which you can access through Cloudera Manager.
-
Run the disable_peer("<peerID>") command in the HBase Shell on the source cluster to disable the peer in the source cluster.
disable_peer("ID1")
This stops replication with the peer, but the source cluster retains the logs so that edits made in the meantime can be replicated after you re-enable the peer.
-
Take a snapshot in Cloudera Manager.
- Select the HBase service.
- Click the Table Browser tab.
- Click a table.
- Click Take Snapshot.
- Specify the name of the snapshot, and click Take Snapshot.
-
Run the ExportSnapshot command from the command line on the source cluster to export a snapshot from the source to the destination cluster. You must run the ExportSnapshot command as the hbase user or as the user that owns the files.
The ExportSnapshot tool executes a MapReduce job, similar to distcp, to copy files to the other cluster. ExportSnapshot works at the file-system level, so the HBase cluster can be offline.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot name> -copy-to hdfs://destination:hdfs_port/hbase -mappers 16
Here, destination (hdfs://destination:hdfs_port/hbase) is the destination CDP Private Cloud Base cluster. Replace the HDFS server path and port with the ones you have used for your cluster.
-
Run the enable_peer command in the HBase Shell on the source cluster to enable the peer in the source and destination clusters.
enable_peer("ID1")
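The peer, snapshot, and export steps above have to happen in a fixed order, so it can help to print them as a checklist before running anything. The following is a minimal sketch that only prints text and does not contact either cluster. The helper names `export_snapshot_cmd` and `migration_plan` are hypothetical, and the snapshot name, host name, and port 8020 are placeholders to replace with your own values.

```shell
#!/bin/sh
# Print the ExportSnapshot command line for a snapshot and a destination.
# export_snapshot_cmd is a hypothetical helper, not part of HBase.
# The mapper count defaults to 16, as in the example above.
export_snapshot_cmd() {
  snapshot="$1"; dest_host="$2"; dest_port="$3"; mappers="${4:-16}"
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to hdfs://%s:%s/hbase -mappers %s\n' \
    "$snapshot" "$dest_host" "$dest_port" "$mappers"
}

# Print the disable, snapshot, export, and enable steps in order.
migration_plan() {
  peer_id="$1"; snapshot="$2"; dest_host="$3"; dest_port="$4"
  echo "hbase shell (source):  disable_peer '$peer_id'"
  echo "Cloudera Manager:      take snapshot '$snapshot'"
  echo "command line (source): $(export_snapshot_cmd "$snapshot" "$dest_host" "$dest_port")"
  echo "hbase shell (source):  enable_peer '$peer_id'"
}

# Placeholder values: peer ID1, snapshot t1_snapshot, assumed NameNode host and port.
migration_plan ID1 t1_snapshot destination.example.com 8020
```

Keeping the command construction in one function means the same destination path is used every time the export is repeated for another table.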
-
Run the HashTable command on the source cluster and the SyncTable command on
the destination cluster to synchronize the table data between your source and
destination clusters.
On the source cluster
HashTable [options] <tablename> <outputpath>
On the destination cluster
SyncTable [options] <sourcehashdir> <sourcetable> <targettable>
For more information and examples about using HashTable and SyncTable, see Use HashTable and SyncTable tool.
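The synopsis lines above omit the class names that the `hbase` launcher needs. The following is a minimal sketch that prints full command lines for both tools, assuming the classes `org.apache.hadoop.hbase.mapreduce.HashTable` and `org.apache.hadoop.hbase.mapreduce.SyncTable` and the SyncTable options `--dryrun` and `--sourcezkcluster`. The helper names, table name, hash directory, and ZooKeeper address are hypothetical placeholders; check the available options against the linked topic before running the printed commands.

```shell
#!/bin/sh
# Print a HashTable command line (to run on the source cluster).
# hash_table_cmd and sync_table_cmd are hypothetical helpers, not part of HBase.
hash_table_cmd() {
  table="$1"; output_path="$2"
  printf 'hbase org.apache.hadoop.hbase.mapreduce.HashTable %s %s\n' \
    "$table" "$output_path"
}

# Print a SyncTable command line (to run on the destination cluster).
# dry_run is "true" to report differences only, "false" to apply changes.
sync_table_cmd() {
  dry_run="$1"; source_zk="$2"; hash_dir="$3"; source_table="$4"; target_table="$5"
  printf 'hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=%s --sourcezkcluster=%s %s %s %s\n' \
    "$dry_run" "$source_zk" "$hash_dir" "$source_table" "$target_table"
}

# Placeholder values: table t1, assumed HDFS hash directory and ZooKeeper cluster key.
hash_table_cmd t1 /hashes/t1
sync_table_cmd true zk1.example.com:2181:/hbase hdfs://source.example.com:8020/hashes/t1 t1 t1
```

Printing the SyncTable command with `--dryrun=true` first is a cautious default, because a dry run reports mismatches without writing to the destination table.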