Migrate Data from CDH or HDP to CDP Private Cloud Base

Before you migrate your data, you must have an Apache HBase cluster created on CDP Private Cloud Base. Your CDH or HDP cluster is your source cluster, and your CDP Private Cloud Base cluster is your destination cluster.

  1. Deploy HBase replication on both the source and the destination cluster.
    For instructions, see Deploy HBase replication.
  2. Enable replication on both the source and destination clusters by running the following commands in the HBase Shell.
    On the source cluster
    create 't1',{NAME=>'f1', REPLICATION_SCOPE=>1}

    On the destination cluster

    create 't1',{NAME=>'f1', KEEP_DELETED_CELLS=>'true'}
  3. Run the add_peer command in the HBase Shell on the source cluster to add the destination cluster as a peer.
    add_peer 'ID', 'DESTINATION_CLUSTER_KEY'

    You can get the DESTINATION_CLUSTER_KEY value from the HBase Master user interface that you can access using Cloudera Manager.

  4. Run the disable_peer ("<peerID>") command in the HBase Shell on the source cluster to disable the peer, where <peerID> is the ID that you assigned with add_peer.
    disable_peer("ID1")
    This stops replication with the peer, but the logs are retained for future reference.
  5. Take a snapshot in Cloudera Manager.
    1. Select the HBase service.
    2. Click the Table Browser tab.
    3. Click a table.
    4. Click Take Snapshot.
    5. Specify the name of the snapshot, and click Take Snapshot.
  6. Run the ExportSnapshot command from the command line on the source cluster (not from within the HBase Shell) to export the snapshot from the source to the destination cluster. You must run the ExportSnapshot command as the hbase user or the user that owns the files.

    The ExportSnapshot tool runs a MapReduce job, similar to distcp, to copy files to the other cluster. ExportSnapshot works at the file-system level, so the HBase cluster can be offline.

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot <snapshot name> -copy-to hdfs://destination:hdfs_port/hbase -mappers 16
    Here, destination (hdfs://destination:hdfs_port/hbase) is the destination CDP Private Cloud Base cluster. Replace the HDFS server path and port with the ones you have used for your cluster.
  7. Run the enable_peer ("<peerID>") command in the HBase Shell on the source cluster to enable the peer in the source and destination clusters.
    enable_peer("ID1")
  8. Run the HashTable command on the source cluster and the SyncTable command on the destination cluster to synchronize the table data between your source and destination clusters.
    On the source cluster
    hbase org.apache.hadoop.hbase.mapreduce.HashTable [options] <tablename> <outputpath>

    On the destination cluster

    hbase org.apache.hadoop.hbase.mapreduce.SyncTable [options] <sourcehashdir> <sourcetable> <targettable>

    For more information and examples about using HashTable and SyncTable, see Use HashTable and SyncTable tool.
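
The command-line steps above (ExportSnapshot, HashTable, and SyncTable) can be sketched as small dry-run helpers that only print the command line, so that you can review it before running it. This is a minimal sketch, not part of the product: the function names (build_export_cmd, build_hash_cmd, build_sync_cmd) and all host names, ports, table names, and paths in the usage lines are hypothetical placeholders, and it assumes the hbase launcher is on the PATH of the user who eventually runs the printed command.

```shell
#!/usr/bin/env bash
# Dry-run helpers: each function only PRINTS a command line.
# Nothing here contacts a cluster. Function names are illustrative.

# build_export_cmd SNAPSHOT DEST_HOST HDFS_PORT [MAPPERS]
# Prints the ExportSnapshot command (run it on the source cluster).
build_export_cmd() {
  local snapshot="${1:-}" dest_host="${2:-}" hdfs_port="${3:-}" mappers="${4:-16}"
  # Refuse to build a command with a missing piece.
  if [ -z "$snapshot" ] || [ -z "$dest_host" ] || [ -z "$hdfs_port" ]; then
    echo "usage: build_export_cmd SNAPSHOT DEST_HOST HDFS_PORT [MAPPERS]" >&2
    return 1
  fi
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to hdfs://%s:%s/hbase -mappers %s\n' \
    "$snapshot" "$dest_host" "$hdfs_port" "$mappers"
}

# build_hash_cmd TABLE OUTPUT_PATH
# Prints the HashTable command (run it on the source cluster).
build_hash_cmd() {
  if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: build_hash_cmd TABLE OUTPUT_PATH" >&2
    return 1
  fi
  printf 'hbase org.apache.hadoop.hbase.mapreduce.HashTable %s %s\n' "$1" "$2"
}

# build_sync_cmd SOURCE_HASH_DIR SOURCE_TABLE TARGET_TABLE
# Prints the SyncTable command (run it on the destination cluster).
build_sync_cmd() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: build_sync_cmd SOURCE_HASH_DIR SOURCE_TABLE TARGET_TABLE" >&2
    return 1
  fi
  printf 'hbase org.apache.hadoop.hbase.mapreduce.SyncTable %s %s %s\n' "$1" "$2" "$3"
}

# Example usage with placeholder values:
#   build_export_cmd t1_snapshot dest-nn.example.com 8020
#   build_hash_cmd t1 /tmp/hashes/t1
#   build_sync_cmd hdfs://src-nn.example.com:8020/tmp/hashes/t1 t1 t1
```

Printing the command first lets you check the snapshot name, destination URI, and table names before you run the line as the hbase user. Any HashTable or SyncTable options that you need still have to be added by hand to the printed line.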