Use CopyTable

CopyTable uses HBase read and write paths to copy part or all of a table to a new table in either the same cluster or a different cluster.

CopyTable causes read load when reading from the source, and write load when writing to the destination. Region splits occur on the destination table in real time as needed. To avoid these issues, use snapshot and export commands instead of CopyTable. Alternatively, you can pre-split the destination table to avoid excessive splits. The destination table can be partitioned differently from the source table. See this section of the Apache HBase documentation for more information.

Using CopyTable, you can copy a part or all of a table in a CDH 5 cluster to a given table in a cluster where the Cloudera Private Cloud Base version is 7.1 or higher. However, you have to pass the Dhbase.meta.replicas.use=true command in the command line to make it work. For example:

hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.meta.replicas.use=true ... <other options>

Using CopyTable to copy a part or all of a table from a Cloudera Private Cloud Base cluster to a CDH 5 cluster does not work.

Edits to the source table after the CopyTable starts are not copied, so you may need to do an additional CopyTable operation to copy new data into the destination table. Run CopyTable as follows, using --help to see details about possible parameters.

$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
    Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

The starttime/endtime and startrow/endrow pairs function in a similar way: if you leave out the first of the pair, the first timestamp or row in the table is the starting point. Similarly, if you leave out the second of the pair, the operation continues until the end of the table. To copy the table to a new table in the same cluster, you must specify --new.name, unless you want to write the copy back to the same table, which would add a new version of each cell (with the same data), or just overwrite the cell with the same value if the maximum number of versions is set to 1. To copy the table to a new table in a different cluster, specify --peer.adr and optionally, specify a new table name.

The following example creates a new table using HBase Shell in non-interactive mode, and then copies data in two ColumnFamilies in rows starting with timestamp 1265875194289 and including the last row before the CopyTable started, to the new table.

echo create 'NewTestTable', 'cf1', 'cf2', 'cf3' | bin/hbase shell --non-interactive
    bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289  --families=cf1,cf2,cf3 --new.name=NewTestTable TestTable

Snapshots are recommended instead of CopyTable for most situations.