Use CopyTable
CopyTable uses HBase read and write paths to copy part or all of a table to a new table in
either the same cluster or a different cluster. CopyTable causes read load when reading from
the source, and write load when writing to the destination. Region splits occur on the
destination table in real time as needed. To avoid this load and splitting, use the snapshot and
export commands instead of CopyTable. Alternatively, you can pre-split the destination table
to avoid excessive splits. The destination table can be partitioned differently from the source
table. See the Apache HBase documentation for more information.
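As a minimal sketch of pre-splitting, the destination table can be created with explicit split points before CopyTable runs. The table name, column families, and the split keys 'g', 'n', and 't' below are hypothetical; choose keys that match the row key distribution of your own data. The statement is only piped to the HBase shell when bin/hbase is present in the current directory.

```shell
#!/bin/sh
# Hypothetical destination table with three column families and four regions.
# The split keys are assumptions; pick keys that match your row key distribution.
CREATE_STMT="create 'NewTestTable', 'cf1', 'cf2', 'cf3', SPLITS => ['g', 'n', 't']"

# Show the statement that will be sent to the HBase shell.
echo "$CREATE_STMT"

# Run it only from an HBase installation directory.
if [ -x bin/hbase ]; then
  echo "$CREATE_STMT" | bin/hbase shell --non-interactive
fi
```

Three split keys produce four regions, so the copy job can write to several regions from the start instead of waiting for splits.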
Edits made to the source table after CopyTable starts are not copied, so you may need to do an
additional CopyTable operation to copy new data into the destination table. Run CopyTable as
follows, using --help to see details about possible parameters.
$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
The starttime/endtime and startrow/stoprow pairs function in a similar way: if you leave out
the first of the pair, the first timestamp or row in the table is the starting point. Similarly,
if you leave out the second of the pair, the operation continues until the end of the table. To
copy the table to a new table in the same cluster, you must specify --new.name, unless you want
to write the copy back to the same table, which would add a new version of each cell (with the
same data), or just overwrite the cell with the same value if the maximum number of versions is
set to 1. To copy the table to a new table in a different cluster, specify --peer.adr and,
optionally, a new table name.
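A sketch of a cross-cluster copy follows. The --peer.adr value takes the form hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent for the destination cluster. The ZooKeeper host names, port, znode, and destination table name below are placeholders, and the destination table is assumed to exist already on the peer cluster.

```shell
#!/bin/sh
# Placeholder ZooKeeper quorum, client port, and parent znode of the destination cluster.
PEER_ADR="zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"
SRC_TABLE="TestTable"
DST_TABLE="TestTableCopy"   # optional; drop --new.name to keep the source table name

# None of the values contain spaces, so the command can be kept in a plain string.
COPY_CMD="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=$PEER_ADR --new.name=$DST_TABLE $SRC_TABLE"

# Print the command for review.
echo "$COPY_CMD"

# Run it only from an HBase installation directory.
if [ -x bin/hbase ]; then
  $COPY_CMD
fi
```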
echo "create 'NewTestTable', 'cf1', 'cf2', 'cf3'" | bin/hbase shell --non-interactive
bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --families=cf1,cf2,cf3 --new.name=NewTestTable TestTable
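Because edits made after CopyTable starts are not copied, one way to catch up is a second run restricted with --starttime. The sketch below records the time just before the first copy and reuses it for the follow-up run. It assumes cell timestamps are assigned by the region servers and that clocks are roughly in sync, since --starttime filters on cell timestamps rather than on write order. The table names are the ones from the example above.

```shell
#!/bin/sh
COPYTABLE="bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable"

# Milliseconds since the epoch, taken just before the first copy begins.
FIRST_RUN_MS=$(( $(date +%s) * 1000 ))

FULL_COPY="$COPYTABLE --new.name=NewTestTable TestTable"
CATCH_UP="$COPYTABLE --starttime=$FIRST_RUN_MS --new.name=NewTestTable TestTable"

# Print both commands for review.
echo "$FULL_COPY"
echo "$CATCH_UP"

# Run them only from an HBase installation directory.
if [ -x bin/hbase ]; then
  $FULL_COPY   # initial copy of all existing cells
  $CATCH_UP    # second pass: only cells with timestamps >= FIRST_RUN_MS
fi
```

Writes that arrive during the second pass can still be missed, so the catch-up step may need to be repeated, or writes paused, before switching to the destination table.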
Snapshots are recommended instead of CopyTable for most situations.
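For comparison, a sketch of the snapshot-based route: take a snapshot in the HBase shell, then export it with the ExportSnapshot tool. The snapshot name, destination HDFS URL, and mapper count are placeholders.

```shell
#!/bin/sh
# Placeholder snapshot name and destination HBase root directory on the target cluster.
SNAPSHOT_NAME="TestTableSnapshot"
DEST_ROOT="hdfs://dest-namenode.example.com:8020/hbase"

SNAPSHOT_STMT="snapshot 'TestTable', '$SNAPSHOT_NAME'"
EXPORT_CMD="bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $SNAPSHOT_NAME -copy-to $DEST_ROOT -mappers 16"

# Print both steps for review.
echo "$SNAPSHOT_STMT"
echo "$EXPORT_CMD"

# Run them only from an HBase installation directory.
if [ -x bin/hbase ]; then
  echo "$SNAPSHOT_STMT" | bin/hbase shell --non-interactive
  $EXPORT_CMD
fi
```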