Use snapshots
A snapshot captures the state of a table at the time the snapshot was taken.
Cloudera recommends snapshots instead of CopyTable where possible. Because no data is copied when a snapshot is taken, the process is very quick. As long as the snapshot exists, cells in the snapshot are never deleted from HBase, even if they are explicitly deleted by the API. Instead, they are archived so that the snapshot can restore the table to its state at the time of the snapshot.
You can export snapshots from CDH 5 to CDP Private Cloud Base, and from CDP Private Cloud Base to CDH 5, if the version of CDP Private Cloud Base is 7.1 or higher.
After taking a snapshot, use the clone_snapshot command to copy the data to a new (immediately enabled) table in the same cluster, or the Export utility to create a new table based on the snapshot, in the same cluster. This is a copy-on-write operation. The new table shares HFiles with the original table until writes occur in the new table but not the old table, or until a compaction or split occurs in either of the tables. This can improve performance in the short term compared to CopyTable.
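The snapshot-and-clone step can also be scripted. The following is a minimal sketch, not a definitive procedure: the helper name snapshot_clone_script is hypothetical, and the table and snapshot names are the illustrative ones used elsewhere on this page. The helper only prints HBase Shell commands, so you can review them before running anything.

```shell
# Sketch: print the HBase Shell commands that take a snapshot and clone it.
# snapshot_clone_script is a hypothetical helper name, not part of HBase.
# $1 = source table, $2 = snapshot name, $3 = name of the new (cloned) table
snapshot_clone_script() {
  cat <<EOF
snapshot '$1', '$2'
clone_snapshot '$2', '$3'
list_snapshots
EOF
}

# Print the commands for review.
snapshot_clone_script TestTable TestTableSnapshot NewTestTable
```

To execute the commands, pipe the output to the HBase Shell in non-interactive mode, for example: snapshot_clone_script TestTable TestTableSnapshot NewTestTable | hbase shell -n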
To export the snapshot to a new cluster, use the ExportSnapshot utility, which uses MapReduce to copy the snapshot to the new cluster.
Run the ExportSnapshot utility on the source cluster, as a user with HBase and HDFS write permission on the destination cluster, and HDFS read permission on the source cluster. This creates the expected amount of IO load on the destination cluster. Optionally, you can limit bandwidth consumption, which affects IO on the destination cluster. After the ExportSnapshot operation completes, you can see the snapshot in the new cluster using the list_snapshots command, and you can use the clone_snapshot command to create the table in the new cluster from the snapshot.
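An export to a remote cluster with a bandwidth limit might look like the following sketch. The destination URI hdfs://dest-nn.example.com:8020/hbase is a hypothetical placeholder for the HBase root directory on the destination cluster, and the export_snapshot wrapper and the HBASE_CMD variable are illustrative names introduced here so that the command can be previewed without running it. The -snapshot, -copy-to, -mappers, and -bandwidth options are the ones shown in the example on this page.

```shell
# Sketch: wrap ExportSnapshot so the command can be previewed before it runs.
# export_snapshot and HBASE_CMD are hypothetical names used for illustration.
# $1 = snapshot name, $2 = destination HBase root directory (URI)
# $3 = number of mappers, $4 = bandwidth limit
export_snapshot() {
  "${HBASE_CMD:-hbase}" org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$1" -copy-to "$2" -mappers "$3" -bandwidth "$4"
}

# Dry run: with HBASE_CMD=echo the arguments are printed instead of executed.
# The destination URI below is a placeholder, not a real cluster.
HBASE_CMD=echo export_snapshot TestTableSnapshot hdfs://dest-nn.example.com:8020/hbase 16 200
```

To perform the export, call export_snapshot without setting HBASE_CMD, on the source cluster, as a user with the permissions described above.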
For full instructions for the snapshot and clone_snapshot HBase Shell commands, run the HBase Shell and type help snapshot. The following example takes a snapshot of a table, uses it to clone the table to a new table in the same cluster, and then uses the ExportSnapshot utility to copy the table to a different cluster, with 16 mappers and limited to 200 MB/sec bandwidth.
$ bin/hbase shell
hbase(main):005:0> snapshot 'TestTable', 'TestTableSnapshot'
0 row(s) in 2.3290 seconds
hbase(main):006:0> clone_snapshot 'TestTableSnapshot', 'NewTestTable'
0 row(s) in 1.3270 seconds
hbase(main):007:0> describe 'NewTestTable'
DESCRIPTION ENABLED
'NewTestTable', {NAME => 'cf1', DATA_BLOCK_ENCODING true
=> 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE
=> '0', VERSIONS => '1', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_C
ELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY =>
'false', BLOCKCACHE => 'true'}, {NAME => 'cf2', DA
TA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESS
ION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER
', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '655
36', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1280 seconds
hbase(main):008:0> quit
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot TestTableSnapshot -copy-to file:///tmp/hbase -mappers 16 -bandwidth 200
14/10/28 21:48:16 INFO snapshot.ExportSnapshot: Copy Snapshot Manifest
14/10/28 21:48:17 INFO client.RMProxy: Connecting to ResourceManager at a1221.example.com/192.0.2.121:8032
14/10/28 21:48:19 INFO snapshot.ExportSnapshot: Loading Snapshot 'TestTableSnapshot' hfile list
14/10/28 21:48:19 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/10/28 21:48:19 INFO util.FSVisitor: No logs under directory:hdfs://a1221.example.com:8020/hbase/.hbase-snapshot/TestTableSnapshot/WALs
14/10/28 21:48:20 INFO mapreduce.JobSubmitter: number of splits:0
14/10/28 21:48:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1414556809048_0001
14/10/28 21:48:20 INFO impl.YarnClientImpl: Submitted application application_1414556809048_0001
14/10/28 21:48:20 INFO mapreduce.Job: The url to track the job: http://a1221.example.com:8088/proxy/application_1414556809048_0001/
14/10/28 21:48:20 INFO mapreduce.Job: Running job: job_1414556809048_0001
14/10/28 21:48:36 INFO mapreduce.Job: Job job_1414556809048_0001 running in uber mode : false
14/10/28 21:48:36 INFO mapreduce.Job: map 0% reduce 0%
14/10/28 21:48:37 INFO mapreduce.Job: Job job_1414556809048_0001 completed successfully
14/10/28 21:48:37 INFO mapreduce.Job: Counters: 2
Job Counters
Total time spent by all maps in occupied slots (ms)=0
Total time spent by all reduces in occupied slots (ms)=0
14/10/28 21:48:37 INFO snapshot.ExportSnapshot: Finalize the Snapshot Export
14/10/28 21:48:37 INFO snapshot.ExportSnapshot: Verify snapshot integrity
14/10/28 21:48:37 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
14/10/28 21:48:37 INFO snapshot.ExportSnapshot: Export Completed: TestTableSnapshot
The log line that begins with "The url to track the job:" contains the URL from which you can track the ExportSnapshot job. When the job finishes, a new set of HFiles, comprising all of the HFiles that were part of the table when the snapshot was taken, is created at the HDFS location you specified.
You can use the SnapshotInfo command-line utility included with HBase to verify or debug snapshots.
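As a rough sketch of how SnapshotInfo might be invoked, the following wrapper asks for the file list and summary statistics of a snapshot. The snapshot_info function and the HBASE_CMD variable are hypothetical names used for illustration; check the utility's help output on your release for the options it supports.

```shell
# Sketch: inspect a snapshot with the SnapshotInfo utility.
# snapshot_info and HBASE_CMD are hypothetical names used for illustration.
# $1 = snapshot name
snapshot_info() {
  "${HBASE_CMD:-hbase}" org.apache.hadoop.hbase.snapshot.SnapshotInfo \
    -snapshot "$1" -files -stats
}

# Dry run: with HBASE_CMD=echo the arguments are printed instead of executed.
HBASE_CMD=echo snapshot_info TestTableSnapshot
```

Call snapshot_info without setting HBASE_CMD to run the utility against the cluster.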