HBase Snapshots enable you to take a snapshot of a table without much impact on Region Servers. Snapshot, clone, and restore operations don't involve data copying. In addition, exporting a snapshot to another cluster has no impact on Region Servers.
Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all of the hfiles in HDFS after disabling the table. These methods have drawbacks: CopyTable/ExportTable can degrade Region Server performance, and copying the hfiles requires disabling the table, which means no reads or writes can occur.
To turn on snapshot support, set the hbase.snapshot.enabled property to true in hbase-site.xml.
Note | |
---|---|
Snapshots are enabled by default in 0.95+, and are off by default in 0.94.6+. |
<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
You can take a snapshot of a table regardless of whether it is enabled or disabled. The snapshot operation involves no data copying. As shown in the following example, start the HBase shell, and then take a snapshot of the table:
$ hbase shell
hbase> snapshot 'myTable', 'myTableSnapshot-122112'
You can list all snapshots taken. The output shows each snapshot's name and related information.
$ hbase shell
hbase> list_snapshots
You can delete a snapshot. When a snapshot is deleted, the files retained for that snapshot are removed if they are no longer needed by a table or another snapshot.
$ hbase shell
hbase> delete_snapshot 'myTableSnapshot-122112'
You can create a new table from a snapshot with the clone operation. The cloned table contains the same data as the original table contained when the snapshot was taken. A change to the cloned table does not impact the snapshot or the original table.
$ hbase shell
hbase> clone_snapshot 'myTableSnapshot-122112', 'myNewTestTable'
To restore a snapshot, the table must be disabled. After the restore operation, the table is restored to its state when the snapshot was taken, changing both data and schema if required.
$ hbase shell
hbase> disable 'myTable'
hbase> restore_snapshot 'myTableSnapshot-122112'
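As a rough sketch of a complete restore sequence, the helper below prints the HBase shell commands in order. The function name `restore_commands` and the `-pre-restore` safety snapshot are assumptions made for illustration and are not part of HBase; the safety snapshot only gives you a way back if the restore turns out to be the wrong choice. A restored table stays disabled until you enable it, so the sequence ends with `enable`.

```shell
# Hypothetical helper: print the HBase shell commands for a restore.
# Arguments: <table> <snapshot-to-restore>
restore_commands() {
  local table="$1"
  local snapshot="$2"
  # Assumed naming convention for the safety snapshot (not an HBase default).
  local safety="${table}-pre-restore"

  printf "snapshot '%s', '%s'\n" "$table" "$safety"   # keep the current state
  printf "disable '%s'\n" "$table"                    # restore needs a disabled table
  printf "restore_snapshot '%s'\n" "$snapshot"        # roll data and schema back
  printf "enable '%s'\n" "$table"                     # bring the table back online
}
```

For example, `restore_commands myTable myTableSnapshot-122112 | hbase shell` feeds the commands to the shell. A piped shell session may keep going after a failed command, so review the printed commands and the shell output rather than assuming every step succeeded.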
Note | |
---|---|
Because replication works at the log level and snapshots work at the file system level, after a restore, the replicas are in a different state than the master. If you want to use restore, you need to stop replication and redo the bootstrap. In case of partial data loss due to client issues, you can clone the table from the snapshot and use a MapReduce job to copy the data that you need from the clone to the original table. This is preferred to a full snapshot restore operation, which requires the table to be disabled. |
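One possible way to carry out the partial recovery described in the note is to clone the snapshot and then run CopyTable, which is a MapReduce job, from the clone into the original table. The sketch below only prints the CopyTable command so that it can be reviewed first. The function name `copy_from_clone_cmd`, the clone name `myTableRecovered`, and the choice to select the data by time range are assumptions; your recovery may need a different selection, such as specific column families.

```shell
# Hypothetical helper: print a CopyTable command that copies a time range
# of cells from a cloned table back into the original table.
# Arguments: <clone-table> <original-table> <start-ms> <end-ms>
# Assumes the clone already exists, for example created in the HBase shell with:
#   clone_snapshot 'myTableSnapshot-122112', 'myTableRecovered'
copy_from_clone_cmd() {
  local clone="$1"
  local original="$2"
  local start_ms="$3"   # inclusive start, epoch milliseconds
  local end_ms="$4"     # exclusive end, epoch milliseconds

  # --new.name is the destination table; the final argument is the source table.
  echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
       "--starttime=${start_ms}" "--endtime=${end_ms}" \
       "--new.name=${original}" "${clone}"
}
```

Calling `copy_from_clone_cmd myTableRecovered myTable 1357000000000 1357003600000` prints a command that you can run once the time range looks right. The original table stays enabled throughout, which is the advantage over a full restore.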
If you are using security with the AccessController Coprocessor, only a global administrator can take, clone, or restore a snapshot. None of these actions capture ACL rights. Restoring a table preserves the ACL rights of the existing table, while cloning a table creates a new table that has no ACL rights until the administrator adds them.
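Because a cloned table starts with no ACL rights, an administrator typically grants them right after the clone. The sketch below prints the HBase shell commands for one grant followed by a check of the resulting permissions. The function name `grant_commands`, the user `appuser`, and the `RW` permission string are placeholders chosen for illustration.

```shell
# Hypothetical helper: print HBase shell commands that grant rights on a
# freshly cloned table and then list the table's permissions.
# Arguments: <user> <permissions> <table>
grant_commands() {
  local user="$1"
  local perms="$2"    # for example RW (read and write)
  local table="$3"

  printf "grant '%s', '%s', '%s'\n" "$user" "$perms" "$table"
  printf "user_permission '%s'\n" "$table"   # verify what was granted
}
```

For example, `grant_commands appuser RW myNewTestTable | hbase shell` applies the grant, assuming the AccessController Coprocessor is enabled and you are running as a user allowed to grant rights.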
The ExportSnapshot tool copies all the hfiles, logs, and metadata related to a snapshot to another cluster. The tool executes a MapReduce job, which is similar to distcp, to copy files between the two clusters. Because it works at the file system level, the HBase cluster does not have to be online. The ExportSnapshot tool must be run as the hbase user. It uses the temporary directory specified by the hbase.tmp.dir property (for example, /grid/0/var/log/hbase), which is created on HDFS with the hbase user as its owner.
The following example copies a snapshot called MySnapshot to an HBase cluster srv2 (hdfs://srv2:8020/hbase) using 16 mappers:
$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8020/hbase -mappers 16
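Since the tool must run as the hbase user, one way to invoke it from another account is through `sudo -u hbase`. The sketch below prints such a command for review instead of running it. The function name `export_snapshot_cmd` and the use of `sudo` are assumptions; your environment may switch users differently, for example with `su` or a Kerberos login.

```shell
# Hypothetical helper: print an ExportSnapshot command that runs as the
# hbase user.
# Arguments: <snapshot> <destination-hbase-root-dir> [mappers, default 16]
export_snapshot_cmd() {
  local snapshot="$1"
  local dest_root="$2"
  local mappers="${3:-16}"

  echo "sudo -u hbase hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot" \
       "-snapshot ${snapshot}" "-copy-to ${dest_root}" "-mappers ${mappers}"
}
```

Calling `export_snapshot_cmd MySnapshot hdfs://srv2:8020/hbase` prints the command for the example above. After the export finishes, running `list_snapshots` in the HBase shell on the destination cluster should show the snapshot, and `clone_snapshot` can create a table from it there.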