About HBase snapshots

HBase snapshots allow you to clone a table without making data copies, and with minimal impact on RegionServers. Exporting the table to another cluster does not have any impact on the RegionServers.

In previous HBase releases, the only way to a back up or to clone a table was to use CopyTable or ExportTable, or to copy all the hfiles in HDFS after disabling the table. These methods have disadvantages:

  • CopyTable and ExportTable can degrade RegionServer performance.
  • Disabling the table means no reads or writes; this is usually unacceptable.

HBase snapshots allow you to clone a table without making data copies, and with minimal impact on RegionServers. Exporting the table to another cluster does not have any impact on the RegionServers.

Use Cases

  • Recovery from user or application errors
    • Useful because it may be some time before the database administrator notices the error.
    • The database administrator may want to save a snapshot before a major application upgrade or change.
    • Recovery cases:
      • Roll back to previous snapshot and merge in reverted data.
      • View previous snapshots and selectively merge them into production.
  • Backup
    • Capture a copy of the database and store it outside HBase for disaster recovery.
    • Capture previous versions of data for compliance, regulation, and archiving.
    • Export from a snapshot on a live system provides a more consistent view of HBase than CopyTable and ExportTable.
  • Audit or report view of data at a specific time
    • Capture monthly data for compliance.
    • Use for end-of-day/month/quarter reports.
  • Application testing
    • Test schema or application changes on similar production data from a snapshot and then discard.
      For example:
      1. Take a snapshot.
      2. Create a new table from the snapshot content (schema and data)
      3. Manipulate the new table by changing the schema, adding and removing rows, and so on. The original table, the snapshot, and the new table remain independent of each other.
  • Offload work
    • Capture, copy, and restore data to another site
    • Export data to another cluster

Where Snapshots Are Stored

Snapshot metadata is stored in the .hbase_snapshot directory under the hbase root directory (/hbase/.hbase-snapshot). Each snapshot has its own directory that includes all the references to the hfiles, logs, and metadata needed to restore the table.

hfiles required by the snapshot are in the /hbase/data/<namespace>/<tableName>/<regionName>/<familyName>/ location if the table is still using them; otherwise, they are in /hbase/.archive/<namespace>/<tableName>/<regionName>/<familyName>/.

Zero-Copy Restore and Clone Table

From a snapshot, you can create a new table (clone operation) or restore the original table. These two operations do not involve data copies; instead, a link is created to point to the original hfiles.

Changes to a cloned or restored table do not affect the snapshot or (in case of a clone) the original table.

To clone a table to another cluster, you export the snapshot to the other cluster and then run the clone operation; see Exporting a snapshot to another cluster.

Reverting to a Previous HBase Version

Snapshots do not affect HBase backward compatibility if they are not used.

If you use snapshots, backward compatibility is affected as follows:

  • If you only take snapshots, you can still revert to a previous HBase version.
  • If you use restore or clone, you cannot revert to a previous version unless the cloned or restored tables have no links. Links cannot be detected automatically; you would need to inspect the file system manually.

Storage Considerations

Because hfiles are immutable, a snapshot consists of a reference to the files in the table at the moment the snapshot is taken. No copies of the data are made during the snapshot operation, but copies may be made when a compaction or deletion is triggered. In this case, if a snapshot has a reference to the files to be removed, the files are moved to an archive folder, instead of being deleted. This allows the snapshot to be restored in full.

Because no copies are performed, multiple snapshots share the same hfiles, butfor tables with lots of updates, and compactions, each snapshot could have a different set of hfiles.