About HBase snapshots

HBase snapshots allow you to clone a table without making data copies, and with minimal impact on RegionServers. Exporting the table to another cluster has little impact on the RegionServers.

Without using snapshots, the only way to backup or clone a table was to use the provided CopyTable or ExportTable tools. However, these methods have disadvantages:

  • CopyTable and ExportTable can degrade RegionServer performance because they read the table data.
  • Disabling the table means no reads or writes; this is usually unacceptable.

Use Cases

  • Data migration
    • You can use HBase snapshots to migrate data from CDH 5.x or HDP 2.x to COD on CDP Public Cloud. Refer to the migration guide for more information.
  • Recovery from user or application errors
    • Useful because it may be some time before the database administrator notices the error.
    • The database administrator may want to save a snapshot before a major application upgrade or change.
    • Recovery cases:
      • Roll back to the previous snapshot and merge in reverted data.
      • View previous snapshots and choose which snapshot to apply into production.
  • Backup
    • Capture a copy of the database and store it outside HBase for disaster recovery.
    • Capture previous versions of data for compliance, regulation, and archiving.
    • Export from a snapshot on a live system provides a faster and simpler way of moving HBase data than the CopyTable and ExportTable.
  • Audit or report view of data at a specific time
    • Capture monthly data for compliance.
    • Use for end-of-day/month/quarter reports.
  • Application testing
    • Test schema or application changes on similar production data from a snapshot and then discard.
      For example:
      1. Take a snapshot.
      2. Create a new table from the snapshot content (schema and data)
      3. Manipulate the new table by changing the schema, adding and removing rows, and so on. The original table, the snapshot, and the new table remain independent of each other.
  • Offload work
    • Capture, copy, and restore data to another site
    • Export data to another cluster

Zero-Copy Restore and Clone Table

From a snapshot, you can create a new table (clone_snapshot) or restore the original table (restore_snapshot). When cloning snapshots, a new table is created to reflect the state of the original table when the snapshot was taken. Restoring snapshots, however, reverts the original table state to that of when the snapshot was taken (any updates to the table after the snapshot was taken would be lost). These two operations do not involve data copies; instead, a link is created to point to the original hfiles.

Changes to a cloned or restored table do not affect the snapshot.

To clone a table to another cluster, you export the snapshot to the other cluster and then run the clone operation; see Exporting a snapshot to another cluster.

Storage Considerations

Because hfiles are immutable, a snapshot consists of a reference to the files in the table at the moment the snapshot is taken. No copies of the data are made during the snapshot operation, but copies may be made when a compaction or deletion is triggered. In this case, if a snapshot has a reference to the files to be removed, the files are moved to an archive folder, instead of being deleted. This allows the snapshot to be restored in full.

When a snapshot is deleted, all retained HFiles will be automatically deleted when the files are no longer needed.