About HBase snapshots
HBase snapshots allow you to clone a table without making data copies, and with minimal impact on RegionServers. Exporting the table to another cluster has little impact on the RegionServers.
Without
using snapshots, the only way to backup or clone a table was to use the provided
CopyTable
or ExportTable
tools. However, these methods
have disadvantages:
-
CopyTable
andExportTable
can degrade RegionServer performance because they read the table data. - Disabling the table means no reads or writes; this is usually unacceptable.
Use Cases
- Data migration
- You can use HBase snapshots to migrate data from CDH 5.x or HDP 2.x to COD on CDP Public Cloud. Refer to the migration guide for more information.
- Recovery from user or application errors
- Useful because it may be some time before the database administrator notices the error.
- The database administrator may want to save a snapshot before a major application upgrade or change.
- Recovery cases:
- Roll back to the previous snapshot and merge in reverted data.
- View previous snapshots and choose which snapshot to apply into production.
- Backup
- Capture a copy of the database and store it outside HBase for disaster recovery.
- Capture previous versions of data for compliance, regulation, and archiving.
- Export from a snapshot on a live system provides a faster and simpler way of
moving HBase data than the
CopyTable
andExportTable
.
- Audit or report view of data at a specific time
- Capture monthly data for compliance.
- Use for end-of-day/month/quarter reports.
- Application testing
- Test schema or application changes on similar production
data from a snapshot and then discard.For example:
- Take a snapshot.
- Create a new table from the snapshot content (schema and data)
- Manipulate the new table by changing the schema, adding and removing rows, and so on. The original table, the snapshot, and the new table remain independent of each other.
- Test schema or application changes on similar production
data from a snapshot and then discard.
- Offload work
- Capture, copy, and restore data to another site
- Export data to another cluster
Zero-Copy Restore and Clone Table
From a snapshot, you can create a new table
(clone_snapshot
) or restore the original table
(restore_snapshot
). When cloning snapshots, a new table is created
to reflect the state of the original table when the snapshot was taken. Restoring
snapshots, however, reverts the original table state to that of when the snapshot was
taken (any updates to the table after the snapshot was taken would be lost). These two
operations do not involve data copies; instead, a link is created to point to the
original hfiles
.
Changes to a cloned or restored table do not affect the snapshot.
To clone a table to another cluster, you export the snapshot to the
other cluster and then run the clone
operation; see Exporting a
snapshot to another cluster.
Storage Considerations
Because hfiles
are immutable, a snapshot consists
of a reference to the files in the table at the moment the snapshot
is taken. No copies of the data are made during the snapshot
operation, but copies may be made when a compaction or deletion is
triggered. In this case, if a snapshot has a reference to the files
to be removed, the files are moved to an archive folder, instead of
being deleted. This allows the snapshot to be restored in full.
When a snapshot is deleted, all retained HFiles will be automatically deleted when the files are no longer needed.