Expire snapshots feature

You can expire snapshots that Iceberg generates when you create or modify a table. Over the lifetime of a table, snapshots accumulate. This topic describes how to remove snapshots you no longer need.

You should periodically expire snapshots to delete data files that are no longer needed, and to reduce the size of table metadata. Each write to an Iceberg table from Hive creates a new snapshot, or version, of a table. Snapshots accumulate until expired.

You can expire snapshots based on the following conditions:
  • All snapshots older than a timestamp or timestamp expression
  • A snapshot having a given ID
  • Snapshots having IDs matching a given list of IDs
  • Snapshots within the range of two timestamps
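
To choose a timestamp or snapshot ID for the conditions above, it can help to list the snapshots of the table first. The following is a minimal sketch, not taken from this topic. It assumes a hypothetical table ice_t in the default database, that your Hive version exposes the Iceberg snapshots metadata table, and that your Impala version supports DESCRIBE HISTORY. Verify both against your release before relying on them.

```sql
-- Hive (assumption: Iceberg metadata tables are queryable as <db>.<table>.snapshots).
-- ice_t is a hypothetical table name.
SELECT snapshot_id, committed_at, operation
FROM default.ice_t.snapshots;

-- Impala (assumption: DESCRIBE HISTORY is available in your version).
-- Lists the creation time and ID of each snapshot.
DESCRIBE HISTORY ice_t;
```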

You can keep snapshots you are likely to need, for example recent snapshots, and expire old snapshots. For example, you can keep daily snapshots for the last 30 days, then weekly snapshots for the past year, then monthly snapshots for the last 10 years. You can remove specific snapshots to meet GDPR “right to be forgotten” requirements.

Hive or Impala syntax

ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS(<timestamp expression>)

ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS('<snapshot ID>')

ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS('<snapshot ID 1>,<snapshot ID 2>,...')

ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS BETWEEN (<timestamp expression>) AND (<timestamp expression>)

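Hive or Impala examples

The first example removes snapshots that have a timestamp older than November 4, 2022 1:50 pm. The second example removes snapshots that are more than 10 days old. The third example removes snapshots older than a timestamp given with fractional seconds, December 9, 2021 05:39:18.689.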

ALTER TABLE ice_11 EXECUTE EXPIRE_SNAPSHOTS('2022-11-04 13:50:00');
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS(now() - interval 10 days);
ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');
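
The examples above use only the timestamp form. As a sketch of the other forms listed under the syntax heading, the following statements expire a single snapshot by ID, a list of snapshots by ID, and the snapshots between two timestamps. The table name ice_t, the snapshot IDs, and the timestamps are hypothetical placeholders; substitute values from your own table, and check which forms your Hive or Impala version accepts.

```sql
-- Expire one snapshot by ID (the ID shown is a made-up placeholder).
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS('3540948563875234789');

-- Expire several snapshots by ID, passed as one comma-separated string
-- (both IDs are made-up placeholders).
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS('3540948563875234789,6327389472987234098');

-- Expire the snapshots that fall between two timestamps.
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS
  BETWEEN ('2022-11-01 00:00:00') AND ('2022-11-04 13:50:00');
```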

Preventing snapshot expiration

You can prevent expiration of recent snapshots by configuring the history.expire.min-snapshots-to-keep table property. You set table properties with an ALTER TABLE statement. The history.expire.min-snapshots-to-keep property specifies a number of snapshots, not a time delta.

For example, assume you always want to keep all snapshots of your table from the last 24 hours, and you configure history.expire.min-snapshots-to-keep as a safety mechanism to enforce this. If your table receives only one modification (insert, update, or merge) per hour, setting history.expire.min-snapshots-to-keep = 24 is sufficient. However, if your table consistently receives an update every minute, the last 24 hours contain 24 × 60 = 1440 snapshots, and you need to set history.expire.min-snapshots-to-keep to at least 1440.
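
As a sketch of how this property might be set, the following statement uses the standard ALTER TABLE ... SET TBLPROPERTIES form on a hypothetical table named ice_t. The value 1440 matches the one-update-per-minute case described above (24 hours × 60 snapshots per hour); choose a value that fits your own write rate.

```sql
-- Keep at least the 1440 most recent snapshots when expiring
-- (24 hours x 60 writes per hour). ice_t is a hypothetical table name.
ALTER TABLE ice_t SET TBLPROPERTIES ('history.expire.min-snapshots-to-keep'='1440');
```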

Table data and orphan maintenance

When you drop an Iceberg table, the contents of the table directory (the actual data) might or might not be removed, depending on the external.table.purge table property. If the data is not removed, orphan data files can remain. An orphan data file is a file that exists in the table directory but is not referenced by any snapshot.
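
The following sketch shows one way the purge behavior might be controlled before a drop, assuming a hypothetical table named ice_t and assuming that external.table.purge set to true causes the data to be removed on DROP TABLE, as it does for Hive external tables. Confirm the behavior on a test table first, because the deletion cannot be undone.

```sql
-- Assumption: with external.table.purge set to true, dropping the table
-- also removes the files in the table directory.
-- ice_t is a hypothetical table name.
ALTER TABLE ice_t SET TBLPROPERTIES ('external.table.purge'='true');

-- Drop the table; with the setting above, no data files should be left behind.
DROP TABLE ice_t;
```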

Expiring a snapshot does not remove old metadata files by default. To clean up metadata files, set the write.metadata.delete-after-commit.enabled=true and write.metadata.previous-versions-max table properties. These properties control automatic removal of metadata files after operations that write metadata, such as expiring snapshots or inserting data. For more information, see "Iceberg table properties" below.
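
As a minimal sketch, both properties could be set in one statement on a hypothetical table named ice_t. The value 10 for write.metadata.previous-versions-max is an illustrative choice, not a recommendation from this topic.

```sql
-- Enable automatic removal of old metadata files after each commit and
-- retain at most 10 previous metadata versions (10 is illustrative only).
-- ice_t is a hypothetical table name.
ALTER TABLE ice_t SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled'='true',
  'write.metadata.previous-versions-max'='10'
);
```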