Expire snapshots feature
You can expire snapshots that Iceberg generates when you create or modify a table. Over the lifetime of a table, its snapshots accumulate. This section shows how to remove snapshots you no longer need.
You should periodically expire snapshots to delete data files that are no longer needed and to reduce the size of table metadata. Each write to an Iceberg table from Hive creates a new snapshot, or version, of the table. Snapshots accumulate until they are expired. You can expire any of the following:
- All snapshots older than a timestamp or timestamp expression
- A snapshot having a given ID
- Snapshots having IDs matching a given list of IDs
- Snapshots within the range of two timestamps
You can keep the snapshots you are likely to need, such as recent snapshots, and expire old ones. For example, you can keep daily snapshots for the last 30 days, weekly snapshots for the past year, and monthly snapshots for the last 10 years. You can also remove specific snapshots to meet GDPR “right to be forgotten” requirements.
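To expire specific snapshots, for example to satisfy a GDPR request, you first need their IDs. The following is a minimal sketch. It assumes a hypothetical table `ice_t` in the `default` database, and a Hive or Impala version that supports querying Iceberg snapshot history. Metadata query support varies by version, so check your release before relying on it.

```sql
-- Hive: read the snapshots metadata table of the hypothetical table default.ice_t
SELECT snapshot_id, committed_at, operation
FROM default.ice_t.snapshots;

-- Impala: list the snapshot history of the same hypothetical table
DESCRIBE HISTORY ice_t;
```

The IDs returned here are the values you pass to the ID-based forms of EXPIRE_SNAPSHOTS.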
Hive or Impala syntax
ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS(<timestamp expression>)
ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS('<Snapshot Id>')
ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS('<Snapshot Id1>,<Snapshot Id2>...')
ALTER TABLE <table name> EXECUTE EXPIRE_SNAPSHOTS BETWEEN (<timestamp expression>) AND (<timestamp expression>)
Hive or Impala example
The first example removes snapshots with a timestamp older than August 15, 2022, 1:50 PM. The second example removes snapshots older than 10 days.
ALTER TABLE ice_11 EXECUTE EXPIRE_SNAPSHOTS('2022-08-15 13:50:00');
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS(now() - interval 10 days);
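The examples above cover only the timestamp form. The statements below sketch the remaining forms of the syntax. The table name `ice_t` and the snapshot IDs are hypothetical placeholders, so replace them with values from your own table's snapshot history. This sketch also assumes that your Hive or Impala version accepts each form.

```sql
-- Expire a single snapshot by ID (placeholder ID)
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS('8744736658442914487');

-- Expire several snapshots: the IDs go in one comma-separated string (placeholder IDs)
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS('8744736658442914487,7386226734124367411');

-- Expire the snapshots committed between two timestamps
ALTER TABLE ice_t EXECUTE EXPIRE_SNAPSHOTS BETWEEN ('2022-08-01 00:00:00') AND ('2022-08-15 00:00:00');
```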
Preventing snapshot expiration
You can prevent expiration of recent snapshots by configuring the history.expire.min-snapshots-to-keep table property, which you can set using ALTER TABLE. The history.expire.min-snapshots-to-keep property specifies a number of snapshots, not a time delta. For example, assume you always want to keep all snapshots of your table from the last 24 hours, and you configure history.expire.min-snapshots-to-keep as a safety mechanism to enforce this. If your table receives only one modification (insert, update, or merge) per hour, setting history.expire.min-snapshots-to-keep = 24 is sufficient. However, if your table consistently receives updates every minute, the last 24 hours would contain 1440 snapshots (24 × 60), and you would need to set history.expire.min-snapshots-to-keep to at least 1440.
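Here is a minimal sketch of setting the property for the two scenarios described above. It assumes a hypothetical table named `ice_t`. In practice, you run only the statement that matches your table's write rate.

```sql
-- One write per hour: 24 snapshots cover the last 24 hours
ALTER TABLE ice_t SET TBLPROPERTIES ('history.expire.min-snapshots-to-keep'='24');

-- One write per minute: 24 hours x 60 minutes = 1440 snapshots
ALTER TABLE ice_t SET TBLPROPERTIES ('history.expire.min-snapshots-to-keep'='1440');
```

The value is a count, so you must revisit it if the write rate of the table changes.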
Table data and orphan maintenance
The contents of the table directory (the actual data) might or might not be removed when you drop the table. Whether they are depends on the external.table.purge table property. If the data is not purged, orphan data files can remain after you drop an Iceberg table. An orphan data file is a file in the table directory that no snapshot references.
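If you want dropping a table to also remove its data files, you can set the purge property before the drop. This is a sketch that assumes a hypothetical table `ice_t`, and assumes your engine honors external.table.purge as described above.

```sql
-- Request that the table's data files be deleted when the table is dropped
ALTER TABLE ice_t SET TBLPROPERTIES ('external.table.purge'='true');

-- Irreversible: with purge enabled, this also removes the data in the table directory
DROP TABLE ice_t;
```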
Expiring a snapshot does not remove old metadata files by default. To clean up metadata files, set the write.metadata.delete-after-commit.enabled=true and write.metadata.previous-versions-max table properties. These properties control automatic removal of metadata files after metadata operations, such as expiring snapshots or inserting data. For more information, see "Iceberg table properties" below.
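As an illustration, the following statement enables automatic metadata cleanup on a hypothetical table `ice_t`. The value 10 is only an example. Apache Iceberg documents a default of 100 for write.metadata.previous-versions-max.

```sql
-- After each commit, delete the oldest tracked metadata files,
-- keeping at most 10 previous metadata versions (illustrative value)
ALTER TABLE ice_t SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled'='true',
  'write.metadata.previous-versions-max'='10'
);
```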