Expire snapshots feature
You can expire snapshots that Iceberg generates when you create or modify a table. Over the lifetime of a table, snapshots accumulate. This topic describes how to remove snapshots you no longer need.
You should periodically expire snapshots to delete data files that are no longer needed, and to reduce the size of table metadata. Each write to an Iceberg table from Hive creates a new snapshot, or version, of a table. Snapshots can be used for time-travel queries, or for rollbacks. The table can be rolled back to any valid snapshot. Snapshots accumulate until they are expired by the expire_snapshots operation.
You use the following syntax to expire snapshots older than a timestamp or timestamp expression:
Hive or Impala syntax
ALTER TABLE ... EXECUTE expire_snapshots(<timestamp expression>)
Hive or Impala example
The first example removes snapshots that have a timestamp older than August 15, 2022 1:50 pm. The second example removes snapshots that are more than 10 days old.
ALTER TABLE ice_11 EXECUTE expire_snapshots('2022-08-15 13:50:00');
ALTER TABLE ice_t EXECUTE expire_snapshots(now() - interval 10 days);
ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');
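Before choosing a cutoff timestamp, it can help to see which snapshots currently exist. The following is a minimal sketch, not a definitive procedure. It assumes that the ice_t table from the earlier example is in the default database, that your Hive version exposes Iceberg metadata tables, and that your Impala version supports DESCRIBE HISTORY.

```sql
-- Hive: list snapshots through the Iceberg "snapshots" metadata table.
-- Assumption: the table is default.ice_t.
SELECT snapshot_id, committed_at, operation
FROM default.ice_t.snapshots;

-- Impala: list the snapshot history of the table.
DESCRIBE HISTORY ice_t;
```

Running the same query again after expire_snapshots should show only the snapshots that were retained.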
Preventing snapshot expiration
You can prevent expiration of recent snapshots by configuring the history.expire.min-snapshots-to-keep table property, which sets the minimum number of snapshots to retain when snapshots are expired. You can use the alter table feature to set a property.
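As an illustration, the property might be set as sketched below. The table name ice_t is reused from the earlier example, and the value 5 is an arbitrary assumption chosen for illustration, not a recommendation.

```sql
-- Keep at least the 5 most recent snapshots, even if they are older
-- than the timestamp passed to expire_snapshots.
-- Assumption: ice_t is an existing Iceberg table; 5 is an example value.
ALTER TABLE ice_t SET TBLPROPERTIES ('history.expire.min-snapshots-to-keep'='5');
```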
Table data and orphan maintenance
The contents of the table directory (the actual data) might or might not be removed when you drop the table. Orphan data files can remain after you drop an Iceberg table, depending on the external.table.purge table property. An orphan data file is a file that exists in the table directory but is not referenced by any snapshot.
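A sketch of setting the purge property follows. It assumes an existing Iceberg table named ice_t (the name is reused from the earlier example) and that you want the data removed together with the table.

```sql
-- With external.table.purge set to true, dropping the table also
-- removes its data files. Set it to false to leave the data in place.
-- Assumption: ice_t is an existing Iceberg table.
ALTER TABLE ice_t SET TBLPROPERTIES ('external.table.purge'='true');
```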
Expiring a snapshot does not remove old metadata files by default. To clean up metadata files, set the write.metadata.delete-after-commit.enabled=true and write.metadata.previous-versions-max table properties. For more information, see "Iceberg table properties". Setting write.metadata.delete-after-commit.enabled to true enables automatic removal of old metadata files after metadata operations, such as expiring snapshots or inserting data. The write.metadata.previous-versions-max property sets how many previous metadata files are kept.
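Both properties can be set in one statement, as in the following sketch. The table name ice_t and the value 10 are assumptions chosen for illustration.

```sql
-- Enable automatic cleanup of old metadata files after each commit,
-- and keep at most 10 previous metadata files.
-- Assumptions: ice_t is an existing Iceberg table; 10 is an example value.
ALTER TABLE ice_t SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled'='true',
  'write.metadata.previous-versions-max'='10'
);
```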