Expire snapshots feature

You can expire snapshots that Iceberg generates when you create or modify a table. During the lifetime of a table the number of snapshots of the table accumulate. You learn how to remove snapshots you no longer need.

You should periodically expire snapshots to delete data files that are no longer needed, and to reduce the size of table metadata. Each write to an Iceberg table from Hive creates a new snapshot, or version, of a table. Snapshots can be used for time-travel queries, or for rollbacks. The table can be rolled back to any valid snapshot. Snapshots accumulate until they are expired by the expire_snapshots operation.

You use the following syntax to expire snapshots older than a timestamp or timestamp expression:

Hive or Impala syntax

ALTER TABLE ... EXECUTE expire_snapshots(<timestamp expression>)

Hive or Impala example

The first example removes snapshots having a timestamp older than August 15, 2022 11:00 am. The second example removes snapshots from 10 days ago and before.

ALTER TABLE ice_11 EXECUTE expire_snapshots('2022-11-04 13:50:00');
ALTER TABLE ice_t EXECUTE expire_snapshots(now() - interval 10 days);

You should periodically expire snapshots to delete data files that are no longer needed, and to reduce the size of table metadata. Each write to an Iceberg table from Hive creates a new snapshot, or version, of a table. Snapshots can be used for time-travel queries, or for rollbacks. The table can be rolled back to any valid snapshot. Snapshots accumulate until they are expired by the expire_snapshots operation.

ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');    

Preventing snapshot expiration

You can prevent expiration of recent snapshots by configuring the history.expire.min-snapshots-to-keep table property. You can use the alter table feature to set a property.

Table data and orphan maintenance

The contents of the table directory (actual data) might, or might not, be removed when you drop the table. An orphan data file can remain when you drop an Iceberg table, depending on the external.table.purge flag table property. An orphaned data file is one that has contents in the table directory, but no snapshot.

Expiring a snapshot does not remove old metadata files by default. You must clean up metadata files using write.metadata.delete-after-commit.enabled=true and write.metadata.previous-versions-max table properties. For more information, see "Iceberg table properties". Setting this property controls automatic metadata file removal after metadata operations, such as expiring snapshots or inserting data.