Tablet history garbage collection and the ancient history mark
Kudu uses a multiversion concurrency control (MVCC) mechanism to ensure that snapshot scans can proceed isolated from new changes to a table. Therefore, periodically, old historical data should be garbage-collected (removed) to free up disk space. While Kudu never removes rows or data that are visible in the latest version of the data, Kudu does remove records of old changes that are no longer visible.
The specific threshold in time (in the past) beyond which historical
MVCC data becomes inaccessible and is free to be deleted is called the ancient history mark (AHM). The AHM can be configured by
setting the --tablet_history_max_age_sec
property.
There are two background tasks that remove historical MVCC data older
than the AHM:
- The one that runs the merging compaction, called
CompactRowSetsOp
(see above). - A separate background task deletes old undo delta blocks,
called
UndoDeltaBlockGCOp
. RunningUndoDeltaBlockGCOp
reduces disk space usage in all workloads, but particularly in those with a higher volume of updates or upserts. The metrics associated with this background task have the prefixundo_delta_block
.