Tablet history garbage collection and the ancient history mark
Kudu uses a multiversion concurrency control (MVCC) mechanism to ensure that snapshot scans can proceed isolated from new changes to a table. Because MVCC keeps records of past changes, old historical data should periodically be garbage-collected (removed) to free up disk space. While Kudu never removes rows or data that are visible in the latest version of the data, it does remove records of old changes that are no longer visible.
The specific threshold in time (in the past) beyond which historical MVCC data becomes inaccessible and is free to be deleted is called the ancient history mark (AHM). The AHM can be configured by
There are two background tasks that remove historical MVCC data older than the AHM:
- The one that runs the merging compaction, called
- A separate background task, UndoDeltaBlockGCOp, deletes old undo delta blocks.
UndoDeltaBlockGCOp reduces disk space usage in all workloads, but particularly in those with a higher volume of updates or upserts. The metrics associated with this background task have the prefix
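
To make the role of the AHM concrete, the following is a minimal sketch in Python of how a history threshold could decide which undo delta blocks are safe to delete. It is a simplified model, not Kudu's implementation: the UndoDeltaBlock class, both function names, the max_history_age_sec parameter, and the 900-second value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UndoDeltaBlock:
    """Hypothetical stand-in for an undo delta block (not a Kudu type)."""
    block_id: str
    max_timestamp: float  # newest change recorded in the block, in seconds


def ancient_history_mark(now: float, max_history_age_sec: float) -> float:
    """Return the AHM: the point in the past beyond which history is unreadable."""
    return now - max_history_age_sec


def blocks_eligible_for_gc(blocks: List[UndoDeltaBlock], ahm: float) -> List[UndoDeltaBlock]:
    """Select blocks whose every recorded change is older than the AHM.

    A block qualifies only if its newest change predates the AHM, so no
    snapshot scan at or after the AHM can still need it.
    """
    return [b for b in blocks if b.max_timestamp < ahm]


# Illustrative values only (assumptions, not Kudu defaults).
now = 1_000_000.0
ahm = ancient_history_mark(now, max_history_age_sec=900.0)  # 999_100.0

blocks = [
    UndoDeltaBlock("a", 998_000.0),  # entirely older than the AHM
    UndoDeltaBlock("b", 999_500.0),  # still inside the readable history window
    UndoDeltaBlock("c", 999_099.0),  # just older than the AHM
]

print([b.block_id for b in blocks_eligible_for_gc(blocks, ahm)])  # ['a', 'c']
```

In this model, raising the history age moves the AHM further into the past, so fewer blocks qualify for deletion and more disk space stays in use; lowering it has the opposite effect.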