Single tablet write operations
Kudu employs Multiversion Concurrency Control (MVCC) and the Raft consensus algorithm. Each write operation in Kudu must go through the following order of operations:
- The tablet's leader acquires all locks for the rows that it will change.
- The leader assigns the write a timestamp before the write is submitted for replication. This timestamp will be the write’s tag in MVCC.
- After a majority of replicas have acknowledged the write, the rows are changed.
- After the changes are complete, they are made visible to concurrent writes and reads, atomically.
All replicas of a tablet observe the same process. Therefore, if a write operation is assigned timestamp n, and changes row x, a second write operation at timestamp m > n is guaranteed to see the new value of x.
This strict ordering of lock acquisition and timestamp assignment is enforced to be consistent across all replicas of a tablet through consensus. Therefore, write operations are ordered with regard to clock-assigned timestamps, relative to other writes in the same tablet. In other words, writes have strict-serializable semantics.
In case of multi-row write operations, while they are Isolated and Durable in an ACID sense, they are not yet fully Atomic. The failure of a single write in a batch operation will not roll back the entire operation, but produce per-row errors.