Reading, Filtering, and Sending Edits
The rate at which the source reads data depends on the filtering of log entries and on the total size of the list of edits to replicate per slave.
By default, a source attempts to read from a WAL and ship log entries to a sink as quickly
as possible. Speed is limited by the filtering of log entries: only KeyValues that are
scoped GLOBAL and that do not belong to catalog tables are retained.
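The filtering rule above can be sketched as follows. This is an illustrative model only, not HBase code: the `KeyValue` class, the scope constants, and the `CATALOG_TABLES` set (including the table name in it) are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration; these are not HBase classes.
SCOPE_LOCAL = 0
SCOPE_GLOBAL = 1
CATALOG_TABLES = {"hbase:meta"}  # assumed catalog table name


@dataclass
class KeyValue:
    table: str
    row: str
    scope: int


def filter_for_replication(entries):
    """Keep only GLOBAL-scoped KeyValues that do not belong to catalog tables."""
    return [
        kv for kv in entries
        if kv.scope == SCOPE_GLOBAL and kv.table not in CATALOG_TABLES
    ]


entries = [
    KeyValue("orders", "r1", SCOPE_GLOBAL),      # retained
    KeyValue("orders", "r2", SCOPE_LOCAL),       # dropped: not GLOBAL
    KeyValue("hbase:meta", "r3", SCOPE_GLOBAL),  # dropped: catalog table
]
retained = filter_for_replication(entries)
```

Only the first entry survives, since it is the only one that satisfies both conditions.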
Speed is also limited by the total size of the list of edits to replicate per slave, which is
limited to 64 MB by default. With this configuration, a master cluster Region Server
with three slaves would use at most 192 MB to store data to replicate. This does not
account for data that was filtered but not yet garbage collected.
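The upper bound quoted above is simply the per-slave limit multiplied by the number of slaves. A minimal sketch of that arithmetic, where the function name and parameters are hypothetical and chosen only for this example:

```python
MB = 1024 * 1024


def max_buffered_bytes(num_slaves, per_slave_limit_bytes=64 * MB):
    """Upper bound on buffered edits for one master Region Server.

    Excludes data that was filtered out but not yet garbage collected.
    """
    return num_slaves * per_slave_limit_bytes


three_slaves_mb = max_buffered_bytes(3) // MB  # 3 * 64 MB
```

With the default 64 MB limit and three slaves, this gives the 192 MB figure from the text.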
Once the maximum size of edits has been buffered or the reader reaches the end of the WAL, the source thread stops reading and randomly chooses a sink to replicate to from the list that was generated by keeping only a subset of slave Region Servers. It directly issues an RPC to the chosen Region Server and waits for the method to return. If the RPC is successful, the source determines whether the current file has been emptied or whether it contains more data that needs to be read. If the file has been emptied, the source deletes the znode in the queue. Otherwise, it registers the new offset in the log znode. If the RPC throws an exception, the source retries 10 times before trying to find a different sink.
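The shipping and bookkeeping steps described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the actual HBase implementation: `ship_batch`, `after_successful_ship`, the `send` callback, and the dictionary standing in for the ZooKeeper queue are all hypothetical. It also assumes that "retries 10 times" means ten attempts per sink, and that a failed sink is not chosen again for the same batch.

```python
import random

MAX_ATTEMPTS = 10  # assumption: ten attempts per sink before moving on


def ship_batch(batch, sinks, send, rng=random):
    """Pick a sink at random, retry on failure, then try a different sink."""
    candidates = list(sinks)
    while candidates:
        sink = rng.choice(candidates)
        for _ in range(MAX_ATTEMPTS):
            try:
                send(sink, batch)  # stands in for the blocking RPC
                return sink
            except ConnectionError:
                continue
        candidates.remove(sink)  # simplification: skip the failed sink
    raise RuntimeError("no sink accepted the batch")


def after_successful_ship(queue, wal_name, new_offset, wal_size):
    """Update the queue (a dict standing in for znodes) after a good RPC."""
    if new_offset >= wal_size:
        queue.pop(wal_name, None)     # file emptied: delete its entry
    else:
        queue[wal_name] = new_offset  # more to read: record the new offset
```

A caller would invoke `ship_batch` once per buffered batch and then call `after_successful_ship` with the reader's current position in the WAL.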