Planning for Apache Kudu

Memory

The flag --memory_limit_hard_bytes determines the maximum amount of memory that a Kudu tablet server may use. The amount of memory used by a tablet server scales with data size, write workload, and read concurrency.

The following table provides numbers that can be used to compute a rough estimate of memory usage:

Table 1. Tablet server memory usage
Type | Multiplier | Description
Memory required per TB of data on disk | 1.5GB per 1TB of data on disk | Amount of memory per unit of data on disk required for basic operation of the tablet server.
Hot replicas' MemRowSets and DeltaMemStores | Minimum 128MB per hot replica | Minimum amount of data to flush per MemRowSet flush. For most use cases, updates should be rare compared to inserts, so the DeltaMemStores should be very small.
Scans | 256KB per column per core for read-heavy tables | Amount of memory used by scanners. Tables that are read constantly need this memory constantly.
Block cache | Fixed by --block_cache_capacity_mb (default 512MB) | Amount of memory reserved for use by the block cache.

Using this information for the example load gives the following breakdown of memory usage:

Table 2. Example tablet server memory usage
Type | Amount
8TB of data on disk | 8TB * 1.5GB / 1TB = 12GB
200 hot replicas | 200 * 128MB = 25.6GB
One frequently scanned 40-column table | 40 columns * 40 cores * 256KB = 409.6MB
Block cache | --block_cache_capacity_mb=512, so 512MB
Expected memory usage | 38.5GB
Recommended hard limit | 52GB
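
The arithmetic in the example can be reproduced with a short script. This is only a sketch of the multipliers from Table 1, not part of Kudu: the function name `estimate_memory_gb` and its parameters are hypothetical, decimal units (1GB = 1000MB) are assumed to match the figures shown, and the second factor of 40 in the scan row is assumed to be the number of cores.

```python
# Rough tablet server memory estimate based on the multipliers in Table 1.
# Assumption: decimal units (1GB = 1000MB = 1,000,000KB), as in the example.

def estimate_memory_gb(data_tb, hot_replicas, scanned_columns, cores,
                       block_cache_mb=512):
    """Return a per-component memory breakdown in GB (hypothetical helper)."""
    breakdown = {
        # 1.5GB of memory per 1TB of data on disk
        "data_on_disk": data_tb * 1.5,
        # at least 128MB per hot replica
        "hot_replicas": hot_replicas * 128 / 1000,
        # 256KB per column per core for frequently scanned tables
        "scans": scanned_columns * cores * 256 / 1_000_000,
        # fixed by --block_cache_capacity_mb
        "block_cache": block_cache_mb / 1000,
    }
    # The total is computed before it is added to the dict, so it is not double-counted.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Example load: 8TB on disk, 200 hot replicas, one 40-column table, 40 cores (assumed).
example = estimate_memory_gb(data_tb=8, hot_replicas=200,
                             scanned_columns=40, cores=40)
for name, gb in example.items():
    print(f"{name}: {gb:.2f}GB")
```

The total comes to roughly 38.5GB, matching the expected memory usage in the table. The hot replica term dominates, so the number of hot replicas is the input most worth estimating carefully.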

Treating this as a rough estimate of Kudu's memory usage, select a memory limit such that the expected memory usage is around 50-75% of the hard limit. In the example, 38.5GB of expected usage is about 74% of the recommended 52GB hard limit.
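
The 50-75% guideline can be turned into a range of candidate hard limits. The sketch below is illustrative only: `hard_limit_range_gb` is a hypothetical helper, and the conversion to bytes assumes decimal gigabytes, which may not be the unit convention you want for --memory_limit_hard_bytes.

```python
def hard_limit_range_gb(expected_gb, low=0.50, high=0.75):
    """Return (smallest, largest) hard limit in GB such that the expected
    usage is between `low` and `high` of the limit (hypothetical helper)."""
    return expected_gb / high, expected_gb / low


# Expected usage from the example: 38.5GB.
min_limit, max_limit = hard_limit_range_gb(38.5)
print(f"hard limit between {min_limit:.1f}GB and {max_limit:.1f}GB")

# The recommended 52GB sits near the low end of that range.
recommended_gb = 52
assert min_limit <= recommended_gb <= max_limit

# Assumption: decimal GB (10^9 bytes) when converting for the flag value.
limit_bytes = recommended_gb * 1000**3
print(f"--memory_limit_hard_bytes={limit_bytes}")
```

Choosing a limit near the low end of the range, as the example does, keeps memory free for other processes on the host. A limit nearer the high end leaves more headroom for load that exceeds the estimate.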