HDFS Administration

Caching Use Cases

Centralized cache management is useful for:

  • Files that are accessed repeatedly -- For example, a small fact table in Hive that is often used for joins is a good candidate for caching. Conversely, caching the input of a once-yearly reporting query is probably less useful, since the historical data might only be read once.

  • Mixed workloads with performance SLAs -- Caching the working set of a high-priority workload ensures that it does not compete with low-priority workloads for disk I/O.
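
As a rough illustration of the first use case, the sketch below builds the `hdfs cacheadmin` command lines that would create a cache pool and pin a small Hive table into it. It only assembles and prints the commands and does not run them. The pool name `hive_joins`, the warehouse path, and the helper `cache_commands` are hypothetical names chosen for this example; the sketch assumes the `-addPool` and `-addDirective` subcommands with their `-path`, `-pool`, and `-replication` options.

```python
import shlex


def cache_commands(pool, path, replication=1):
    """Return argv lists for creating a cache pool and caching one path in it.

    Hypothetical helper for illustration: nothing is executed here.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # Create the pool that will own the cache directive.
    add_pool = ["hdfs", "cacheadmin", "-addPool", pool]
    # Ask HDFS to cache the given path in that pool.
    add_directive = [
        "hdfs", "cacheadmin", "-addDirective",
        "-path", path,
        "-pool", pool,
        "-replication", str(replication),
    ]
    return [add_pool, add_directive]


# Hypothetical pool name and table path, assumed for this example only.
commands = cache_commands("hive_joins", "/apps/hive/warehouse/small_fact_table")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

For the mixed-workload case, the same helper could be called once per path in the high-priority working set, with all directives placed in a pool reserved for that workload.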