Scaling Namespaces and Optimizing Data Storage

Use cases for centralized cache management

Centralized cache management is most useful for files that are read repeatedly and for mixed workloads that must meet performance service-level agreements (SLAs).

  • Files that are accessed repeatedly: For example, a small fact table in Hive that is often used for joins is a good candidate for caching. Conversely, caching the input of a once-yearly reporting query is probably less useful, because the historical data may be read only once.

  • Mixed workloads with performance SLAs: Caching the working set of a high-priority workload means its reads are served from memory, so it does not compete with low-priority workloads for disk I/O.
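
As a rough illustration of the first use case, the sketch below builds the `hdfs cacheadmin` command lines that would create a cache pool and pin a frequently joined table into it. The helper only assembles the commands and does not run them. The function name, the pool name `hive-facts`, the warehouse path, and the 1 GiB limit are hypothetical values chosen for the example; the `-addPool`, `-limit`, `-addDirective`, `-path`, `-pool`, and `-replication` options belong to the `hdfs cacheadmin` command-line tool.

```python
import shlex


def cacheadmin_commands(pool, path, replication=1, limit_bytes=None):
    """Build (but do not execute) the hdfs cacheadmin commands that create
    a cache pool and add one cache directive for `path` in that pool.

    Returns a list of two argv lists: [add_pool, add_directive].
    """
    add_pool = ["hdfs", "cacheadmin", "-addPool", pool]
    if limit_bytes is not None:
        # Optional cap on the total bytes the pool's directives may cache.
        add_pool += ["-limit", str(limit_bytes)]

    add_directive = [
        "hdfs", "cacheadmin", "-addDirective",
        "-path", path,
        "-pool", pool,
        "-replication", str(replication),
    ]
    return [add_pool, add_directive]


if __name__ == "__main__":
    # Hypothetical pool and table path, used for illustration only.
    commands = cacheadmin_commands(
        "hive-facts",
        "/apps/hive/warehouse/dim_country",
        limit_bytes=1 << 30,
    )
    for argv in commands:
        print(shlex.join(argv))  # shlex.join requires Python 3.8+
```

Using a separate pool per workload is one way to keep the high-priority working set from the second use case apart from other cached data, since a pool limit bounds how much memory its directives can claim.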