Use cases for centralized cache management
Centralized cache management is useful for files that are accessed repeatedly and for mixed workloads that have performance SLAs.
-
Files that are accessed repeatedly: For example, a small fact table in Hive that is often used for joins is a good candidate for caching. Conversely, caching the input of a once-yearly reporting query is probably less useful, since the historical data might only be read once.
-
Mixed workloads with performance SLAs: Caching the working set of a high priority workload ensures that it does not compete with low priority workloads for disk I/O.