Compaction observability in Cloudera Manager
Compaction observability is a notification and information system based on metrics about the health of the compaction process. A healthy compaction process is critical to query performance, availability, and uptime of your data warehouse. You learn how to use compaction observability to prevent serious problems from developing.
Compaction runs in the background. At regular intervals, Hive accesses the health of the
compaction process, and logs an error in the event of a problem. The assessment is based on
metrics, such as the number of rows in the metadata table TXN_TO_WRITE_ID
and
the age of the longest running transaction (oldest_open_txn_age_in_sec). For example, if
compaction is not running, the TXN_TO_WRITE_ID
table in the HMS backend database
becomes bloated and queries slow down.
- Warnings and errors indicating problems in compaction processes
- Charts of various metrics that provide information about compaction health
- Recommended actions to address suboptimal configurations
Compaction observability does not attempt to do root cause analysis (RCA) and does not attempt to fix the underlying problem. Compaction observability helps you quickly respond to symptoms of compaction problems.
Using Cloudera Manager, you can view compaction health checks for the Hive Metastore and Hive on Tez services, view actions and advice related to configurations and thresholds, and use the Compaction tab from the Hive Metastore service to view compaction-related charts based on the collected metrics.