Hive Compaction System Health Check
- Oldest initiated compaction passed threshold: Check the last successful compactions in the SYS.COMPACTIONS view. If no compactions have finished recently, check the logs for errors in the Worker threads of the Compactor Virtual Warehouse or, if applicable, the DataHub clusters. If compactions are running successfully but many compactions are queued, increase the number of Worker threads set for 'hive.compactor.worker.threads' in the Compactor Virtual Warehouse or DataHub clusters. Alternatively, you can increase the amount of available resources.
- Large number of compaction failures: Many compactions are failing.
- Run SELECT * FROM SYS.COMPACTIONS WHERE C_STATE IN ("failed", "did not initiate"); to check the error message in the SYS.COMPACTIONS view.
- Check the logs in the Compactor Virtual Warehouse for the "Caught exception while trying to compact" message.
- Oldest uncleaned compaction passed threshold: The Compactor cleaner process cannot clean obsolete data.
- Check the metastore log to make sure the cleaner thread is running.
- Run SHOW TRANSACTIONS; to check whether open transactions are blocking the cleaner. If the blocking transactions are not of type=REPL_CREATED, they can be aborted by running ABORT TRANSACTIONS <transactionID>;
- Old aborted transaction not cleared: An old aborted transaction was not cleared. This can impact your overall query performance. This is most likely caused by an uncompacted table. Check for failed compactions in the SYS.COMPACTIONS view.
- Tables/partitions with many aborts: At least one table or partition has not been compacted and has more than hive.metastore.acidmetrics.table.aborted.txns.threshold (default: 1500) aborts.
- Verify that the "hive.compactor.abortedtxn.threshold" is not set to a value higher than 1000.
- Find the affected table/partition by running SELECT TC_DATABASE, TC_TABLE, TC_PARTITION, COUNT(*) FROM SYS.TRANSACTIONS WHERE STATE='aborted' GROUP BY TC_DATABASE, TC_TABLE, TC_PARTITION HAVING COUNT(*) > n; where n is the value of hive.metastore.acidmetrics.table.aborted.txns.threshold. The default is 1500.
- Check the SYS.COMPACTIONS table for compactions on this table or partition in the "did not initiate", "failed", or "ready for cleaning" state.
- Check the Compactor Virtual Warehouse logs for "compactor.Initiator" errors associated with this table or partition.
- Manually initiate compaction on the table/partition by running ALTER TABLE <table-name> [PARTITION (<partition-column>=<partition-value>)] COMPACT 'minor';
- Oldest open transaction passed threshold: A transaction has been open for longer than the set threshold. Stuck transactions can block cleaning obsolete files and ACID metadata, which can lead to performance degradation. Check for old open transactions with SHOW TRANSACTIONS. If these are not type=REPL_CREATED, they can be aborted with ABORT TRANSACTIONS <txnid>;
- Oldest open replication transaction passed threshold: A replication transaction has been open longer than the set threshold. Stuck transactions can block cleaning of ACID metadata, which can lead to performance degradation. Before you abort a transaction that was created by replication and has been open a long time, make sure that the hive.repl.txn.timeout threshold has expired. If it has, there may be an issue with replication. Please contact Cloudera Support.
- Excessive ACID metadata: An excessive amount of Hive ACID metadata exists, which can cause serious performance degradation. Failing compactions or metadata cleanup service problems can cause this.
- Check for failed compactions by running SELECT * FROM SYS.COMPACTIONS WHERE c_state IN ("failed", "did not initiate"); against the SYS.COMPACTIONS view.
- Search for failed compactions in the HiveServer2 logs, which are indicated with the message "Caught exception while trying to compact".
- Activity on table with disabled auto-compaction: A change has been made on a table where automated compaction is disabled. If this is not intended, re-enable compaction on the table by running ALTER TABLE table_name SET TBLPROPERTIES ("no_auto_compaction" = "false").
- Table/partition with large number of active delta directories: There are tables with a large number of delta directories, which can cause performance degradation. Check the "Top 100 partitions or unpartitioned tables with the most active deltas" chart in CM to see affected tables and partitions. These tables and partitions should be compacted.
- Table/partition with large number of small delta directories: There are tables with a large number of small delta directories, which can cause performance degradation. Check the "Top 100 partitions or unpartitioned tables with the most small deltas" chart in CM to see affected tables and partitions. These tables and partitions should be compacted.
Short Name: Compaction System Health Check
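The two parameterized statements in the troubleshooting steps above (the aborted-transaction query and the manual compaction command) can be generated from a small helper. The following is a minimal sketch, not part of Cloudera Manager or Hive: the function names are hypothetical, the partition value is assumed to be a string-typed column and is quoted accordingly, and table and partition names are assumed to come from a trusted source.

```python
def aborted_txn_query(threshold: int = 1500) -> str:
    """Build the query that lists tables/partitions with many aborts.

    `threshold` mirrors hive.metastore.acidmetrics.table.aborted.txns.threshold
    (default 1500 according to the text above).
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return (
        "SELECT TC_DATABASE, TC_TABLE, TC_PARTITION, COUNT(*) "
        "FROM SYS.TRANSACTIONS WHERE STATE='aborted' "
        "GROUP BY TC_DATABASE, TC_TABLE, TC_PARTITION "
        f"HAVING COUNT(*) > {int(threshold)};"
    )


def compaction_statement(table: str, partition=None, kind: str = "minor") -> str:
    """Build an ALTER TABLE ... COMPACT statement.

    `partition` is an optional (column, value) pair. Assumption: the value
    is a string and is wrapped in single quotes.
    """
    if kind not in ("minor", "major"):
        raise ValueError("kind must be 'minor' or 'major'")
    clause = ""
    if partition is not None:
        column, value = partition
        clause = f" PARTITION ({column}='{value}')"
    return f"ALTER TABLE {table}{clause} COMPACT '{kind}';"


if __name__ == "__main__":
    print(aborted_txn_query())
    print(compaction_statement("sales", ("ds", "2024-01-01")))
```

The generated strings would then be run through whichever Hive client is in use; the table name `sales` and partition column `ds` are placeholders only.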
Excessive ACID metadata (CompletedTxnComponents) Thresholds
- Description
- The health test thresholds for excessive ACID metadata in CompletedTxnComponents.
- Template Name
- hive_compaction_metadata_comp_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:1000000.0, warning:500000.0, [CDH 7.2.14..CDH 8.0.0)=critical:1000000.0, warning:500000.0]
- Unit(s)
- no unit
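The Default Value above uses half-open version ranges: `[CDH 7.1.8..CDH 7.2.0)` includes 7.1.8 and excludes 7.2.0. As a reading aid, here is a minimal sketch that parses such a string and looks up the thresholds for a given CDH version. The helper names are hypothetical, the regular expression only covers the layout shown in this document, and treating `never` as "no threshold" is an assumption.

```python
import re

# Matches one "[CDH a..CDH b)=critical:x, warning:y" entry.
RANGE_RE = re.compile(
    r"\[CDH ([\d.]+)\.\.CDH ([\d.]+)\)=critical:([^,]+), warning:([^,\]]+)"
)

EXAMPLE_DEFAULT = (
    "CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:1000000.0, warning:500000.0, "
    "[CDH 7.2.14..CDH 8.0.0)=critical:1000000.0, warning:500000.0]"
)


def parse_version(text):
    """Turn '7.2.14' into (7, 2, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_level(text):
    """Return a float threshold, or None for 'never' (assumed: disabled)."""
    text = text.strip()
    return None if text == "never" else float(text)


def thresholds_for(default_value, cdh_version):
    """Return the thresholds whose half-open range contains cdh_version."""
    version = parse_version(cdh_version)
    for low, high, critical, warning in RANGE_RE.findall(default_value):
        if parse_version(low) <= version < parse_version(high):
            return {
                "critical": parse_level(critical),
                "warning": parse_level(warning),
            }
    return None  # version is outside every listed range


if __name__ == "__main__":
    print(thresholds_for(EXAMPLE_DEFAULT, "7.1.9"))
    print(thresholds_for(EXAMPLE_DEFAULT, "7.2.10"))
```

Under this reading, a version such as 7.2.10 falls into neither listed range, so the lookup returns `None` for it.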
Excessive ACID metadata (TxnToWriteId) Thresholds
- Description
- The health test thresholds for excessive ACID metadata in TxnToWriteId.
- Template Name
- hive_compaction_metadata_ttw_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:1000000.0, warning:500000.0, [CDH 7.2.14..CDH 8.0.0)=critical:1000000.0, warning:500000.0]
- Unit(s)
- TIMES
Failed Compaction Thresholds
- Description
- The health test thresholds for the number of failed compactions.
- Template Name
- hive_compaction_failed_thresholds
- Default Value
- critical:never, warning:1.0
- Unit(s)
- PERCENT
Hive Compaction Health Test
- Description
- Enables the health test that checks whether compaction processes are properly configured and operational.
- Template Name
- hive_compaction_health_check_enabled
- Default Value
- false
- Unit(s)
- no unit
Oldest Initiated Compaction Thresholds
- Description
- The health test thresholds for the oldest initiated compaction.
- Template Name
- hive_compaction_oldest_initiated_thresholds
- Default Value
- critical:43200.0, warning:3600.0
- Unit(s)
- SECONDS
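To illustrate how a `critical:…, warning:…` default such as the one above can be read, the sketch below parses the threshold string and classifies a measured value. It is an illustration only: the function names and the result labels (`ok`, `warning`, `critical`) are hypothetical, and it assumes a level fires when the value strictly exceeds its threshold and that `never` disables the level.

```python
def parse_thresholds(spec):
    """Parse 'critical:43200.0, warning:3600.0' into a dict of floats.

    A level set to 'never' is stored as None (assumed: level disabled).
    """
    levels = {}
    for part in spec.split(","):
        name, _, raw = part.strip().partition(":")
        levels[name] = None if raw == "never" else float(raw)
    return levels


def classify(value, spec):
    """Return 'critical', 'warning', or 'ok' for a measured value.

    Assumption: a level fires when value is strictly greater than it.
    """
    levels = parse_thresholds(spec)
    for name in ("critical", "warning"):  # check the most severe level first
        limit = levels.get(name)
        if limit is not None and value > limit:
            return name
    return "ok"


if __name__ == "__main__":
    oldest_initiated = "critical:43200.0, warning:3600.0"
    # An initiated compaction that has been waiting for 2 hours (7200 s).
    print(classify(7200, oldest_initiated))
```

With the defaults shown here, a compaction that has been in the initiated state for more than 1 hour (3600 seconds) would be a warning and more than 12 hours (43200 seconds) would be critical.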
Oldest Open non replication Transaction Thresholds
- Description
- The health test thresholds for the oldest open non-replication transaction.
- Template Name
- hive_compaction_oldest_open_txn_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:259200.0, warning:86400.0, [CDH 7.2.14..CDH 8.0.0)=critical:259200.0, warning:86400.0]
- Unit(s)
- SECONDS
Oldest Open replication Transaction Thresholds
- Description
- The health test thresholds for the oldest open replication transaction.
- Template Name
- hive_compaction_oldest_open_repl_txn_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:1814400.0, warning:1209600.0, [CDH 7.2.14..CDH 8.0.0)=critical:1814400.0, warning:1209600.0]
- Unit(s)
- SECONDS
Oldest Uncleaned Aborted Transaction Thresholds
- Description
- The health test thresholds for the oldest uncleaned aborted transaction.
- Template Name
- hive_compaction_aborted_txn_not_cleaned_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:172800.0, warning:86400.0, [CDH 7.2.14..CDH 8.0.0)=critical:172800.0, warning:86400.0]
- Unit(s)
- SECONDS
Oldest Uncleaned Compaction Thresholds
- Description
- The health test thresholds for the oldest uncleaned compaction.
- Template Name
- hive_compaction_oldest_ready_for_cleaning_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:never, warning:86400.0, [CDH 7.2.14..CDH 8.0.0)=critical:never, warning:86400.0]
- Unit(s)
- SECONDS
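The SECONDS-based defaults in the sections above are easier to reason about as hours and days. The following minimal sketch converts them using only the standard library; the `humanize` helper is hypothetical, and it rounds down to whole hours.

```python
from datetime import timedelta


def humanize(seconds):
    """Render a threshold in seconds as days and whole hours."""
    delta = timedelta(seconds=seconds)
    hours = delta.seconds // 3600  # remaining hours after whole days
    parts = []
    if delta.days:
        parts.append(f"{delta.days} d")
    if hours or not parts:
        parts.append(f"{hours} h")
    return " ".join(parts)


if __name__ == "__main__":
    # Default values copied from the threshold sections of this document.
    for label, value in [
        ("oldest initiated compaction, warning", 3600.0),
        ("oldest initiated compaction, critical", 43200.0),
        ("oldest open transaction, warning", 86400.0),
        ("oldest open transaction, critical", 259200.0),
        ("oldest open replication transaction, warning", 1209600.0),
        ("oldest open replication transaction, critical", 1814400.0),
        ("oldest uncleaned aborted transaction, critical", 172800.0),
    ]:
        print(f"{label}: {humanize(value)}")
```

For example, the replication transaction defaults of 1209600 and 1814400 seconds correspond to 14 and 21 days.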
Tables/Partitions With Many Aborts Thresholds
- Description
- The health test thresholds for tables/partitions with many aborted transactions.
- Template Name
- hive_compaction_aborted_txn_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:1.0, warning:never, [CDH 7.2.14..CDH 8.0.0)=critical:1.0, warning:never]
- Unit(s)
- TIMES
Tables/Partitions with Large Number of Active Delta Directories Thresholds
- Description
- The health test thresholds for tables/partitions with a large number of active delta directories.
- Template Name
- hive_compaction_active_delta_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:never, warning:200.0, [CDH 7.2.14..CDH 8.0.0)=critical:never, warning:200.0]
- Unit(s)
- TIMES
Tables/Partitions with Large Number of Small Delta Directories Thresholds
- Description
- The health test thresholds for tables/partitions with a large number of small delta directories.
- Template Name
- hive_compaction_small_delta_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:never, warning:200.0, [CDH 7.2.14..CDH 8.0.0)=critical:never, warning:200.0]
- Unit(s)
- TIMES
Writes to Tables with Disabled Auto-Compaction Thresholds
- Description
- The health test thresholds for writes to tables where auto-compaction is disabled.
- Template Name
- hive_compaction_disabled_thresholds
- Default Value
- CDH=[[CDH 7.1.8..CDH 7.2.0)=critical:never, warning:1.0, [CDH 7.2.14..CDH 8.0.0)=critical:never, warning:1.0]
- Unit(s)
- TIMES