An Alert Definition includes name, description and check interval, as well as configurable thresholds for each status (depending on the Alert Type).
The following table lists the alert types, their possible statuses, and whether their thresholds are configurable:
Alert Types

Type | Description | Status | Thresholds Configurable | Units |
---|---|---|---|---|
PORT | Watches a port based on a configuration property as the URI. Example: Hive Metastore Process | OK, WARN, CRIT | Yes | seconds |
METRIC | Watches a metric based on a configuration property. Example: ResourceManager RPC Latency | OK, WARN, CRIT | Yes | variable |
AGGREGATE | Aggregate of status for another alert definition. Example: percentage NodeManagers Available | OK, WARN, CRIT | Yes | percentage |
WEB | Watches a Web UI and adjusts status based on response. Example: App Timeline Web UI | OK, WARN, CRIT | No | n/a |
SCRIPT | Uses a custom script to handle checking. Example: NodeManager Health Summary | OK, CRIT | No | n/a |
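
As a rough illustration of how the table above might be used, the following Python sketch encodes the five alert types and maps a measured value to a status for the types whose thresholds are configurable. Everything here is hypothetical and is not part of any product API: the `AlertType` class, the `ALERT_TYPES` mapping, the `classify` function, and the assumption that a higher value is worse (the value reaches WARN at the warning threshold and CRIT at the critical threshold).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlertType:
    """One row of the Alert Types table (illustrative names only)."""
    name: str
    statuses: Tuple[str, ...]
    thresholds_configurable: bool
    units: Optional[str]  # None where the table says n/a


# Data copied from the Alert Types table.
ALERT_TYPES = {
    t.name: t
    for t in (
        AlertType("PORT", ("OK", "WARN", "CRIT"), True, "seconds"),
        AlertType("METRIC", ("OK", "WARN", "CRIT"), True, "variable"),
        AlertType("AGGREGATE", ("OK", "WARN", "CRIT"), True, "percentage"),
        AlertType("WEB", ("OK", "WARN", "CRIT"), False, None),
        AlertType("SCRIPT", ("OK", "CRIT"), False, None),
    )
}


def classify(alert_type: str, value: float, warn: float, crit: float) -> str:
    """Map a measured value to a status.

    Assumption (not from the source): higher values are worse, so
    value >= crit yields CRIT and value >= warn yields WARN.
    """
    definition = ALERT_TYPES[alert_type]
    if not definition.thresholds_configurable:
        raise ValueError(f"{alert_type} alerts have no configurable thresholds")
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"
```

For example, under these assumptions a PORT alert with a warning threshold of 1.5 seconds and a critical threshold of 5.0 seconds would be classified as WARN by `classify("PORT", 2.0, warn=1.5, crit=5.0)`, while calling `classify` for a WEB or SCRIPT alert raises `ValueError` because the table marks those thresholds as not configurable.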