1.1. Alert Types
Alert thresholds and their units depend on the alert type. The following table lists the alert types, what each type checks, and the units of its thresholds:
Alert Type | Description | Threshold Units |
---|---|---|
WEB | Connects to a Web URL. Alert status is based on the HTTP response code. | seconds |
PORT | Connects to a port. Alert status is based on response time. | seconds |
METRIC | Checks the value of a service metric. Units vary, based on the metric being checked. | varies |
AGGREGATE | Aggregates the status for another alert. | % |
SCRIPT | Executes a script to handle the alert check. | varies |
SERVER | Executes a server-side runnable class to handle the alert check. | varies |
RECOVERY | Ambari Agents handle the check for process restarts after terminating unexpectedly. | varies |
WEB Alert Type
WEB alerts watch a Web URL on a given component, and the alert status is determined by the HTTP response code. Because of this, you cannot change which HTTP response codes map to which status. You can customize the response text for each threshold and the overall web connection timeout. A connection timeout is considered a CRITICAL alert. The response codes and corresponding statuses for WEB alerts are:
OK status if Web URL responds with code under 400.
WARNING status if Web URL responds with code 400 and above.
CRITICAL status if Ambari cannot connect to Web URL.
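The mapping above can be sketched as a small function. This is only an illustration of the rule, not Ambari's implementation; the function name and the use of `None` to represent a failed or timed-out connection are assumptions of this sketch.

```python
from typing import Optional


def web_alert_status(response_code: Optional[int]) -> str:
    """Map an HTTP response code to a WEB alert status.

    response_code is None when no connection could be made
    (an assumption of this sketch, covering timeouts as well).
    """
    if response_code is None:
        return "CRITICAL"   # Ambari could not connect to the Web URL
    if response_code < 400:
        return "OK"         # any code under 400
    return "WARNING"        # code 400 and above


for code in (200, 302, 404, 503, None):
    print(code, web_alert_status(code))
```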
PORT Alert Type
PORT alerts check the response time to connect to a given port, and the threshold units are seconds.
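A response-time check of this kind might look like the following sketch. The function names, the default threshold values (1.5 and 5.0 seconds), and the choice to report a failed connection as CRITICAL are all assumptions made for illustration; they are not taken from Ambari.

```python
import socket
import time


def classify_response_time(elapsed_seconds, warning_seconds, critical_seconds):
    """Compare a connect time (in seconds) against two thresholds."""
    if elapsed_seconds >= critical_seconds:
        return "CRITICAL"
    if elapsed_seconds >= warning_seconds:
        return "WARNING"
    return "OK"


def check_port(host, port, warning_seconds=1.5, critical_seconds=5.0):
    """Time a TCP connect to host:port and classify the result.

    Threshold defaults are hypothetical. A refused or timed-out
    connection is reported as CRITICAL (an assumption of this sketch).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=critical_seconds):
            elapsed = time.monotonic() - start
    except OSError:
        return "CRITICAL"
    return classify_response_time(elapsed, warning_seconds, critical_seconds)
```

Keeping the classification separate from the socket call makes the threshold logic easy to exercise without a network.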
METRIC Alert Type
METRIC alerts check the value of a single metric, or of multiple metrics if a calculation is performed. The metric is accessed from a URL endpoint available on a given component. A connection timeout is considered a CRITICAL alert. The thresholds are adjustable and the units for each threshold are metric-dependent. For example, the unit for “CPU utilization” alerts is “%”, and the unit for “RPC latency” alerts is “milliseconds (ms)”.
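As a rough illustration of a calculated metric, the sketch below derives a percentage from two raw values and compares it to adjustable thresholds. The metric (a used/maximum ratio), the function names, and the threshold values are hypothetical and chosen only to show that the threshold unit follows the metric.

```python
def percent_used(used: float, maximum: float) -> float:
    """Combine two raw metric values into a single percentage."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * used / maximum


def metric_alert_status(value: float, warning: float, critical: float) -> str:
    """Classify a metric value; thresholds share the metric's unit."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical thresholds in "%", because the calculated metric is a percentage.
value = percent_used(used=850, maximum=1000)
print(value, metric_alert_status(value, warning=80, critical=90))
```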
AGGREGATE Alert Type
AGGREGATE alerts aggregate the alert status as a percentage of the alert instances affected. For example, the “Percent DataNode Process” alert aggregates the “DataNode Process” alert. The threshold units are “%”.
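The percentage calculation can be sketched as follows. What counts as an "affected" instance is an assumption here (any status other than OK), and the function name and threshold values are hypothetical.

```python
def aggregate_status(instance_statuses, warning_percent, critical_percent):
    """Return (status, percent_affected) for a list of instance statuses.

    An instance counts as affected when its status is not "OK"
    (an assumption of this sketch).
    """
    if not instance_statuses:
        raise ValueError("at least one alert instance is required")
    affected = sum(1 for status in instance_statuses if status != "OK")
    percent = 100.0 * affected / len(instance_statuses)
    if percent >= critical_percent:
        return "CRITICAL", percent
    if percent >= warning_percent:
        return "WARNING", percent
    return "OK", percent


# Four hypothetical "DataNode Process" instances, one of them failing.
print(aggregate_status(["OK", "OK", "OK", "CRITICAL"], 10, 30))
```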
SCRIPT Alert Type
SCRIPT alerts execute a script, and the script determines the status, such as OK, WARNING, or CRITICAL. You can customize the response text and the values for the various properties and thresholds of the SCRIPT alert.
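A script of this kind might look like the sketch below, which reports disk usage for a path. The `get_tokens()` and `execute()` entry points and the `(status, [text])` return shape follow the pattern used by Ambari's bundled Python alert scripts as best understood here; treat them, and especially the parameter names `path`, `warning_percent`, and `critical_percent`, as assumptions to verify against your Ambari version.

```python
import shutil


def get_tokens():
    """Configuration keys the script wants passed in; none are needed here."""
    return ()


def execute(configurations=None, parameters=None, host_name=None):
    """Return (status, [response_text]) for disk usage on a path.

    All parameter names and default thresholds are hypothetical.
    """
    parameters = parameters or {}
    path = parameters.get("path", "/")
    warning = float(parameters.get("warning_percent", 80))
    critical = float(parameters.get("critical_percent", 90))

    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total
    text = "{0:.1f}% of {1} is in use".format(percent, path)

    if percent >= critical:
        return ("CRITICAL", [text])
    if percent >= warning:
        return ("WARNING", [text])
    return ("OK", [text])


print(execute(parameters={"path": "/"}))
```

Reading thresholds from `parameters` rather than hard-coding them is what lets the values be customized without editing the script.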
SERVER Alert Type
SERVER alerts execute a server-side runnable class, which determines the alert status, such as OK, WARNING, or CRITICAL.
RECOVERY Alert Type
RECOVERY alerts are handled by the Ambari Agents that watch for process restarts. The alert status (OK, WARNING, or CRITICAL) is based on the number of times a process has been restarted automatically. This is useful in cases where processes are terminating and Ambari is automatically restarting them.
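The count-based rule can be illustrated with a short sketch. The function name and the threshold values (2 and 4 restarts) are hypothetical; the actual thresholds and the window over which restarts are counted are not stated here.

```python
def recovery_alert_status(restart_count, warning_count=2, critical_count=4):
    """Classify the number of automatic restarts of a process.

    Threshold defaults are hypothetical, chosen only for illustration.
    """
    if restart_count < 0:
        raise ValueError("restart_count cannot be negative")
    if restart_count >= critical_count:
        return "CRITICAL"
    if restart_count >= warning_count:
        return "WARNING"
    return "OK"


for count in (0, 2, 5):
    print(count, recovery_alert_status(count))
```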