Alert thresholds and threshold units depend on the alert type. The following table lists the alert types, what each one checks, whether its thresholds are configurable, and the threshold units:
Alert Type | Description | Thresholds Configurable | Threshold Units |
---|---|---|---|
WEB | Connects to a Web URL. Alert status is based on the HTTP response code. | No | n/a |
PORT | Connects to a port. Alert status is based on response time. | Yes | seconds |
METRIC | Checks the value of a service metric. Units vary, based on the metric being checked. | Yes | varies |
AGGREGATE | Aggregates the status for another alert. | Yes | % |
SCRIPT | Executes a script to handle the alert check. | No | n/a |
WEB Alert Type
WEB alerts watch a Web URL on a given component, and the alert status is determined by the HTTP response code. You cannot change which HTTP response codes map to each status, but you can customize the response text for each threshold. The response codes and corresponding statuses for WEB alerts are:
OK status if the Web URL responds with a code under 400.
WARNING status if the Web URL responds with a code of 400 or above.
CRITICAL status if Ambari cannot connect to the Web URL.
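The mapping above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not Ambari's implementation; the function name `web_alert_status` is hypothetical, and `None` is used here to stand for "could not connect".

```python
from typing import Optional


def web_alert_status(response_code: Optional[int]) -> str:
    """Map an HTTP response code to a WEB alert status.

    response_code is None when no connection could be made
    (an illustrative convention, not part of Ambari).
    """
    if response_code is None:
        return "CRITICAL"   # cannot connect to the Web URL
    if response_code < 400:
        return "OK"         # any code under 400
    return "WARNING"        # code 400 or above


if __name__ == "__main__":
    for code in (200, 302, 404, 503, None):
        print(code, web_alert_status(code))
```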
Note | |
---|---|
The connection timeout is set by the connection_timeout property on the alert definition and defaults to 5.0 seconds. The property is visible when you access the definition through the Alerts API: GET /api/v1/clusters/MyCluster/alert_definitions/42 "source" : { "reporting" : { ... }, "type" : "WEB", "uri" : { ... "connection_timeout" : 5.0 } } |
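As a rough sketch, the timeout could be read from the "source" object of an alert definition as follows. The helper name `connection_timeout` is hypothetical, and the sample contains only the fields shown in the note; the elided fields are simply left out. Fetching the definition over HTTP is not shown.

```python
import json

# Default stated in the note: 5.0 seconds.
DEFAULT_CONNECTION_TIMEOUT = 5.0


def connection_timeout(source: dict) -> float:
    """Return the connection timeout from an alert definition's "source" object.

    Falls back to the documented default when the property is absent.
    """
    uri = source.get("uri")
    if not isinstance(uri, dict):
        return DEFAULT_CONNECTION_TIMEOUT
    return float(uri.get("connection_timeout", DEFAULT_CONNECTION_TIMEOUT))


# Sample limited to the fields visible in the note above.
sample_source = json.loads('{"type": "WEB", "uri": {"connection_timeout": 5.0}}')

if __name__ == "__main__":
    print(connection_timeout(sample_source))
```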
PORT Alert Type
PORT alerts check the response time to connect to a given port. The threshold units are seconds.
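A minimal sketch of this kind of check is shown below. It is not Ambari's code: the function names are hypothetical, the thresholds are passed in by the caller, and treating a failed connection as CRITICAL is an assumption made for the example.

```python
import socket
import time
from typing import Optional


def measure_connect_seconds(host: str, port: int, timeout: float) -> Optional[float]:
    """Return the time in seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None


def port_alert_status(elapsed: Optional[float],
                      warning_seconds: float,
                      critical_seconds: float) -> str:
    """Compare a connect time in seconds against the two thresholds."""
    if elapsed is None:
        return "CRITICAL"   # assumption: no connection counts as CRITICAL
    if elapsed >= critical_seconds:
        return "CRITICAL"
    if elapsed >= warning_seconds:
        return "WARNING"
    return "OK"
```

For example, `port_alert_status(measure_connect_seconds("localhost", 8080, 5.0), 1.5, 5.0)` would classify a connection to a hypothetical local service using illustrative thresholds of 1.5 and 5.0 seconds.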
METRIC Alert Type
METRIC alerts check the value of one or more metrics (more than one when a calculation is performed). The metrics are read from a URL endpoint available on a given component. The thresholds are adjustable, and the units for each threshold depend on the metric. For example, the unit is “%” for “CPU utilization” alerts and “milliseconds (ms)” for “RPC latency” alerts.
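To illustrate a METRIC alert that performs a calculation, the sketch below derives a percentage from two metric values and compares it with thresholds in the same unit. The metric names, the formula, the threshold values, and the assumption that higher values are worse are all hypothetical; real definitions vary by metric.

```python
def percent_used(used: float, total: float) -> float:
    """Combine two raw metric values into a single percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def metric_alert_status(value: float, warning: float, critical: float) -> str:
    """Classify a metric value, assuming higher values are worse.

    value, warning, and critical must share the same unit (for example %).
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    value = percent_used(used=150.0, total=200.0)   # 75.0 %
    print(value, metric_alert_status(value, warning=70.0, critical=90.0))
```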
Note | |
---|---|
The connection timeout is set by the connection_timeout property on the alert definition and defaults to 5.0 seconds. The property is visible when you access the definition through the Alerts API: GET /api/v1/clusters/MyCluster/alert_definitions/32 "source" : { "reporting" : { ... }, "type" : "METRIC", "uri" : { ... "connection_timeout" : 5.0 } } |
AGGREGATE Alert Type
AGGREGATE alerts aggregate the status of another alert as a percentage of the alert instances affected. For example, the “Percent DataNode Process” alert aggregates the “DataNode Process” alert. The threshold units are “%”.
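The percentage calculation can be sketched as follows. This is an illustration only: it assumes that an instance counts as "affected" when its status is anything other than OK, and the function names and threshold values are hypothetical.

```python
from typing import Iterable


def percent_affected(statuses: Iterable[str]) -> float:
    """Return the percentage of alert instances whose status is not OK."""
    items = list(statuses)
    if not items:
        return 0.0
    affected = sum(1 for status in items if status != "OK")
    return 100.0 * affected / len(items)


def aggregate_alert_status(percent: float,
                           warning_percent: float,
                           critical_percent: float) -> str:
    """Compare the affected percentage against thresholds expressed in %."""
    if percent >= critical_percent:
        return "CRITICAL"
    if percent >= warning_percent:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Four instances of the underlying alert, one of them failing.
    instances = ["OK", "OK", "OK", "CRITICAL"]
    pct = percent_affected(instances)   # 25.0
    print(pct, aggregate_alert_status(pct, warning_percent=10.0, critical_percent=30.0))
```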
SCRIPT Alert Type
SCRIPT alerts execute a script, and the script determines the status, such as OK, WARNING, or CRITICAL. The thresholds and response text are built into the alert definitions and are not modifiable from the Ambari Web UI.
Note | |
---|---|
The location of the script is given by the path property on the alert definition when accessed from the Alerts API. GET /api/v1/clusters/MyCluster/alert_definitions/19 "source" : { "parameters" : { ... }, "path" : "HDFS/2.1.0.2.0/package/alerts/alert_ha_namenode_health.py", "type" : "SCRIPT" } |
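To give a sense of how a script can decide the status itself, here is a small hypothetical check. The entry-point name `execute`, its arguments, the `marker_path` parameter, and the `(status, [response_text])` return shape are assumptions made for illustration; this section does not specify the interface that Ambari expects from an alert script.

```python
import os


def execute(configurations=None, parameters=None, host_name=None):
    """Return (status, [response_text]) for a hypothetical marker-file check.

    The status logic and the response text live in the script itself,
    which is why they cannot be edited as thresholds in the Web UI.
    """
    parameters = parameters or {}
    marker = parameters.get("marker_path")  # hypothetical parameter name

    if not marker:
        return ("WARNING", ["No marker_path parameter was supplied."])
    if os.path.exists(marker):
        return ("OK", ["Found {0}.".format(marker)])
    return ("CRITICAL", ["{0} is missing.".format(marker)])


if __name__ == "__main__":
    status, text = execute(parameters={"marker_path": os.getcwd()})
    print(status, text[0])
```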