1.1. Alert Types

Alert thresholds and threshold units depend on the alert type. The following table lists the alert types, how each one determines alert status, and whether its thresholds are configurable:

Alert Type  Description                                                                           Thresholds Configurable  Threshold Units
WEB         Connects to a Web URL. Alert status is based on the HTTP response code.               No                       n/a
PORT        Connects to a port. Alert status is based on response time.                           Yes                      seconds
METRIC      Checks the value of a service metric. Units vary, based on the metric being checked.  Yes                      varies
AGGREGATE   Aggregates the status for another alert.                                              Yes                      %
SCRIPT      Executes a script to handle the alert check.                                          No                       n/a

WEB Alert Type

WEB alerts watch a Web URL on a given component, and the alert status is determined by the HTTP response code. You therefore cannot change which HTTP response codes map to which thresholds for WEB alerts, although you can customize the response text for each threshold. The response codes and corresponding statuses for WEB alerts are:

  • OK status if the Web URL responds with a code under 400.

  • WARNING status if the Web URL responds with code 400 or above.

  • CRITICAL status if Ambari cannot connect to the Web URL.
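
The mapping in the list above can be written as a small function. This is only an illustrative sketch, not Ambari's implementation: the function name web_alert_status is hypothetical, and None is used here to represent "could not connect".

```python
from typing import Optional


def web_alert_status(http_code: Optional[int]) -> str:
    """Map an HTTP response code to a WEB alert status.

    Pass None when no connection to the Web URL could be made
    (a convention chosen for this sketch).
    """
    if http_code is None:
        return "CRITICAL"  # could not connect to the Web URL
    if http_code < 400:
        return "OK"        # any response code under 400
    return "WARNING"       # response code 400 and above


if __name__ == "__main__":
    for code in (200, 302, 404, 503, None):
        print(code, web_alert_status(code))
```

Note that, under this mapping, a server error such as 503 is a WARNING rather than a CRITICAL, because the URL still responded.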

Note

The connection timeout defaults to 5.0 seconds. It is exposed as the connection_timeout property on the alert definition when accessed from the Alerts API.

GET /api/v1/clusters/MyCluster/alert_definitions/42

   "source" : {
      "reporting" : {
        ...
      },
      "type" : "WEB",
      "uri" : {
        ...
        "connection_timeout" : 5.0
      }
    } 
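
As a rough sketch, the timeout can be read from a parsed "source" object like the fragment above. The helper name connection_timeout is hypothetical, and the function takes the "source" object directly because its position within the full API response is not shown here. The fallback value is the 5.0 second default stated in the note.

```python
import json

DEFAULT_CONNECTION_TIMEOUT = 5.0  # seconds; the default stated in the note


def connection_timeout(source: dict) -> float:
    """Return the connection timeout from an alert definition "source" object.

    Falls back to the default when "uri" is absent, is not an object,
    or carries no "connection_timeout" property.
    """
    uri = source.get("uri")
    if not isinstance(uri, dict):
        return DEFAULT_CONNECTION_TIMEOUT
    return float(uri.get("connection_timeout", DEFAULT_CONNECTION_TIMEOUT))


if __name__ == "__main__":
    # Sample input modelled on the fragment above, with elided parts omitted.
    sample = json.loads('{"type": "WEB", "uri": {"connection_timeout": 5.0}}')
    print(connection_timeout(sample))
```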

PORT Alert Type

PORT alerts check the response time to connect to a given port. The threshold units are seconds.
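
A minimal sketch of this kind of check, using only the Python standard library, is shown below. The function names and the threshold values (1.5 and 5.0 seconds) are hypothetical, and treating a failed connection as CRITICAL is an assumption of this sketch rather than something stated above.

```python
import socket
import time
from typing import Optional


def measure_connect_time(host: str, port: int, timeout_s: float) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


def port_alert_status(elapsed_s: Optional[float],
                      warning_s: float = 1.5,
                      critical_s: float = 5.0) -> str:
    """Compare a connect time in seconds against hypothetical thresholds."""
    if elapsed_s is None or elapsed_s >= critical_s:
        return "CRITICAL"  # assumption: no connection counts as CRITICAL
    if elapsed_s >= warning_s:
        return "WARNING"
    return "OK"
```

Keeping the measurement separate from the comparison means the thresholds can be adjusted without changing how the connection is timed.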

METRIC Alert Type

METRIC alerts check the value of one or more metrics (more than one when a calculation is performed). The metrics are read from a URL endpoint available on a given component. The thresholds are adjustable, and the units for each threshold depend on the metric. For example, for “CPU utilization” alerts the unit is “%”, and for “RPC latency” alerts the unit is “milliseconds (ms)”.
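
The following sketch illustrates the idea of a calculated metric compared against thresholds that share its unit. All names, input values, and thresholds are hypothetical, and the sketch assumes that higher values are worse, which may not hold for every metric.

```python
from typing import Tuple


def percent(part: float, whole: float) -> float:
    """Combine two raw metric values into a percentage (a simple calculation)."""
    return 100.0 * part / whole


def metric_alert_status(value: float, warning: float, critical: float,
                        unit: str) -> Tuple[str, str]:
    """Return (status, text) for a metric value.

    The value and both thresholds are expressed in the same unit.
    Assumes higher values are worse.
    """
    if value >= critical:
        status = "CRITICAL"
    elif value >= warning:
        status = "WARNING"
    else:
        status = "OK"
    return status, f"{status}: {value:g} {unit}"


if __name__ == "__main__":
    # A calculated percentage metric with thresholds in "%".
    print(metric_alert_status(percent(150, 200), 70, 90, "%"))
    # A single latency metric with thresholds in "ms".
    print(metric_alert_status(1200, 3000, 5000, "ms"))
```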

Note

The connection timeout defaults to 5.0 seconds. It is exposed as the connection_timeout property on the alert definition when accessed from the Alerts API.

GET /api/v1/clusters/MyCluster/alert_definitions/32

   "source" : {
      "reporting" : {
        ...
      },
      "type" : "METRIC",
      "uri" : {
        ...
        "connection_timeout" : 5.0
      }
    }  

AGGREGATE Alert Type

AGGREGATE alerts aggregate the alert status as a percentage of the alert instances affected. For example, the “Percent DataNode Process” alert aggregates the “DataNode Process” alert. The threshold units are “%”.
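
The percentage calculation can be sketched as follows. The function name and the thresholds (10% and 30%) are hypothetical. Counting every non-OK instance as "affected" and returning UNKNOWN for an empty list are assumptions of this sketch.

```python
from typing import List, Tuple


def aggregate_alert_status(instance_states: List[str],
                           warning_pct: float = 10.0,
                           critical_pct: float = 30.0) -> Tuple[str, float]:
    """Return (status, percent affected) for a set of alert instances."""
    if not instance_states:
        return "UNKNOWN", 0.0  # assumption: nothing to aggregate
    # Assumption: any instance that is not OK counts as affected.
    affected = sum(1 for state in instance_states if state != "OK")
    pct = 100.0 * affected / len(instance_states)
    if pct >= critical_pct:
        return "CRITICAL", pct
    if pct >= warning_pct:
        return "WARNING", pct
    return "OK", pct


if __name__ == "__main__":
    # Ten hypothetical instances of an underlying alert, one of them failing.
    states = ["OK"] * 9 + ["CRITICAL"]
    print(aggregate_alert_status(states))
```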

SCRIPT Alert Type

SCRIPT alerts execute a script, and the script determines the status (OK, WARNING, or CRITICAL). The thresholds and response text are built into the alert definition and cannot be modified from the Ambari Web UI.

Note

The location of the script is available on the path property on the alert definition when accessed from the Alerts API.

GET /api/v1/clusters/MyCluster/alert_definitions/19

   "source" : {
      "parameters" : {
        ...
      },
      "path" : "HDFS/2.1.0.2.0/package/alerts/alert_ha_namenode_health.py",
      "type" : "SCRIPT"
    }
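
To illustrate what "the script determines status" means, here is a generic sketch of a check with its thresholds and response text built into the script itself. It does not follow Ambari's script interface, which is not described here. The function name, the free-space check, and the threshold values are all hypothetical.

```python
import shutil
from typing import Tuple


def evaluate_free_space(free_bytes: int, total_bytes: int,
                        warning_free: float = 0.20,
                        critical_free: float = 0.10) -> Tuple[str, str]:
    """Decide a status and response text from free disk space.

    The thresholds live in the script, so changing them means
    editing the script rather than a UI setting.
    """
    fraction = free_bytes / total_bytes
    if fraction < critical_free:
        status = "CRITICAL"
    elif fraction < warning_free:
        status = "WARNING"
    else:
        status = "OK"
    return status, f"{status}: {fraction:.0%} free"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # inspect the root filesystem
    print(*evaluate_free_space(usage.free, usage.total))
```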