Metric Aggregation
In addition to collecting and storing raw metric values, the Cloudera Manager Service Monitor and Host Monitor produce a number of aggregate metrics from the raw metric data.
Where a raw data point is a timestamp value pair, an aggregate metric point is a timestamp paired with a bundle of statistics including the minimum, maximum, average, and standard deviation of the data points considered by the aggregate.
Individual metric streams are aggregated across time to produce statistical summaries at different data granularities. For example, an individual metric stream of the number of open file descriptors on a host will be aggregated over time to the ten-minute, hourly, six-hourly, daily and weekly data granularities. A point in the hourly aggregate stream will include the maximum number of open file descriptors seen during that hour, the minimum, the average and so on. When servicing a time-series request, either for the Cloudera Manager UI or API, the Service Monitor and Host Monitor automatically choose the appropriate data granularity based on the time-range requested.
Cross-Time Aggregate Example
Consider the following fd_open
raw metric values for a host:
9:00, 100 fds 9:01, 101 fds 9:02, 102 fds . . . 9:09, 109 fds
The ten minutely cross-time aggregate point covering the ten-minute window from 9:00 - 9:10 would have the following statistics and metadata:
min: 100 fds min timestamp: 9:00 max 109 fds max timestamp 9:09 mean 104.5 fds standard deviation: 3.02765 fds count: 10 points sample: 109 fds sample timestamp: 9:09
The Service Monitor and Host Monitor also produce cross-entity aggregates for a number of entities in the system. Cross-entity aggregates are produced by considering the metric value of a particular metric across a number of entities of the same type at a particular time. For each stream considered, two metrics are produced. The first tracks statistics such as the minimum, maximum, average and standard deviation across all considered entities as well as the identities of the entities that had the minimum and maximum values. The second tracks the sum of the metric across all considered entities.
An example of the first type of cross-entity aggregate is the fd_open_across_datanodes
metric. For an HDFS
service this metric contains aggregate statistics on the fd_open
metric value for all the DataNodes in the
service. For a rack this metric contains statistics for all the DataNodes within that rack,
and so on. An example of the second type of cross-entity aggregate is the total_fd_open_across_datanodes
metric. For an
HDFS service this metric contains the total number of file descriptors open by all the
DataNodes in the service. For a rack this metric contains the total number of file
descriptors open by all the DataNodes within the rack, and so on. Note that unlike the first
type of cross-entity aggregate, this total type of cross-entity aggregate is a simple
timestamp, value pair and not a bundle of statistics.
Cross-Entity Aggregate Example
Consider the following fd_open
raw metric values for a set of ten DataNodes in an HDFS service at a
given timestamp:
datanode-0, 200 fds datanode-1, 201 fds datanode-2, 202 fds … datanode-9, 209 fds
fd_open_across_datanodes
point for that HDFS service at that time would have the
following statistics and
metadata:min: 200 fds min entity: datanode-0 max: 209 fds max entity: datanode-9 mean: 204.5 fds standard deviation: 3.02765 fds count: 10 points sample: 209 fds sample entity: datanode-9
Just like every other metric, cross-entity aggregates are aggregated
across time. For example, a point in the hourly aggregate of fd_open_across_datanodes
for an HDFS service will
include the maximum fd_open
value of any
DataNode in that service over that hour, the average value over the hour, and so on. A point
in the hourly aggregate of total_fd_open_across_datanodes
for an HDFS service will contain statistics on
the value of the total_fd_open_across_datanodes
for that service over the hour.