Understanding the Cloudera Observability On-Premises cluster services metrics
Describes the Cloudera Observability On-Premises cluster services metrics, which are displayed in a series of charts that show the state, activity, and performance of the Cloudera Observability On-Premises cluster services. Accessed from Cloudera Manager, the charts help you monitor the health, performance, and workload usage of your Cloudera Observability On-Premises cluster services so that you can identify and troubleshoot existing and potential problems.
The Cloudera Observability On-Premises cluster services metric charts are displayed on the Cloudera Observability On-Premises Cluster and the Cloudera Observability On-Premises Services pages located in the Cloudera Manager Admin Console. For further analysis, each metric chart can be opened to display more detailed information.
The metrics displayed depend on the selected Cloudera Observability On-Premises element. However, whether you view a chart from the Cloudera Observability On-Premises Cluster Status tab page, the Charts Library tab page, or a Cloudera Observability On-Premises Service's page, the basic functionality is the same:
- Change the size of the chart by dragging its lower-right corner.
- View detailed information about elements of interest in the chart by hovering your mouse over the element. When you move your mouse horizontally across the chart, the data values change according to the time represented.
- For additional information, enlarge the pop-up window by clicking Click to expand.
- When the pop-up window is fully expanded, you can:
  - View the Cloudera Observability On-Premises service associated with the chart by clicking View Service.
  - Display the chart on its own page by clicking View Entity Chart.
For more information about charts in Cloudera Manager, click the Related Information link below.
Cloudera Observability On-Premises cluster chart library categories
The Cloudera Observability On-Premises Status Page visually displays a limited set of metrics that are based on historical Cloudera Observability On-Premises user payload analysis.
The Charts Library displays a much larger set of metric charts, which are organized into the following categories:
- Status Page Charts, whose charts display a consolidated view of the overall Cloudera Observability On-Premises cluster metrics.
- Zookeeper Queue, whose charts display the ZooKeeper service metrics, including the number of queues and shards for all streams.
When the number of messages in a ZooKeeper queue exceeds the defined threshold limits, a Cloudera Observability On-Premises health check alert is triggered. For more information about the Zookeeper Elevated Queue Count alerts, click the Related Information link below.
- Counters, whose charts display the number of jobs received and the number of jobs that failed. Counter metrics are also separated into Pipeline, Analytic Database, and SDX service categories.
- Processing Timers, whose charts display the average job processing time and the average rate across servers. They are calculated using the 75th and 95th percentiles. Processing Timer metrics are also separated into Pipelines and Analytic Database service categories.
When less than 75% of the service's audit payloads are processing slower than the defined yellow and red timer threshold limits, a Cloudera Observability On-Premises health check alert is triggered. For more information about the Slow Payload Processing Timer alerts, click the Related Information link below.
- Events, whose charts display the number of important and informational alerts.
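To make the percentile and threshold language above more concrete, the following is a minimal Python sketch of how a 75th and 95th percentile can be derived from a set of processing times and compared against yellow and red limits. It is an illustration only and does not represent how Cloudera Manager computes these metrics or evaluates its health checks. The function names, the nearest-rank percentile method, the sample values, and the threshold values are all assumptions made for this example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify(value, yellow, red):
    """Map a value onto a hypothetical green/yellow/red scale."""
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "green"


# Hypothetical processing times in milliseconds for ten payloads.
processing_ms = [120, 95, 180, 210, 150, 900, 130, 160, 175, 140]

p75 = percentile(processing_ms, 75)
p95 = percentile(processing_ms, 95)

# Hypothetical limits; real limits are whatever is defined in your deployment.
p75_state = classify(p75, yellow=200, red=500)
p95_state = classify(p95, yellow=200, red=500)

print(f"p75={p75} ms ({p75_state}), p95={p95} ms ({p95_state})")
```

In this sample, one slow payload pushes the 95th percentile far above the 75th percentile, which is why viewing both percentiles side by side can help separate occasional outliers from a general slowdown.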
Cloudera Observability On-Premises services categories
The Cloudera Observability On-Premises Services chart categories are accessed by selecting the Cloudera Observability On-Premises service in the Status Summary section of the Cloudera Observability On-Premises Status page.
- Status Page Charts, whose charts display a limited set of the Cloudera Observability On-Premises service's metrics.
- Counters, whose charts display the number of jobs or queries received and the number of jobs or queries that failed.
- Processing Timers, whose charts display the average job processing time and the average rate across servers. They are calculated using the 75th and 95th percentiles. Processing Timer metrics are also separated into Pipelines and Analytic Database service categories.
- Payload Size, whose histogram charts display the average, maximum, minimum, and 75th percentile processing payload sizes.
- Process Resources, whose charts display metrics about the service's processing resources, such as the amount of resident memory used.
- Host Resources, whose charts display metrics about the service's host, which are broken down, depending on the service, into CPU, Memory, Disk Aggregates, Disk Comparison, Network Aggregates, Network Interface Comparison, File Descriptors, and Entropy categories.
- Liveness, whose chart displays metrics about the service's processing performance.
- Events, whose charts display the number of important and informational alerts.
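As an illustration of the statistics named in the Payload Size category, the following Python sketch summarizes a list of payload sizes into a minimum, maximum, average, 75th percentile, and a simple histogram. It is a sketch under stated assumptions, not the implementation used by Cloudera Manager. The summarize function, the bucket width, and the sample sizes are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def summarize(sizes_bytes, bucket_width):
    """Return summary statistics and a fixed-width histogram for payload sizes."""
    if not sizes_bytes:
        raise ValueError("sizes_bytes must not be empty")
    ordered = sorted(sizes_bytes)
    # Nearest-rank 75th percentile (one of several common percentile definitions).
    rank = max(1, math.ceil(75 * len(ordered) / 100))
    # Each bucket is keyed by its lower bound in bytes.
    buckets = Counter((size // bucket_width) * bucket_width for size in ordered)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "avg": mean(ordered),
        "p75": ordered[rank - 1],
        "histogram": dict(sorted(buckets.items())),
    }


# Hypothetical payload sizes in bytes.
sizes = [512, 2048, 1024, 4096, 1536, 3072, 768, 2560]
summary = summarize(sizes, bucket_width=1024)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The histogram shows how the payload sizes are distributed across size ranges, while the four summary values condense the same data into single numbers.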