Understanding the Workload XM Cluster Services Metrics

Describes the Workload XM cluster services metrics, which are visually displayed in a series of charts that show the state, activity, and performance of the Workload XM cluster services. Accessed from Cloudera Manager they help you monitor the health, performance, and workload usage of your Workload XM Cluster Services for identifying and troubleshooting existing and potential problems.

The Workload XM cluster services metric charts are displayed on the Workload XM Cluster and the Workload XM Services pages located in the Cloudera Manager Admin Console. For further analysis, each metric chart can be opened to display more detailed information.

The metrics displayed are dependent on the selected Workload XM element. But, whether you view a chart from the Workload XM Cluster Status tab page, the Charts Library tab page, or a Workload XM Service's page, the basic functionality works in the same way.

For example, you can:
  • Change the size of the chart by dragging it's lower-right corner.
  • View detailed information about elements of interest in the chart by hovering your mouse over the element. When you move your mouse horizontally across the chart, the data values will change according to the time represented.
  • For additional information, you can enlarge the pop-up window by clicking Click to expand.
  • When the pop-up window is fully expanded you can view:
    • The Workload XM service associated with the chart by clicking View Service.
    • Display the chart on its own page by clicking View Entity Chart.

For more information about charts in Cloudera Manager, click the Related Information link below.

Workload XM Cluster Chart Library Categories

The Workload XM Status Page visually displays a limited set of metrics that are based on historical Workload XM user payload analysis.

The Charts Library displays a much larger set of metric charts, which are organized into categories.

The following lists the Workload XM Chart Library categories available on the Charts Library page, accessed by clicking the Charts Library tab:
  • Status Page Charts, whose charts display a consolidated view of the overall Workload XM cluster metrics.
  • Zookeeper Queue, whose charts display the ZooKeeper service metrics, including the number of queues and shards for all streams.

    When the number of messages in a Zookeeper queue exceeds the defined threshold limits, a Workload XM health check alert is triggered. For more information about the Zookeeper Elevated Queue Count alerts, click the Related Information link below.

  • Counters, whose charts display the number of jobs received and the number of jobs that failed. Counter metrics are also separated into Pipeline, Analytic Database, and SDX service categories.
  • Processing Timers, whose charts display the average job processing time and the average rate across servers. They are calculated using the 75th and 95th percentiles. Processing Timer metrics are also separated into Pipelines and Analytic Database service categories.

    When less than 75% of the service's audit payloads are processing slower than the defined yellow and red timer threshold limits, a Workload XM health check alert is triggered. For more information about the Slow Payload Processing Timer alerts, click the Related Information link below.

  • Events, whose charts display the number of important and informational alerts.

Workload XM Services Categories

The Workload XM Services chart categories are accessed by selecting the Workload XM service in the Status Summary section of the Workload XM Status page.

The following lists the Workload XM service's chart categories. As they are dependent on the service you are viewing, not all of the following categories will be displayed:
  • Status Page Charts, whose charts display a limited set of Workload XM service's metrics.
  • Counters, whose charts display the number of jobs or queries received and the number of jobs or queries that failed.
  • Processing Timers, whose charts display the average job processing time and the average rate across servers. They are calculated using the 75th and 95th percentiles. Processing Timer metrics are also separated into Pipelines and Analytic Database service categories.
  • Payload Size, whose histogram charts display the average, maximum, minimum, and 75th percentile processing payload sizes.
  • Process Resources, whose charts display metrics about the service's processing resources, such as the amount of resident memory used.
  • Host Resources, whose charts display metrics about the service's host, which are broken down, depending on the service, into CPU, Memory, Disk Aggregates, Disk Comparison, Network Aggregates, Network Interface Comparison, File Descriptors, and Entropy categories.
  • Liveness, whose chart displays metrics about the service's processing performance.
  • Events, whose charts display the number of important and informational alerts