Cluster engine and performance metrics

From the Cloudera Observability Real-time monitoring (RTM) Clusters page, you can monitor cluster usage trends in real time, gain visibility into cluster capacity and utilization levels, and assess when to scale up by adding more nodes. Monitoring cluster status helps ensure that all clusters are operational and helps you identify critical issues that require immediate attention.

As an administrator, you can identify and shut down idle clusters promptly. You can also optimize resource usage by consolidating multiple unused clusters into one cluster, which reduces waste and improves efficiency. This visibility helps you make informed decisions about allocating resources, introducing new jobs, and redistributing workloads to maximize efficiency across your systems.

Cluster data filters

The Clusters page displays a summary of all the clusters in your environment, their engines, and their overall health.

You can use the following filters to narrow the list of clusters and engines and focus on specific anomalies:
  • Search: Search for a specific cluster or engine.
  • Status: Select one or more health check metrics, and then click Apply.

In a multi-cluster setup, cluster rows are initially collapsed. To see details about a specific cluster's engines, click its row to expand it.

Cluster performance metrics

The header on the Clusters page shows the name of your cluster, along with its current memory and CPU core allocation.

The cards, charts, and widgets offer quick insights into the current status and health of your environment's memory, CPU, activity, and data throughput.
  • MEMORY: Shows the current memory consumption and allocation in gigabytes (GB). This enables you to determine which cluster has enough capacity to take on new jobs and to predict the impact on scheduled or submitted jobs based on the available memory.
  • CPU: Displays the overall CPU capacity of the cluster, measured by the total number of CPU cores currently available versus those in use. This enables you to determine each cluster's ability to handle new jobs and predict the impact on scheduled or submitted jobs based on available CPU resources. Cloudera recommends avoiding sustained CPU utilization over 80% per core.
  • BUSYNESS ACTIVITY: Provides a historical view of cluster activity, represented as a percentage. This metric helps you understand cluster utilization and associated costs, calculated using the Cloudera Compute Unit (CCU) measurement, which combines CPU and memory usage. The value helps you determine which clusters are over-provisioned and which ones require scaling up.
  • NETWORK I/O: Provides a historical view of network activity, measured in gigabytes per second (GB/sec). It indicates the duration and volume of data received and transmitted over the network interface during processing.
  • DATA R/W: Provides a historical view of data read and write operations, measured in gigabytes per second (GB/sec). It indicates the duration and volume of data transferred to and from the engine's storage device.
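
The capacity reasoning behind the MEMORY and CPU cards can be sketched in a few lines of Python. This is an illustrative sketch only: the ClusterSnapshot record, its field names, and the has_headroom helper are hypothetical and are not part of any Cloudera Observability API. You would enter the values by hand from the cards. The sketch also applies the 80% guideline to the cluster-wide ratio of cores in use to total cores, which is a simplification of the per-core recommendation.

```python
from dataclasses import dataclass

# The 80% sustained-CPU guideline quoted in the CPU card description.
# Applied here to the cluster-wide core ratio, which is an assumption.
CPU_SUSTAINED_LIMIT = 0.80


@dataclass
class ClusterSnapshot:
    """Hypothetical record of the values shown on the MEMORY and CPU cards."""
    name: str
    memory_used_gb: float
    memory_allocated_gb: float
    cores_in_use: int
    cores_total: int

    @property
    def memory_utilization(self) -> float:
        """Fraction of allocated memory currently in use (0.0 if none allocated)."""
        if self.memory_allocated_gb == 0:
            return 0.0
        return self.memory_used_gb / self.memory_allocated_gb

    @property
    def cpu_utilization(self) -> float:
        """Fraction of CPU cores currently in use (0.0 if the cluster has no cores)."""
        if self.cores_total == 0:
            return 0.0
        return self.cores_in_use / self.cores_total


def has_headroom(snapshot: ClusterSnapshot, extra_memory_gb: float, extra_cores: int) -> bool:
    """Return True if a job needing the extra resources fits within the
    memory allocation and keeps core usage at or below the 80% guideline."""
    if snapshot.cores_total == 0:
        return False
    memory_after = snapshot.memory_used_gb + extra_memory_gb
    if memory_after > snapshot.memory_allocated_gb:
        return False
    cores_after = snapshot.cores_in_use + extra_cores
    return cores_after / snapshot.cores_total <= CPU_SUSTAINED_LIMIT


# Example values are made up for illustration.
example = ClusterSnapshot(
    name="example-cluster",
    memory_used_gb=40.0,
    memory_allocated_gb=64.0,
    cores_in_use=8,
    cores_total=16,
)
print(has_headroom(example, extra_memory_gb=16.0, extra_cores=4))
```

With these example values, a job that needs 16 GB and 4 cores fits, because memory rises to 56 of 64 GB and core usage rises to 12 of 16 (75%). A job that needs 5 cores would push core usage to 13 of 16, above the 80% guideline, so the check would fail.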

The Clusters page is automatically refreshed every minute to update the cluster metrics and real-time data.

Cluster engine metrics

The cluster engine metrics table lists engine names, the current state of the engine, and the amount of memory currently used by the engine.