cdp-doctor system metrics
Scope
The cdp-doctor system metrics command reports system-level resource utilization on a CDP node: disk usage, CPU utilization, and TCP connection states. Use it to confirm that the node is healthy and not resource-constrained.
This command is typically run during diagnostic checks, performance validation, and capacity monitoring on Data Lake, Data Hub, and FreeIPA nodes.
The output is divided into four sections:
- Disk – Partitions
  - Lists all mounted file systems with their total, used, and free space and their utilization percentage.
  - Helps identify storage bottlenecks or partitions nearing capacity.
- Disk – Top Largest Folders in /var/log
  - Displays the largest directories under /var/log to show which services generate the most logs.
  - Useful for troubleshooting log-related disk usage issues.
- Network – Connections
  - Counts TCP connections by state, such as LISTEN, ESTABLISHED, and TIMEWAIT.
  - Helps assess network load and connection health.
- CPU – Times
  - Shows the percentage of CPU time spent in each mode (idle, system, user, nice).
  - Useful for understanding overall system load and performance.
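To make the "top largest folders" idea concrete, here is a minimal Python sketch that ranks the immediate subdirectories of a directory by size, using only the standard library. It is not how cdp-doctor itself is implemented, and the function names `dir_size` and `top_largest_folders` are hypothetical. It sums apparent file sizes, so its numbers may differ somewhat from what the command reports.

```python
import os


def dir_size(path):
    """Sum the sizes of all regular files under path, skipping unreadable entries."""
    total = 0
    # onerror swallows directories that cannot be listed (for example, permission denied).
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def top_largest_folders(base, limit=10):
    """Return up to `limit` (path, size_in_bytes) pairs for the
    immediate subdirectories of base, largest first."""
    sizes = []
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.path, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Assumes a Linux node where /var/log exists; otherwise nothing is printed.
    if os.path.isdir("/var/log"):
        for path, size in top_largest_folders("/var/log"):
            print(f"{path}\t{size / 1024 ** 2:.1f} MB")
```

Run it with sufficient privileges to read the log directories; unreadable directories are silently counted as empty in this sketch.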
Use Cases
- Performing pre-upgrade or health checks on cluster nodes.
- Investigating performance degradation or disk alerts.
- Validating system readiness during deployment or service restarts.
Sample Output
Running the cdp-doctor system metrics command produces output similar to the following:
Disk - Partitions:
+----------------+---------------+--------+---------+---------+----------+---------+----------+---------+
| Device | Mountpoint | Fstype | Maxfile | Maxpath | Total | Used | Free | Percent |
+----------------+---------------+--------+---------+---------+----------+---------+----------+---------+
| /dev/nvme0n1p3 | / | xfs | 255 | 4096 | 299.8 GB | 90.9 GB | 208.9 GB | 30.3% |
| /dev/nvme0n1p2 | /boot/efi | vfat | 1530 | 4096 | 199.8 MB | 5.8 MB | 194.0 MB | 2.9% |
| /dev/nvme1n1 | /hadoopfs/fs1 | ext4 | 255 | 4096 | 502.9 GB | 8.6 GB | 494.3 GB | 1.7% |
+----------------+---------------+--------+---------+---------+----------+---------+----------+---------+
Disk - Top largest folders in /var/log:
+--------------------------------+----------+
| Path | Size |
+--------------------------------+----------+
| /var/log/solr-infra | 2.0 GB |
| /var/log/ranger | 833.5 MB |
| /var/log/atlas | 230.6 MB |
| /var/log/cloudera-scm-server | 220.4 MB |
| /var/log/hadoop-hdfs | 200.9 MB |
| /var/log/salt | 119.2 MB |
| /var/log/cloudera-scm-firehose | 104.1 MB |
| /var/log/knox | 103.5 MB |
| /var/log/cdp_resources_check | 94.5 MB |
| /var/log/cdp-request-signer | 88.0 MB |
+--------------------------------+----------+
Network - Connections:
+-------------+-----+
| LISTEN | 81 |
| ESTABLISHED | 700 |
| TIMEWAIT | 146 |
| CLOSEWAIT | 35 |
| CLOSED | 0 |
| SYNSEND | 0 |
| SYNRECEIVED | 0 |
| FINWAIT1 | 0 |
| FINWAIT2 | 0 |
| LASTACK | 0 |
+-------------+-----+
CPU - Times:
+--------+--------+
| idle | 80.7 % |
| system | 4.4 % |
| user | 14.0 % |
| nice | 0.0 % |
+--------+--------+
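Keep the following in mind when interpreting the output:
- Disk usage above 80% on a partition may trigger warnings and usually calls for cleanup.
- Large folders under /var/log can indicate noisy services or misconfigured log rotation.
- High CLOSEWAIT or TIMEWAIT counts may point to socket issues. A growing CLOSEWAIT count usually means a local process is not closing sockets after the peer disconnects, while a high TIMEWAIT count typically reflects many short-lived connections.
- CPU idle below 20% may indicate high load or resource pressure.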
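The disk and CPU thresholds mentioned above (80% disk usage, 20% CPU idle) can be checked automatically against captured output. The following Python sketch assumes the table layout shown in the sample output; the function names `parse_rows`, `find_warnings`, and `connection_counts` are hypothetical and not part of cdp-doctor. No threshold is applied to connection counts because none is stated; they are only extracted.

```python
import re

DISK_WARN_PERCENT = 80.0      # disk usage threshold from the notes above
CPU_IDLE_WARN_PERCENT = 20.0  # CPU idle threshold from the notes above


def parse_rows(text):
    """Return table rows as lists of cell strings; borders and titles are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows


def find_warnings(text):
    """Flag partitions above the disk threshold and CPU idle below the idle threshold."""
    warnings = []
    for cells in parse_rows(text):
        # Partition rows have nine columns and end with a value such as "30.3%".
        if len(cells) == 9 and re.fullmatch(r"\d+(\.\d+)?%", cells[-1]):
            percent = float(cells[-1].rstrip("%"))
            if percent > DISK_WARN_PERCENT:
                warnings.append(f"disk {cells[1]} at {percent}%")
        # The CPU idle row looks like "| idle | 80.7 % |".
        elif len(cells) == 2 and cells[0] == "idle":
            idle = float(cells[1].rstrip("% "))
            if idle < CPU_IDLE_WARN_PERCENT:
                warnings.append(f"cpu idle at {idle}%")
    return warnings


def connection_counts(text):
    """Map each TCP state label (for example "CLOSEWAIT") to its integer count."""
    return {c[0]: int(c[1]) for c in parse_rows(text) if len(c) == 2 and c[1].isdigit()}


# Rows copied from the sample output above.
SAMPLE = """\
| Device | Mountpoint | Fstype | Maxfile | Maxpath | Total | Used | Free | Percent |
| /dev/nvme0n1p3 | / | xfs | 255 | 4096 | 299.8 GB | 90.9 GB | 208.9 GB | 30.3% |
| /dev/nvme1n1 | /hadoopfs/fs1 | ext4 | 255 | 4096 | 502.9 GB | 8.6 GB | 494.3 GB | 1.7% |
| CLOSEWAIT | 35 |
| TIMEWAIT | 146 |
| idle | 80.7 % |
"""

if __name__ == "__main__":
    print(find_warnings(SAMPLE))      # prints [] because no threshold is crossed
    print(connection_counts(SAMPLE))  # prints {'CLOSEWAIT': 35, 'TIMEWAIT': 146}
```

In practice the text passed to `find_warnings` would be the captured output of the command, for example from a file saved during a health check.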
