cdp-doctor service status

Scope

The cdp-doctor service status command checks the health of the critical infrastructure and Cloudera Manager services running on a given node in your Cloudera Data Platform (CDP) environment. It verifies that each system-level and management-level service is installed, active, and responding correctly, which confirms that the node is in a healthy operational state.

The command runs a two-level service validation:

  1. Infrastructure Service Checks

    These are base OS and CDP agent-level services required for node communication, logging, and orchestration.

    Services include:

    nginx
    Reverse proxy service used for internal routing.
    sshd
    SSH daemon that provides remote connectivity.
    sssd
    System Security Services Daemon that handles IPA/LDAP authentication.
    salt-bootstrap, salt-minion, salt-master, salt-api
    SaltStack components that manage cluster configuration and orchestration.
    cdp-logging-agent
    Fluentd-based log collection and forwarding agent.
  2. Cloudera Manager (CM) Service Checks

    These checks validate the management layer responsible for cluster control and service orchestration.

    Example services include:

    cm-agent
    Cloudera Manager Agent that manages local services on each node.
    cm-server
    Cloudera Manager Server.
    knox
    Apache Knox Gateway that handles secure access to web UIs and REST APIs.

Sample Output

Running the cdp-doctor service status command displays output similar to the following:
[START service checks for node default-aws-aw-dl-master0]
[START infra service checks]
ServiceName        Status
-----------------  --------
nginx              [OK]
sshd               [OK]
sssd               [OK]
salt-bootstrap     [OK]
salt-minion        [OK]
cdp-logging-agent  [OK]
salt-api           [OK]
salt-master        [OK]
[START CM checks]
ServiceName    Status
-------------  --------
cm-agent       [OK]
cm-server      [OK]
knox           [OK]
[END service checks]

A [NOK] status indicates a problem with that service. If any service reports [NOK], check the logs of the affected service for more information. For example, if cm-server is in the [NOK] state, check the Cloudera Manager (CM) service status and logs.
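
When the check runs on many nodes, it can help to extract the [NOK] entries from the output automatically. The following Python sketch is one possible way to do that, assuming the output keeps the layout shown in the sample above ([START ... checks] markers followed by name and status rows). The function names parse_service_status and failing_services are hypothetical helpers and are not part of cdp-doctor. The [NOK] rows in the embedded sample are altered for illustration only.

```python
import re

# Matches a result row such as "nginx              [OK]" or "knox  [NOK]".
ROW = re.compile(r"^(?P<name>\S+)\s+\[(?P<status>[A-Z]+)\]\s*$")
# Matches a table marker such as "[START infra service checks]" or
# "[START CM checks]". The node-level header ends with the node name,
# not with "checks]", so it does not match.
SECTION = re.compile(r"^\[START (?P<section>.+) checks\]$")


def parse_service_status(output):
    """Return {section: {service: status}} parsed from the command output."""
    results = {}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        marker = SECTION.match(line)
        if marker:
            section = marker.group("section")
            results[section] = {}
            continue
        row = ROW.match(line)
        # Column headers and separator rows carry no [STATUS] and are skipped.
        if row and section is not None:
            results[section][row.group("name")] = row.group("status")
    return results


def failing_services(results):
    """List (section, service) pairs whose status is anything other than OK."""
    return [
        (section, name)
        for section, services in results.items()
        for name, status in services.items()
        if status != "OK"
    ]


# Shortened copy of the sample output; the [NOK] rows are illustrative only.
SAMPLE = """\
[START service checks for node default-aws-aw-dl-master0]
[START infra service checks]
ServiceName        Status
-----------------  --------
nginx              [OK]
salt-minion        [NOK]
[START CM checks]
ServiceName    Status
-------------  --------
cm-agent       [OK]
cm-server      [NOK]
[END service checks]
"""

if __name__ == "__main__":
    for section, name in failing_services(parse_service_status(SAMPLE)):
        print(f"{section}: {name} needs attention")
```

The parser treats any status other than OK as a failure, so it would also flag a status value that this page does not describe. That is a deliberately cautious choice for a sketch, not documented behavior of the command.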