CCM troubleshooting

To troubleshoot cluster connection issues that occur when you use CCM, check the syslog on the Cloudera Manager node for classic clusters, or on the gateway and Knox nodes for Data Lake and Data Hub clusters.

Search for autossh messages to diagnose any connection problems.
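A minimal sketch of such a search follows. The function name find_autossh_messages is hypothetical, and the default log path /var/log/messages is an assumption (typical on RHEL/CentOS; Debian-based systems usually use /var/log/syslog). Pass the path that applies to your node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent autossh-related lines from a syslog file.
# $1: syslog path (the default is an assumption, adjust for your distribution)
# $2: maximum number of matching lines to print (default 50)
find_autossh_messages() {
  local logfile="${1:-/var/log/messages}"
  local count="${2:-50}"
  # -i makes the match case-insensitive; tail keeps only the newest matches.
  grep -i 'autossh' "$logfile" | tail -n "$count"
}
```

For example, find_autossh_messages /var/log/syslog 20 prints the last 20 matching lines. Reading the syslog usually requires root privileges.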

For example, CDP can communicate with clusters that are on private subnets with only private IPs without any additional network configuration or setup. However, CDP requires that these clusters have outbound connections to AWS NLBs hosted in Cloudera's AWS account.

To add or register on-premises CDH clusters, you can use the Classic Clusters registration feature, which enables Replication Manager to assist with data migration and synchronization. Classic Clusters registration uses CCM, which is enabled by installing the CCM client and secrets on the Cloudera Manager (CM) node.

Troubleshooting cluster registration errors for Classic Clusters

If an error occurs while you are registering a Classic Cluster, identify the problem behind the error, fix it, and then resume the registration process.

Issues during registration in CDP Management Console
Error or issue details: Alert: Registration is pending for a cluster with the same details.
Resolution: Check that you are not trying to add a cluster that has already been registered. Navigate to the Classic Clusters page and search for your cluster in the list.

Error or issue details: Alert: Cluster name given in Step 1 is not the same as the name discovered from Cloudera Manager. Clicking on Proceed will delete the cluster from Classic Clusters and take you to Step 1.
Resolution: The cluster names entered are case-sensitive. Make sure that the cluster name exactly matches the name discovered from Cloudera Manager.

Error or issue details: Unable to get endpoint from CCM for key <key>
Resolution: The Network Load Balancers (NLBs) might be unreachable.

To verify, find the autossh process on the FreeIPA master host by issuing the ps ax | grep autossh command.

Modify the command by adding the process name:

ssh -o ConnectTimeout=30 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o UserKnownHostsFile=/etc/ccm/ccm.pub ... -vvv

Outgoing traffic should be allowed in the port range 6000-6049. If there is a proxy in the network, it is also worth checking the ssh proxy configurations under:

  • /root/.ssh/config
  • /root/.ssh/proxy_auth
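The outbound port check can be sketched as follows. The function name check_outbound_ports is hypothetical, and the host argument is a placeholder: take the real CCM host from the autossh command line found earlier. The sketch assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout and seq commands. A port reported as unreachable can mean either that a firewall blocks it or that nothing is listening on it, so interpret the results together with the -vvv output of the ssh command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attempt an outbound TCP connection to each port in a range.
# $1: target host (placeholder, use the CCM host from your autossh command line)
# $2, $3: first and last port (defaults cover the documented 6000-6049 range)
check_outbound_ports() {
  local host="$1" first="${2:-6000}" last="${3:-6049}" port
  for port in $(seq "$first" "$last"); do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
    # timeout caps each attempt at 5 seconds.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      echo "${port} open"
    else
      echo "${port} unreachable"
    fi
  done
}
```

This check does not go through an ssh proxy, so in a proxied network it only shows whether direct outbound connections are possible.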
Cluster side issues
Error or issue details: When installing the AutoSSH rpm on a cluster node, fetching the autossh package failed. Multiple mirrors were tried, but many of them returned 404 or timeout errors.
Resolution: Install autossh independently and then try installing the script.

Error or issue details: Change in port number.
Resolution: Delete your registration attempt from the pending registrations tab and add the cluster again with the correct port number. Note that you must run the command systemctl stop ccm-tunnel@CM.service before you run ./install.sh again.

Error or issue details: Connection refused even though systemctl status ccm-tunnel@CM.service shows that the autossh client is running.
Resolution: Make sure that you copied the correct ssh setup files for the cluster, or check whether the port number CCM_TUNNEL_SERVICE_PORT in cm_reverse_tunnel.conf is your Cloudera Manager's port number.
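The CCM_TUNNEL_SERVICE_PORT check can be sketched as follows. The function name check_tunnel_port is hypothetical, and the sketch assumes that cm_reverse_tunnel.conf uses KEY=VALUE lines; adjust the parsing if your file is formatted differently. The location of the file is not assumed, so pass its full path as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare CCM_TUNNEL_SERVICE_PORT in a config file
# with the port that Cloudera Manager actually listens on.
# $1: path to cm_reverse_tunnel.conf   $2: expected Cloudera Manager port
# Assumes KEY=VALUE lines (optionally prefixed with "export").
check_tunnel_port() {
  local conf="$1" expected="$2" actual
  # Use the last assignment in the file and strip quotes and spaces from the value.
  actual="$(grep -E '^[[:space:]]*(export[[:space:]]+)?CCM_TUNNEL_SERVICE_PORT=' "$conf" \
    | tail -n 1 | cut -d= -f2- | tr -d "\"' ")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: CCM_TUNNEL_SERVICE_PORT=${actual}"
  else
    echo "MISMATCH: CCM_TUNNEL_SERVICE_PORT=${actual:-unset}, expected ${expected}"
    return 1
  fi
}
```

The function returns a non-zero status on a mismatch, so it can be used in a condition such as: check_tunnel_port <path to cm_reverse_tunnel.conf> <your Cloudera Manager port> || echo "fix the port and register again".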

FreeIPA sync issues

Error or issue details: 502 or "Bad Gateway" error such as:
  • "Bad Gateway, cause=com.cloudera.cdp.cm.ApiException: Bad Gateway"
  • Unexpected response from FreeIPA; details: code: 502
  • Status: 502 Bad Gateway Response
Resolution: A new parameter defining the connection timeout was added to the autossh script placed on the images used to provision clusters. Without it, there is no timeout for CCM connections, and CCM-related services won't restart automatically. If you provisioned FreeIPA or Data Lake prior to the implementation of this parameter, you need to update the script by using the following steps:

1. Modify the autossh command on FreeIPA, Data Lake, and Data Hub nodes in this script:

/cdp/bin/reverse-tunnel.sh

Add the new parameter to the command:

-o "ConnectTimeout 30"

Original command:

exec autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -o UserKnownHostsFile=${CCM_PUBLIC_KEY_FILE} -N -T -R ${LOCAL_IP}:0:localhost:${CCM_TUNNEL_SERVICE_PORT} -i ${PRIVATE_KEY} -p ${CCM_SSH_PORT} ${USER}@${CCM_HOST} -vvv

New command:

exec autossh -M 0 -o "ConnectTimeout 30" -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -o UserKnownHostsFile=${CCM_PUBLIC_KEY_FILE} -N -T -R ${LOCAL_IP}:0:localhost:${CCM_TUNNEL_SERVICE_PORT} -i ${PRIVATE_KEY} -p ${CCM_SSH_PORT} ${USER}@${CCM_HOST} -vvv

2. Restart the ccm-tunnel@KNOX service on Data Lake and Data Hub nodes and the ccm-tunnel@GATEWAY service on FreeIPA, Data Lake, and Data Hub nodes:

systemctl restart ccm-tunnel@KNOX
systemctl restart ccm-tunnel@GATEWAY

3. Check that the modification of the command was successful. The running autossh process should now include the ConnectTimeout option:

ps aux | grep autossh
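The edit in step 1 can be sketched as a small idempotent function. The name add_connect_timeout is hypothetical, and the sketch assumes GNU sed (for the -i option) and that the script contains the literal text "autossh -M 0 " as in the original command shown above. It keeps a backup copy with a .bak suffix before changing anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add -o "ConnectTimeout 30" to the autossh command in a script.
# $1: path to the script, for example /cdp/bin/reverse-tunnel.sh
# Idempotent: does nothing if the script already mentions ConnectTimeout.
add_connect_timeout() {
  local script="$1"
  if grep -q 'ConnectTimeout' "$script"; then
    echo "already patched"
    return 0
  fi
  # Keep a backup, then insert the option right after "autossh -M 0".
  cp -p "$script" "${script}.bak"
  sed -i 's/autossh -M 0 /autossh -M 0 -o "ConnectTimeout 30" /' "$script"
  # Confirm that the substitution actually matched something.
  if grep -q 'ConnectTimeout' "$script"; then
    echo "patched"
  else
    echo "pattern not found, edit the script manually"
    return 1
  fi
}
```

After running add_connect_timeout /cdp/bin/reverse-tunnel.sh as root, continue with the service restarts in step 2 and the process check in step 3.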