Performing manual repair

Perform a manual repair when a cluster has nodes marked as unhealthy.

When a cluster has unhealthy nodes, warnings are displayed in the following places:
  • The cluster tile on the cluster dashboard shows unhealthy nodes
  • The nodes are marked as "UNHEALTHY" in the Hardware section
  • The cluster's event history shows "Manual recovery is needed for the following failed nodes"
There are two ways to handle the failed nodes:
  • Repair the failed nodes: (1) All non-ephemeral disks are detached from the failed nodes. (2) The failed nodes are removed. (3) New nodes of the same type are provisioned. (4) The disks are attached to the new nodes, preserving the data.
  • Delete the failed nodes: The failed nodes are deleted together with their attached volumes.
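The difference between the two options can be illustrated with a small model. This is only a sketch of the behavior described above, not how CDP implements it: the `Node` class, the function names, and the "-replacement" naming are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical model of a cluster node; not a CDP data structure."""
    name: str
    instance_type: str
    healthy: bool = True
    disks: List[str] = field(default_factory=list)  # non-ephemeral disks only


def repair_failed_nodes(nodes: List[Node]) -> List[Node]:
    """Replace each unhealthy node with a new node that reuses its disks."""
    result = []
    for node in nodes:
        if node.healthy:
            result.append(node)
            continue
        # (1) Detach the non-ephemeral disks from the failed node.
        detached = node.disks
        node.disks = []
        # (2) The failed node is removed: it is not carried into the result.
        # (3) Provision a new node of the same type (name is illustrative).
        replacement = Node(f"{node.name}-replacement", node.instance_type)
        # (4) Attach the detached disks to the new node, preserving the data.
        replacement.disks = detached
        result.append(replacement)
    return result


def delete_failed_nodes(nodes: List[Node]) -> List[Node]:
    """Drop unhealthy nodes; their disks are discarded along with them."""
    return [node for node in nodes if node.healthy]
```

With the first function the cluster keeps its node count and its data; with the second the cluster shrinks and the data on the failed nodes' volumes is lost.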
You can perform manual repair from the CDP web interface or the CLI.
To perform manual repair from the CDP web interface:
  1. Log in to the CDP web interface.
  2. Navigate to the Management Console > Data Hub Clusters.
  3. Open the details page of the cluster that you want to repair.
  4. Select Actions > Repair.
  5. Select the host group that should be repaired. Only one host group can be selected at a time.
  6. By default, unhealthy nodes are removed and then replaced. To remove the nodes without replacing them, select Remove only.
  7. Click Repair.
  8. Once the recovery flow is completed, the cluster status changes to "RUNNING".
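For the CLI route, the sketch below assembles an equivalent command line. It assumes the CDP CLI provides a `cdp datahub repair-cluster` subcommand with `--cluster-name`, `--instance-group-names`, and `--remove-only` options; confirm the exact option names in the CLI's built-in help before relying on it. The helper name `build_repair_command` and the sample cluster and host group names are hypothetical.

```python
import shlex
from typing import List


def build_repair_command(cluster_name: str, host_group: str,
                         remove_only: bool = False) -> List[str]:
    """Build the argument list for a repair call (option names are assumed)."""
    if not cluster_name or not host_group:
        raise ValueError("cluster_name and host_group are required")
    command = [
        "cdp", "datahub", "repair-cluster",
        "--cluster-name", cluster_name,
        # One host group per call, mirroring the web interface steps.
        "--instance-group-names", host_group,
    ]
    if remove_only:
        # Assumed counterpart of the "Remove only" option in the web interface.
        command.append("--remove-only")
    return command


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(build_repair_command("my-cluster", "worker")))
```

Printing the command first makes it easy to review before executing it, for example with `subprocess.run(command, check=True)`.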