Performing manual repair
Manual repair should be performed on a cluster that has nodes marked as unhealthy.
When a cluster has unhealthy nodes, the following indications are displayed:
- The cluster tile on the cluster dashboard shows unhealthy nodes.
- Nodes are marked as "UNHEALTHY" in the Hardware section.
- The cluster's event history shows "Manual recovery is needed for the following failed nodes".
If manual repair has been enabled for the cluster, perform manual repair from the web UI or CLI.
To perform manual repair from the web UI:
- From cluster details, select Actions > Repair.
- Select the host group to repair. Only one host group can be selected at a time.
- By default, unhealthy nodes are removed and then replaced. To remove the nodes without replacing them, select Remove only.
- Click Repair.
To perform manual repair from the CLI, use the following commands:
- cb cluster list: Check the status and health of your clusters.
- cb cluster describe: Check the status and health of a specific cluster.
- cb cluster repair: Perform cluster repair.
For more information, refer to the CLI Reference documentation.
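
The CLI flow above can be sketched as a small dry-run script. This is a minimal sketch under stated assumptions, not the documented procedure: the `--name`, `--host-groups`, and `--remove-only` options are assumed from memory of the CLI reference and should be confirmed with `cb cluster repair --help`. The cluster name, the host group name, and the `build_repair_cmd` helper are hypothetical. The script only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Dry-run sketch of a manual repair from the CLI.
# ASSUMPTION: --name, --host-groups and --remove-only are not taken from this
# page; verify them with `cb cluster repair --help` before use.

# Print the repair command for a single host group.
#   $1 = cluster name
#   $2 = host group to repair (one at a time, as in the web UI)
#   $3 = "remove-only" to remove unhealthy nodes without replacing them (optional)
build_repair_cmd() {
    cmd="cb cluster repair --name $1 --host-groups $2"
    if [ "${3:-}" = "remove-only" ]; then
        cmd="$cmd --remove-only"
    fi
    printf '%s\n' "$cmd"
}

# Hypothetical names used for illustration only.
CLUSTER="my-cluster"
HOST_GROUP="worker"

# 1. Check the status and health of all clusters.
echo "cb cluster list"
# 2. Check the status and health of the affected cluster.
echo "cb cluster describe --name $CLUSTER"
# 3. Repair the host group (default behavior: remove and replace).
build_repair_cmd "$CLUSTER" "$HOST_GROUP"
```

To remove the unhealthy nodes without replacing them, pass `remove-only` as the third argument to the hypothetical helper. Once the printed commands look right, run them directly.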
Note: Recovery can fail during downscale on worker nodes if there is not enough space for HDFS to move data away from the volume attached to the failing node.
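
One way to reduce the risk described in the note is to check how much HDFS space remains before repairing a worker host group. The sketch below is an illustration, not part of the documented procedure. It assumes the standard `hdfs dfsadmin -report` summary, whose first `DFS Remaining:` line reports the cluster-wide value in bytes. The `dfs_remaining_bytes` helper is hypothetical.

```shell
#!/bin/sh
# Read `hdfs dfsadmin -report` output on stdin and print the cluster-wide
# "DFS Remaining" value in bytes.
# ASSUMPTION: the report contains lines such as
#   DFS Remaining: 123456789 (117.74 MB)
# where the first match is the cluster summary and later matches are
# per-DataNode values.
dfs_remaining_bytes() {
    awk -F': ' '/^DFS Remaining:/ { split($2, parts, " "); print parts[1]; exit }'
}

# Usage on a cluster node (typically as the HDFS superuser):
#   hdfs dfsadmin -report | dfs_remaining_bytes
```

Compare the result with the amount of data stored on the failing node. If the remaining space is smaller, free up or add capacity before starting the repair.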