Performing manual Data Lake repair
If a Data Lake node fails, an administrator can start a manual recovery process from the CDP web interface. Because the state of Data Lake services is stored externally, the repair operation is able to deploy the services on a new node and reattach the all workload clusters without data loss and with minimal downtime.
Required role: EnvironmentAdmin or Owner of the environment
When a Data Lake cluster has unhealthy nodes, warnings appear in the Data Lake page:
- Nodes are marked as "UNHEALTHY" in the Hardware tab for the Data Lake.
- Data Lake cluster's Event History shows "Manual recovery is needed for the following failed nodes."
You can perform manual repair from the CDP web UI or CLI.
Manual repair from web UI
To perform manual repair from CDP web UI:
- Log in to the CDP web interface.
- Navigate to the affected Data Lake using .
- In the Data Lake details page, click choose one of the following options:
- To repair failed nodes in a specific host group, click Repair and select the host group that should be repaired. Only one host group can be selected at a time. Then click Repair.
- To repair a single node failure or select certain nodes within a host group to repair, select the Hardware tab and then the repair icon next to the host group that contains the failed node(s).
- When you initiate a repair from the Hardware tab, you also have the option to delete any volumes attached to the instance. This can be useful if a volume is lost on the cloud provider side. To delete the attached volumes, select the Delete Volumes checkbox.
When the recovery flow is completed, the cluster status changes to "RUNNING".
Manual repair from CLI
To perform manual repair from the CLI, use the following commands:
cdp datalake list-datalakes– Check the status and health of your Data Lake clusters
cdp datalake describe-datalake– Check the status and health of a specific Data Lake cluster
cdp datalake repair-datalake– Perform Data Lake cluster repair.