Performing manual Data Lake repair

If a Data Lake node fails, an administrator can start a manual recovery process from the CDP web interface. Because the state of Data Lake services is stored externally, the repair operation can deploy the services on a new node and reattach all workload clusters without data loss and with minimal downtime.

When a Data Lake cluster has unhealthy nodes, warnings appear on the Data Lake page:

  • Nodes are marked as "UNHEALTHY" in the Hardware tab for the Data Lake.
  • The Data Lake cluster's Event History shows "Manual recovery is needed for the following failed nodes."

You can perform manual repair from the CDP web UI or CLI.

Manual repair from web UI

To perform manual repair from the CDP web UI:

  1. Log in to the CDP web interface.
  2. Navigate to the affected Data Lake using Management Console > Data Lakes.
  3. On the Data Lake details page, click Repair.

  4. Select the host group that should be repaired. Only one host group can be selected at a time.

    If no host groups are listed as in need of repair, use Cloudera Manager to determine what might be causing the problem you are experiencing.

  5. Click Repair.

When the recovery flow is completed, the cluster status changes to "RUNNING".

Manual repair from CLI

To perform manual repair from the CLI, use the following commands:

  • cdp datalake list-datalakes – Check the status and health of your Data Lake clusters.
  • cdp datalake describe-datalake – Check the status and health of a specific Data Lake cluster.
  • cdp datalake repair-datalake – Perform Data Lake cluster repair.
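
The commands above might be combined as in the following sketch, which checks the Data Lake, repairs one host group, and checks the Data Lake again. The Data Lake name "my-datalake" and the host group name "master" are hypothetical placeholders. The --datalake-name and --instance-group-names flags are assumed from the CDP CLI reference and are not stated on this page, so confirm them with "cdp datalake repair-datalake help" before running anything. The run helper and the DRY_RUN variable are illustrative only; by default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of a manual repair flow from the CLI (not an official script).
# DATALAKE_NAME and HOST_GROUP are hypothetical placeholders; replace them
# with the values shown for your Data Lake.
DATALAKE_NAME="${DATALAKE_NAME:-my-datalake}"
HOST_GROUP="${HOST_GROUP:-master}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

repair_datalake() {
  # 1. Inspect the Data Lake before the repair.
  run cdp datalake describe-datalake --datalake-name "$DATALAKE_NAME"

  # 2. Repair a single host group (flag names are assumptions; verify them
  #    against the CLI help output).
  run cdp datalake repair-datalake \
    --datalake-name "$DATALAKE_NAME" \
    --instance-group-names "$HOST_GROUP"

  # 3. Inspect the Data Lake again to follow the recovery.
  run cdp datalake describe-datalake --datalake-name "$DATALAKE_NAME"
}
```

Calling repair_datalake with the defaults only prints the three commands, which makes it possible to review them first. Setting DRY_RUN=0 before the call executes them against the configured CDP account.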