The downscale operation fails with decommission failed

Condition

The downscale operation fails with the Decommission failed error.

Cause

This issue can be caused by either of the following:

  • The broker is not available.
  • Cruise Control is reporting unhealthy partitions
  • A manually initiated rebalance is running.
  • An insufficient amount of brokers would remain after the downscale operation to host all replicas.

    This only affects deployments where there is a topic with at least four-way replication set up.

Solution

  1. Optional: Back up the data disks of the failed node.
    This can be done by creating a backup of /hadoopfs/fs[***DATA VOL NUMBER***]/kafka.
  2. Access the Cloudera Manager instance managing the nodes that you want to decommission.
  3. Select the Kafka service and go to Instances.
  4. Select the Kafka broker you want to decommission.
  5. Click Actions for Selected > Recommission and start.
  6. Wait until the process is finished and the cluster is in a healthy (green) state.
  7. In Management Console, navigate to the Data Hub cluster containing the node.
  8. Go to Hardware.
  9. Select the node and click the Repair (wrench) icon next to the host group.