Prepare for the recovery
It is crucial to make sure that the master node is truly dead and does not accidentally restart while you are preparing for the recovery.
If the cluster was configured without DNS aliases perform the following steps.
Otherwise move on to step 2:
- Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
- Shut down all Kudu tablet server processes in the cluster.
- Ensure that the dead master is well and truly dead. Take whatever steps needed to prevent it from accidentally restarting; this can be quite dangerous for the cluster post-recovery.
- Choose an unused machine in the cluster where the new master will live. The master generates very little load so it can be co-located with other data services or load-generating processes, though not with another Kudu master from the same configuration. The rest of this workflow will refer to this master as the "replacement" master.
Perform the following preparatory steps for the replacement master:
If using the same dead master as the replacement master then delete the master’s directories.
Ensure Kudu is installed on the machine, either via system packages (in which case the
kudu-masterpackages should be installed), or via some other means.
Choose and record the directory where the master’s data will live.