Prepare for the recovery
It is crucial to make sure that the master node is truly dead and does not accidentally restart while you are preparing for the recovery.
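One way to sanity-check that the dead master is not serving requests is to probe its RPC port. The following is a minimal sketch, not an official Kudu tool: it assumes the master used Kudu's default master RPC port of 7051, and the function name `master_port_is_open` is made up for illustration. A closed port only shows that nothing is listening right now. It does not prevent a later restart, so the process or service still has to be disabled on the machine itself.

```python
import socket

# Assumption: the master listens on Kudu's default master RPC port.
# Change this if the cluster overrides the RPC bind address.
DEFAULT_MASTER_RPC_PORT = 7051


def master_port_is_open(host, port=DEFAULT_MASTER_RPC_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    A False result covers refused connections, timeouts, and unresolvable
    host names, since all of them raise OSError subclasses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `master_port_is_open("dead-master.example.com")` (a placeholder host name) should return `False` before you continue. A `True` result means something is still accepting connections on that port and the recovery should not proceed.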
- If the cluster was configured without DNS aliases, perform the two sub-steps below; otherwise, skip them:
  - Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
  - Shut down all Kudu tablet server processes in the cluster.
- Ensure that the dead master is well and truly dead. Take whatever steps are needed to prevent it from accidentally restarting; an unexpected restart can be quite dangerous for the cluster post-recovery.
- Choose an unused machine in the cluster where the new master will live. The master generates very little load, so it can be co-located with other data services or load-generating processes, though not with another Kudu master from the same configuration. The rest of this workflow refers to this master as the "replacement" master.
- Perform the following preparatory steps for the replacement master:
  - If the dead master’s machine is being reused as the replacement master, delete the master’s directories.
  - Ensure Kudu is installed on the machine, either via system packages (in which case the kudu and kudu-master packages should be installed) or via some other means.
  - Choose and record the directory where the master’s data will live.
-