Prepare for the recovery
It is crucial to make sure that the master node is truly dead and does not accidentally restart while you are preparing for the recovery.
-
If the cluster was configured without DNS aliases, perform the following steps.
Otherwise, move on to step 2:
- Establish a maintenance window (one hour should be sufficient). During this time the Kudu cluster will be unavailable.
- Shut down all Kudu tablet server processes in the cluster.
- Ensure that the dead master is truly dead. Take whatever steps are needed to prevent it from accidentally restarting; a restart can be quite dangerous for the cluster post-recovery.
- Choose one of the remaining live masters to serve as a basis for recovery. The rest of this workflow will refer to this master as the "reference" master.
- Choose an unused machine in the cluster where the new master will live. The master generates very little load, so it can be co-located with other data services or load-generating processes, though not with another Kudu master from the same configuration. The rest of this workflow will refer to this master as the "replacement" master.
-
Perform the following preparatory steps for the replacement master:
-
Ensure Kudu is installed on the machine, either via system packages (in which case the kudu and kudu-master packages should be installed), or via some other means.
-
Choose and record the directory where the master’s data will live.
-
-
Perform the following preparatory steps for each live master:
-
Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized via the fs_wal_dir and fs_data_dirs configuration parameters. Note that if you have set fs_data_dirs to directories other than the value of fs_wal_dir, fs_data_dirs should be explicitly included in every command below where fs_wal_dir is also included. For more information on configuring these directories, see Apache Kudu configuration.
-
Identify and record the master’s UUID. It can be fetched using the following command:
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] 2>/dev/null
- master_data_dir
-
The live master’s previously recorded data directory.
- Example
-
$ sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c
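When this command has to be repeated on several live masters, it can help to assemble it consistently. The following is a minimal sketch, not part of Kudu: dump_uuid_cmd is a hypothetical helper that only prints the command shown above, adding --fs_data_dirs when a data directory different from the WAL directory is supplied. It assumes a single data directory value is passed as the second argument.

```shell
#!/bin/sh
# Hypothetical helper (not a Kudu tool): print, but do not run, the
# UUID-dump command for one master.
#   $1 = the master's WAL directory (fs_wal_dir)
#   $2 = optional data directories (fs_data_dirs), passed through as-is
dump_uuid_cmd() {
  duc_wal_dir="$1"
  duc_data_dirs="${2:-}"
  duc_cmd="sudo -u kudu kudu fs dump uuid --fs_wal_dir=${duc_wal_dir}"
  # Include --fs_data_dirs only when it differs from the WAL directory.
  if [ -n "$duc_data_dirs" ] && [ "$duc_data_dirs" != "$duc_wal_dir" ]; then
    duc_cmd="${duc_cmd} --fs_data_dirs=${duc_data_dirs}"
  fi
  printf '%s\n' "$duc_cmd"
}

# Default package layout: WAL and data share one directory.
dump_uuid_cmd /var/lib/kudu/master
# Separate WAL and data directories, as in the example above.
dump_uuid_cmd /data/kudu/master/wal /data/kudu/master/data
```

Printing the command rather than executing it leaves room to review it before running it on each live master and recording the UUID it returns.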
-
-
Perform the following preparatory steps for the reference master:
-
Identify and record the directory where the master’s data lives. If using Kudu system packages, the default value is /var/lib/kudu/master, but it may be customized using the fs_wal_dir and fs_data_dirs configuration parameters. If you have set fs_data_dirs to directories other than the value of fs_wal_dir, fs_data_dirs should be explicitly included in every command below where fs_wal_dir is also included.
-
Identify and record the UUIDs of every master in the cluster, using the following command:
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=<master_data_dir> <tablet_id> 2>/dev/null
- master_data_dir
-
The reference master’s previously recorded data directory.
- tablet_id
-
Must be set to the string 00000000000000000000000000000000.
- For example
-
$ sudo -u kudu kudu local_replica cmeta print_replica_uuids --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 2>/dev/null
80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3
-
- Using the two previously recorded lists of UUIDs (one for all live masters and one for all masters), determine and record (by process of elimination) the UUID of the dead master.
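The process of elimination can also be scripted. This is a minimal sketch, assuming both lists have been recorded as whitespace-separated strings of UUIDs; find_dead_master_uuid is a hypothetical helper name. The sample data reuses the UUIDs from the example output above, and treating the third one as the dead master is an assumption made purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every UUID that is in the "all masters" list
# but not in the "live masters" list.
#   $1 = whitespace-separated UUIDs of all masters
#   $2 = whitespace-separated UUIDs of the live masters
find_dead_master_uuid() {
  fdm_all="$1"
  # Normalize the live list to single spaces, padded at both ends, so
  # each UUID can be matched as a whole word.
  fdm_live=" $(echo $2) "
  for fdm_uuid in $fdm_all; do
    case "$fdm_live" in
      *" $fdm_uuid "*) ;;                # still live, skip it
      *) printf '%s\n' "$fdm_uuid" ;;    # absent from the live list
    esac
  done
}

# Illustrative data only: UUIDs copied from the example output above.
all_masters="80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170 1c3f3094256347528d02ec107466aef3"
live_masters="80a82c4b8a9f4c819bab744927ad765c 2a73eeee5d47413981d9a1c637cce170"

find_dead_master_uuid "$all_masters" "$live_masters"
# prints 1c3f3094256347528d02ec107466aef3
```

If the helper prints more than one UUID, or none, the recorded lists do not match the expectation of exactly one dead master and should be rechecked before continuing.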