After you have identified a reference master, copy its master data to the
replacement master node. This procedure requires bringing the Kudu cluster down, so
schedule a maintenance window of at least one hour for this task.
Format the data directory on the replacement master machine using the previously
recorded UUID of the dead master, with the following command:
$ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid>
master_wal_dir
The replacement master’s previously recorded WAL directory.
master_data_dir
The replacement master’s previously recorded data directory.
uuid
The dead master’s previously recorded UUID.
For example:
$ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data --uuid=80a82c4b8a9f4c819bab744927ad765c
Copy the master data to the replacement master with the following command:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master>
master_wal_dir
The replacement master’s previously recorded WAL directory.
master_data_dir
The replacement master’s previously recorded data directory.
tablet_id
Must be set to the string 00000000000000000000000000000000.
reference_master
The RPC address of the reference master. It must be a string of the form
<hostname>:<port>.
hostname
The reference master’s previously recorded hostname or alias.
port
The reference master’s previously recorded RPC port number.
For example:
$ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 master-2:7051
If you are using Cloudera Manager, add the replacement Kudu master role now, but do
not start it.
Override the empty value of the Master Address parameter for the
new role with the replacement master’s alias.
If you are using a non-default RPC port, add the port number (separated by a
colon) as well.
If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead
master to point at the replacement master.
If the cluster was set up without DNS aliases,
perform the following steps:
Stop the remaining live masters.
Rewrite the Raft configurations on these masters to include the replacement master.
See Step 4 in the Perform the migration topic for more
details.
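The rewrite is performed with the kudu local_replica cmeta rewrite_raft_config tool while each master is stopped. A sketch, assuming example UUIDs, hostnames, and directories for a three-master deployment:
$ sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 4aab798a69e94fab8d77069edff28ce0:master-1:7051 f5624e05f40649b79a757629a69d061e:master-2:7051 988d8ac6530f426cbe180be5ba52033d:master-3:7051
Each entry after the tablet ID is a string of the form <uuid>:<hostname>:<port>, and the list must include every master in the new deployment, including the replacement master.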
Start the replacement master.
Restart the remaining masters in the new multi-master deployment. The cluster is
unavailable while the masters are shut down, but the outage should last only as long
as it takes for the masters to come back up.
To verify that all masters are working properly, consider performing the
following sanity checks:
Using a browser, visit each master’s web UI and navigate to the
/masters page. All the masters should now be listed there with one
master in the LEADER role and the others in the
FOLLOWER role. The contents of /masters on each
master should be the same.
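The same information can also be fetched from the command line, for example with curl (assuming a master named master-1 and the default web UI port of 8051):
$ curl http://master-1:8051/masters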
Run a Kudu system check (ksck) on the cluster using the
kudu command line tool.
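For example, assuming three masters named master-1, master-2, and master-3 listening on the default RPC port:
$ sudo -u kudu kudu cluster ksck master-1:7051,master-2:7051,master-3:7051
The check should report the cluster as healthy; any missing or unavailable masters are flagged in its output.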