Perform the recovery

After you have identified a reference master, copy the master data from it to the replacement master node. This procedure requires bringing the Kudu cluster down, so schedule a maintenance window of at least one hour for this task.

  1. Format the data directory on the replacement master machine using the previously recorded UUID of the dead master. Use the following command:
    $ sudo -u kudu kudu fs format --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] --uuid=<uuid>
    master_data_dir

    The replacement master’s previously recorded data directory.

    uuid

    The dead master’s previously recorded UUID.

    For example:
    $ sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/test/wal --fs_data_dirs=/data/kudu/test/data --uuid=80a82c4b8a9f4c819bab744927ad765c
  2. Copy the master data to the replacement master with the following command:
    $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=<master_wal_dir> [--fs_data_dirs=<master_data_dir>] <tablet_id> <reference_master>
    master_data_dir

    The replacement master’s previously recorded data directory.

    tablet_id

    Must be set to the string 00000000000000000000000000000000.

    reference_master

    The RPC address of the reference master. It must be a string of the form <hostname>:<port>.

    hostname

    The reference master’s previously recorded hostname or alias.

    port

    The reference master’s previously recorded RPC port number.

    For example:
    $ sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/test/wal --fs_data_dirs=/data/kudu/test/data 00000000000000000000000000000000 master-2:7051
  3. If you are using Cloudera Manager, add the replacement Kudu master role now, but do not start it.
    • Override the empty value of the Master Address parameter for the new role with the replacement master’s alias.

    • If you are using a non-default RPC port, add the port number (separated by a colon) as well.

  4. If the cluster was set up with DNS aliases, reconfigure the DNS alias for the dead master to point at the replacement master.
  5. If the cluster was set up without DNS aliases, perform the following steps:
    1. Stop the remaining live masters.
    2. Rewrite the Raft configurations on these masters to include the replacement master. See Step 4 in the Perform the migration topic for more details.
  6. Start the replacement master.
  7. Restart the remaining masters in the new multi-master deployment. The cluster is unavailable while the masters are shut down, but the outage should last only as long as it takes for the masters to come back up.
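Steps 1 and 2 above can be sketched as a small shell script that assembles the two commands from variables and prints them for review instead of running them. This is a minimal sketch, not a definitive procedure: the variable and function names (WAL_DIR, DATA_DIR, DEAD_MASTER_UUID, REFERENCE_MASTER, format_cmd, copy_cmd, is_host_port) are hypothetical, and the values are taken from the examples above. Replace them with the values recorded for your own cluster.

```shell
#!/bin/sh
# Hypothetical values copied from the examples above; replace with your own.
WAL_DIR="/data/kudu/test/wal"
DATA_DIR="/data/kudu/test/data"
DEAD_MASTER_UUID="80a82c4b8a9f4c819bab744927ad765c"
REFERENCE_MASTER="master-2:7051"
# The tablet ID argument required by step 2.
MASTER_TABLET_ID="00000000000000000000000000000000"

# Succeeds only if the argument has the form <hostname>:<port>,
# with a non-empty hostname and an all-digit port.
is_host_port() {
  case "$1" in
    *:*:*) return 1 ;;   # more than one colon
    ?*:?*) ;;            # something on both sides of the colon
    *) return 1 ;;
  esac
  _port=${1##*:}
  case "$_port" in
    *[!0-9]*) return 1 ;;
  esac
  return 0
}

# Step 1: print the command that formats the replacement master's directories.
format_cmd() {
  printf 'sudo -u kudu kudu fs format --fs_wal_dir=%s --fs_data_dirs=%s --uuid=%s\n' \
    "$WAL_DIR" "$DATA_DIR" "$DEAD_MASTER_UUID"
}

# Step 2: print the command that copies the master data from the reference master.
copy_cmd() {
  printf 'sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=%s --fs_data_dirs=%s %s %s\n' \
    "$WAL_DIR" "$DATA_DIR" "$MASTER_TABLET_ID" "$REFERENCE_MASTER"
}

if ! is_host_port "$REFERENCE_MASTER"; then
  echo "reference master must be of the form <hostname>:<port>" >&2
  exit 1
fi

format_cmd
copy_cmd
```

Printing the commands first makes it easier to confirm the UUID and directories before anything is written on the replacement master. Once the output looks right, run each printed command by hand on the replacement master machine.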
To verify that all masters are working properly, consider performing the following sanity checks:
  • Using a browser, visit each master’s web UI and navigate to the /masters page. All the masters should now be listed there with one master in the LEADER role and the others in the FOLLOWER role. The contents of /masters on each master should be the same.

  • Run a Kudu system check (ksck) on the cluster using the kudu command-line tool.
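The two checks above might look like the following sketch, which prints the ksck command and the /masters URL for each master rather than contacting the cluster. The host names master-1, master-2, and master-3 are hypothetical, as are the function names, and the web UI port 8051 is assumed to be the master default. Adjust these if your deployment differs.

```shell
#!/bin/sh
# Hypothetical master hosts and ports; replace with your own.
MASTER_HOSTS="master-1 master-2 master-3"
RPC_PORT=7051      # RPC port, as in the examples above
WEBUI_PORT=8051    # assumed default master web UI port

# Build the comma-separated <hostname>:<port> list that ksck expects.
master_addresses() {
  _list=""
  for _h in $MASTER_HOSTS; do
    _list="${_list:+$_list,}$_h:$RPC_PORT"
  done
  printf '%s\n' "$_list"
}

# Print the ksck command for all masters.
ksck_cmd() {
  printf 'sudo -u kudu kudu cluster ksck %s\n' "$(master_addresses)"
}

# Print the /masters page URL of each master's web UI, one per line.
masters_urls() {
  for _h in $MASTER_HOSTS; do
    printf 'http://%s:%s/masters\n' "$_h" "$WEBUI_PORT"
  done
}

ksck_cmd
masters_urls
```

Open each printed URL in a browser and compare the pages, then run the printed ksck command from a machine that can reach every master.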