Backing Up and Restoring HDFS Metadata

Backing Up HDFS Metadata Using Cloudera Manager

HDFS metadata backups can be used to restore a NameNode when both NameNode roles have failed. In addition, Cloudera recommends backing up HDFS metadata before a major upgrade.

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

This backup method requires you to shut down the cluster.

  1. Note the active NameNode.
  2. Stop the cluster. It is particularly important that the NameNode role process is not running so that you can make a consistent backup.
  3. Go to the HDFS service.
  4. Click the Configuration tab.
  5. In the Search field, search for "NameNode Data Directories" and note the value.
  6. On the active NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one directory is listed, back up only one of them, because each directory holds a complete copy of the metadata. For example, if the NameNode data directory is /data/dfs/nn, run the following as root:
    # cd /data/dfs/nn
    # tar -cvf /root/nn_backup_data.tar .

    Because of the -v option, tar prints the name of each file as it adds the file to the archive.

    If a file with the .lock extension exists in the NameNode data directory, the NameNode is most likely still running. Shut down the NameNode role, and then repeat the backup steps.
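
The lock-file check and the tar command from step 6 can be combined into one small script. The following is a minimal sketch, not a Cloudera-provided tool: backup_nn_dir is a hypothetical helper name, both paths are passed in by the caller, and the archive path must be absolute because the function changes into the data directory before it runs tar. The sketch omits the -v option so that the only output is an error message.

```shell
#!/bin/sh
# Sketch only: backup_nn_dir is a hypothetical helper, not part of Cloudera Manager.
# Usage: backup_nn_dir <NameNode data directory> <absolute path of archive to create>
backup_nn_dir() {
    nn_dir=$1
    archive=$2

    [ -d "$nn_dir" ] || { echo "no such directory: $nn_dir" >&2; return 1; }

    # A *.lock file in the data directory suggests the NameNode is still running.
    for f in "$nn_dir"/*.lock; do
        if [ -e "$f" ]; then
            echo "lock file found: $f; stop the NameNode role first" >&2
            return 2
        fi
    done

    # Archive relative to the data directory, as in the manual commands.
    ( cd "$nn_dir" && tar -cf "$archive" . ) || return 3

    # Listing the archive confirms that tar can read it back.
    tar -tf "$archive" > /dev/null
}
```

With the example directory from step 6, the call as root would be:

    # backup_nn_dir /data/dfs/nn /root/nn_backup_data.tar

A return status of 2 means that a lock file was found and no archive was written.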

Restoring HDFS Metadata From a Backup Using Cloudera Manager

The following process assumes a scenario where both NameNode hosts have failed and you must restore from a backup.

  1. Remove the NameNode, JournalNode, and Failover Controller roles from the HDFS service.
  2. Add the host on which the NameNode role will run.
  3. Create the NameNode data directory, ensuring that the permissions, ownership, and group are set correctly.
  4. Copy the backed-up files to the NameNode data directory.
  5. Add the NameNode role to the host.
  6. Add the Secondary NameNode role to another host.
  7. Enable high availability. If some roles have not started after the wizard completes, restart the HDFS service. On startup, the NameNode reads the fsimage file and loads it into memory. If the JournalNodes are running and edit log files are present, the NameNode then applies any edits that are newer than the fsimage.
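
Steps 3 and 4 of the restore procedure can be sketched as a shell function, assuming the backup is a tar archive created from inside the data directory as in the backup section. restore_nn_dir is a hypothetical helper name. The 700 directory mode and the optional owner argument are assumptions about a typical installation, so check them against the settings your cluster expects.

```shell
#!/bin/sh
# Sketch only: restore_nn_dir is a hypothetical helper, not part of Cloudera Manager.
# Usage: restore_nn_dir <absolute path of archive> <NameNode data directory> [owner:group]
restore_nn_dir() {
    archive=$1
    nn_dir=$2
    owner=${3:-}    # optional; for example hdfs:hdfs (assumption, verify for your cluster)

    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    mkdir -p "$nn_dir" || return 1

    # Refuse to unpack over existing metadata.
    if [ -n "$(ls -A "$nn_dir")" ]; then
        echo "$nn_dir is not empty; refusing to overwrite" >&2
        return 2
    fi

    # -p keeps the file modes that were recorded in the archive.
    tar -xpf "$archive" -C "$nn_dir" || return 3

    # Assumed mode for the data directory; adjust if your cluster differs.
    chmod 700 "$nn_dir" || return 4

    # Changing ownership normally requires root.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$nn_dir" || return 4
    fi
}
```

Run as root with the example paths from the backup section, a call might look like this, where hdfs:hdfs is only an illustrative owner and group:

    # restore_nn_dir /root/nn_backup_data.tar /data/dfs/nn hdfs:hdfs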