Back up HDFS

You can roll back an upgrade from CDP Private Cloud Base 7 to HDP 2. The rollback restores your HDP cluster to the state it was in before the upgrade, which means any changes made to HDFS after you take the backups described in steps 4 through 7 are lost when you roll back. If you have configured NameNode HA, you must perform additional steps before upgrading your HDP cluster.

  1. If you have configured NameNode HA, locate the active NameNode from Ambari Web > Services > HDFS in the Summary area.
  2. Check the NameNode directory to ensure that there is no snapshot of any prior HDFS upgrade. Specifically, using Ambari Web, browse to Services > HDFS > Configs, and examine dfs.namenode.name.dir in the NameNode Directories property. Make sure that only a /current directory exists on the NameNode host; if a /previous directory is present, remove it.
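    For example, you can confirm the configured directory and list its contents from the command line. The commands below are a sketch; use the directory value returned by the first command in the second:
    hdfs getconf -confKey dfs.namenode.name.dir
    ls $[dfs.namenode.name.dir]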
  3. Create the following logs and other files. The commands below back up additional file system state beyond what the fsImage contains. You can use this information to validate the file system state after the upgrade by comparing the output from before and after the upgrade. This is helpful for troubleshooting any issues that arise during the upgrade, but it is not required.
    1. Run fsck with the following flags and send the results to a log. The resulting file contains a complete block map of the file system. Use this log later to confirm the upgrade.
      hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
    2. Create a list of all the DataNodes in the cluster.
      hdfs dfsadmin -report > dfs-old-report-1.log
    3. Capture the complete namespace of the file system. The following command does a recursive listing of the root file system:
      hdfs dfs -ls -R / > dfs-old-lsr-1.log
    4. If you have a backup HDFS instance, back up the HDFS data to it.
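      For example, you might use DistCp to copy the data to the backup cluster; the NameNode address and paths below are placeholders for illustration only:
      hadoop distcp /data hdfs://backup-nn.example.com:8020/data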
  4. Back up the configuration directory /etc/hadoop/conf into a backup directory on all of your hosts.
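    One way to do this on each host is shown below; the backup location is a placeholder, and the -L flag follows the symlink if /etc/hadoop/conf points to another directory:
    mkdir -p /path/to/backup
    cp -rL /etc/hadoop/conf /path/to/backup/hadoop-conf-backup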
  5. Save the namespace. As the HDFS user, su -l [HDFS_USER], put the cluster in safe mode and save the namespace:
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
  6. Create a backup of the $[dfs.namenode.name.dir]/current directory into a backup directory.
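    A sketch of one way to do this, with a placeholder backup path:
    cp -r $[dfs.namenode.name.dir]/current /path/to/backup/namenode-current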
  7. Create a backup of the $[dfs.journalnode.edits.dir]/current directory into a backup directory.
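    Similarly, on each JournalNode host, with a placeholder backup path:
    cp -r $[dfs.journalnode.edits.dir]/current /path/to/backup/journalnode-current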
  8. As the HDFS user, su -l [HDFS_USER], take the NameNode out of safe mode.
    hdfs dfsadmin -safemode leave
    At this point, you have all the data that might be required to restore HDFS to the state it was in when it entered safe mode.
    A few points to consider regarding the backups:
    • The fsck, ls, and other command outputs were captured before entering safe mode if you followed the steps in order. The guide enters safe mode only before the commands that require it; however, you can enter safe mode earlier.
    • If you leave safe mode and keep the cluster running, you allow mutations on the file system that are not captured in the backup.