HDFS

Before starting the rollback procedure, make sure that all the HDFS service roles are stopped.

  1. Roll back all the JournalNodes. (Only required for clusters where high availability is enabled for HDFS.) Use the JournalNode backup that you created when you backed up HDFS before upgrading to CDP Private Cloud Base.
    1. Log in to each JournalNode host and do the following:
      1. Remove the $[dfs.journalnode.edits.dir]/current directory.
      2. Restore the backup of $[dfs.journalnode.edits.dir]/current into $[dfs.journalnode.edits.dir]/current, as shown in the sketch below.
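      For example, a minimal sketch assuming dfs.journalnode.edits.dir is /hadoop/hdfs/journal and the backup was copied to /backup/journalnode (both paths are hypothetical; substitute your own values):
        # Remove the current edits directory
        rm -rf /hadoop/hdfs/journal/current
        # Restore the backed-up copy into place, preserving ownership and permissions
        cp -rp /backup/journalnode/current /hadoop/hdfs/journal/current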
  2. Note down the target of the /etc/hadoop/conf symbolic link, and then remove the link.
  3. Move the backup of /etc/hadoop/conf back to its original place. Perform steps 2 and 3 on all the cluster nodes where HDFS roles are installed, that is, on all NameNodes, JournalNodes, and DataNodes. (A sketch of steps 2 and 3 follows.)
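    A minimal sketch of steps 2 and 3, assuming the configuration backup was saved to /backup/hadoop-conf (a hypothetical location):
      # Record the current symlink target so it can be restored in step 7
      readlink /etc/hadoop/conf
      # Remove the symlink and move the backed-up configuration into place
      rm /etc/hadoop/conf
      mv /backup/hadoop-conf /etc/hadoop/conf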
  4. Roll back all of the NameNodes.
    Use the backup of the Hadoop configuration directory you created during the backup phase.
    Perform the following steps on all NameNode hosts:
    1. Start FailoverControllers and JournalNodes
    2. If you use Kerberos authentication, authenticate with kinit as the NameNode's principal; otherwise, change to the hdfs service user (usually with sudo -u hdfs).
    3. Run the following command: hdfs namenode -rollback
    4. Restart the HDFS FailoverControllers and JournalNodes in Ambari, then start the NameNodes. Note that one of the NameNodes should start, while the other remains in the starting state. When one of the NameNodes is marked as started, proceed to the DataNode rollback. (An example session for steps 2 and 3 is sketched below.)
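    For example, steps 2 and 3 on a Kerberized NameNode host might look like the following sketch (the keytab path and principal are assumptions; substitute your NameNode's principal):
      # Authenticate as the NameNode's principal
      kinit -kt /etc/security/keytabs/nn.service.keytab nn/$(hostname -f)@EXAMPLE.COM
      # Roll back the NameNode metadata
      hdfs namenode -rollback
    On a cluster without Kerberos, run the rollback as the hdfs service user instead:
      sudo -u hdfs hdfs namenode -rollback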
  5. Roll back all of the DataNodes. Use the backup of the Hadoop configuration directory you created during the backup phase. Perform the following steps on all the DataNode hosts:
    1. If you use Kerberos authentication, authenticate with kinit as the NameNode's principal; otherwise, change to the hdfs service user (usually with sudo -u hdfs).
    2. Run the following commands (a combined session is sketched after this list):
      • export HADOOP_SECURE_DN_USER=<hdfs service user>
      • hdfs datanode -rollback
      • Look for output from the command similar to the following, which indicates that the DataNode rollback is complete. Wait until all storage directories are rolled back:
        INFO common.Storage: Layout version rolled back to -57 for storage /storage/dir_x
        INFO common.Storage (DataStorage.java:doRollback(952)) - Rollback of /storage/dir_x is complete
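    Put together, a DataNode rollback session might look like this sketch (using "hdfs" as the service user is an assumption; substitute your own):
      # Set the secure DataNode user as instructed above
      export HADOOP_SECURE_DN_USER=hdfs
      # Roll back the DataNode storage directories
      hdfs datanode -rollback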
  6. If your cluster is not configured for NameNode High Availability, roll back the Secondary NameNode. Perform the following steps on the Secondary NameNode host:
    1. Move the Secondary NameNode data directory ($[dfs.namenode.name.dir]) to a backup location.
    2. If you use Kerberos authentication, authenticate with kinit as the NameNode's principal; otherwise, change to the hdfs service user (usually with sudo -u hdfs).
    3. Run the following command: hdfs secondarynamenode -format
      After the Secondary NameNode has been rolled back, terminate the console session by typing Control-C. Output similar to the following indicates that the Secondary NameNode rollback is complete (a combined sketch of these steps follows):
      INFO namenode.SecondaryNameNode: Web server init done
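    A minimal sketch of this item's steps, assuming /hadoop/hdfs/namesecondary as the data directory and /backup/namesecondary as the backup location (both paths are hypothetical):
      # Move the Secondary NameNode data directory aside
      mv /hadoop/hdfs/namesecondary /backup/namesecondary
      # Reformat the Secondary NameNode as the hdfs service user
      sudo -u hdfs hdfs secondarynamenode -format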
  7. Restore the original /etc/hadoop/conf symbolic link, pointing to the target you noted in step 2, on all the nodes where it was changed. For example:
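    # Substitute the target recorded with readlink in step 2 for <noted target>
    rm -rf /etc/hadoop/conf
    ln -s <noted target> /etc/hadoop/conf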
  8. Restart the HDFS service: open Ambari, go to the HDFS service page, and select Start from the Service Actions dropdown.
  9. Monitor the service. If everything comes up fine, check HDFS file system availability: run hdfs fsck /, or generate a file system listing with hdfs dfs -ls -R / and compare it with the listing you created as part of the backup procedure, to verify that everything was rolled back properly. In case of any issues, contact Cloudera Support before you proceed.
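    A sketch of the verification, assuming the pre-upgrade listing was saved to /backup/hdfs-listing.txt (a hypothetical location):
      # Check file system health
      sudo -u hdfs hdfs fsck /
      # Generate a fresh listing and compare it with the backup listing
      sudo -u hdfs hdfs dfs -ls -R / > /tmp/hdfs-listing-after-rollback.txt
      diff /backup/hdfs-listing.txt /tmp/hdfs-listing-after-rollback.txt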