For HDP 2.2, the configuration files were stored in /etc/hadoop/conf. Starting with HDP 2.3, the configuration files are stored in /etc/hadoop, but in a sub-directory specific to the HDP version being used. To perform the HDFS upgrade, you need to copy the existing configuration files into place on every NameNode and DataNode:
cp /etc/hadoop/conf/* /etc/hadoop/2.3.x.y-z/0/
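If you manage many hosts, you can run the copy from one machine over SSH. The following is only a minimal sketch, assuming passwordless SSH as root and a hypothetical host list; replace the hostnames with your own NameNode and DataNode hosts, and replace 2.3.x.y-z with your actual HDP 2.3 version directory.
# Hypothetical host list; replace with your actual NameNode and DataNode hostnames.
for HOST in nn1.example.com nn2.example.com dn1.example.com dn2.example.com; do
  # Copy the existing configuration into the versioned HDP 2.3 directory on each host.
  ssh root@"$HOST" 'cp /etc/hadoop/conf/* /etc/hadoop/2.3.x.y-z/0/'
done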
After copying configurations to the 2.3 configuration location, save the old HDFS configuration and add a symlink from /etc/hadoop/conf:
mv /etc/hadoop/conf /etc/hadoop/conf.saved
ln -s /usr/hdp/current/hadoop-client/conf /etc/hadoop/conf
ls -la /etc/hadoop
total 4
drwxr-xr-x 3 root root 4096 Jun 19 21:51 2.3.0.0-2323
lrwxrwxrwx 1 root root   35 Jun 19 21:54 conf -> /usr/hdp/current/hadoop-client/conf
drwxr-xr-x 2 root root 4096 Jun 14 00:11 conf.saved
If you are upgrading from an HA NameNode configuration, start all JournalNodes. At each JournalNode host, run the following command:
su -l <HDFS_USER> -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start journalnode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
Important: All JournalNodes must be running when performing the upgrade, rollback, or finalization operations. If any JournalNodes are down when running any such operation, the operation will fail.
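Before proceeding, you may want to confirm that a JournalNode process is running on every JournalNode host. This is a minimal sketch, assuming passwordless SSH and a hypothetical list of JournalNode hostnames:
# Hypothetical JournalNode hosts; replace with your actual hostnames.
for HOST in jn1.example.com jn2.example.com jn3.example.com; do
  echo "JournalNode process on $HOST:"
  # The [j] in the pattern keeps grep from matching its own process entry.
  ssh "$HOST" 'ps -ef | grep -i "[j]ournalnode"'
done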
If you are upgrading from an HA NameNode configuration, start the ZK Failover Controllers.
su -l <HDFS_USER> -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start zkfc"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
Because the file system version has now changed, you must start the NameNode manually. On the active NameNode host, as the HDFS user, run:
su -l <HDFS_USER> -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -upgrade"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
Note: In a large system, this can take a long time to complete. Run this command with the -upgrade option only once. After you have completed this step, you can bring up the NameNode using this command without including the -upgrade option.
To check if the upgrade is progressing, check that the ${dfs.namenode.name.dir}/previous directory has been created. The ${dfs.namenode.name.dir}/previous directory contains a snapshot of the data before the upgrade; a sketch of this check appears after the following note.
Note: In a NameNode HA configuration, this NameNode does not enter the standby state as usual. Rather, this NameNode immediately enters the active state, upgrades its local storage directories, and upgrades the shared edit log. At this point, the standby NameNode in the HA pair is still down and not synchronized with the upgraded, active NameNode.
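One way to perform the progress check mentioned above is to resolve the configured storage directory with hdfs getconf and look for the previous sub-directory. This is a minimal sketch, run on the active NameNode as the HDFS user; if dfs.namenode.name.dir lists multiple directories, check each one.
# Resolve the configured NameNode storage directory.
NAME_DIR=$(hdfs getconf -confKey dfs.namenode.name.dir)
# Strip an optional file:// prefix before inspecting the local file system.
NAME_DIR=${NAME_DIR#file://}
# The previous directory holds the pre-upgrade snapshot.
ls -ld "$NAME_DIR/previous"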
To re-establish HA, you must synchronize the active and standby NameNodes. To do so, bootstrap the standby NameNode by running the NameNode with the '-bootstrapStandby' flag. Do NOT start the standby NameNode with the '-upgrade' flag.
At the Standby NameNode,
su -l <HDFS_USER> -c "hdfs namenode -bootstrapStandby -force"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
The bootstrapStandby command downloads the most recent fsimage from the active NameNode into the <dfs.name.dir> directory on the standby NameNode. Optionally, you can access that directory to make sure the fsimage has been successfully downloaded; a sketch of one such check follows. After verifying, start the ZKFailoverController, then start the standby NameNode using Ambari Web > Hosts > Components.
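For example, to confirm the download, you can list the fsimage files in the standby NameNode's storage directory. This is a minimal sketch and assumes the default layout under <dfs.name.dir>/current:
# On the standby NameNode: recent fsimage files indicate a successful bootstrap.
ls -l <dfs.name.dir>/current | grep fsimage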
Verify that the NameNode is up and running:
ps -ef | grep -i NameNode
Start all DataNodes.
At each DataNode, as the HDFS user,
su -l <HDFS_USER> -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
The NameNode sends an upgrade command to the DataNodes after receiving their block reports.
Verify that the DataNode process is up and running:
ps -ef | grep DataNode
Restart HDFS. Restarting HDFS will push out the upgraded configurations to all HDFS services.
Open Ambari Web. If the browser in which Ambari is running has been open throughout the process, clear the browser cache, then refresh the browser.
Browse to Services > HDFS, and from the Service Actions menu, select Restart All.
If you are running an HA NameNode configuration, use the following procedure to restart NameNodes.
Browse to Services > HDFS. The Summary section of the page shows which host is the active NameNode. Hover over the Active NameNode and note the hostname of that host. You will need this hostname later.
From the Service Actions menu, select Stop. This stops all of the HDFS Components, including both NameNodes.
Browse to Hosts and select the host that was running the Active NameNode (as noted in the previous step). Using the Actions menu next to the NameNode component, select Start. This causes the original Active NameNode to re-assume its role as the Active NameNode.
Browse to Services > HDFS and, from the Service Actions menu, select Restart All.
After HDFS has started, run the service check. Browse to Services > HDFS and, from the Service Actions menu, select Run Service Check.
After the DataNodes are started, HDFS exits SafeMode. To monitor the status, run the following command on any DataNode:
su -l <HDFS_USER> -c "hdfs dfsadmin -safemode get"
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
When HDFS exits SafeMode, the following message displays:
Safe mode is OFF
Note: In general, it takes 5-10 minutes to exit SafeMode. For thousands of nodes with millions of data blocks, exiting SafeMode can take up to 45 minutes.
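If you prefer to block until SafeMode is off rather than polling, you can use the dfsadmin wait option. For example, as the HDFS user:
su -l <HDFS_USER> -c "hdfs dfsadmin -safemode wait"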
Make sure that the HDFS upgrade was successful. Optionally, repeat step 4 in Checkpoint HDFS to create new versions of the logs and reports, substituting "-new" for "-old" in the file names as necessary. Compare the old and new versions of the following log files:
dfs-old-fsck-1.log versus dfs-new-fsck-1.log. The files should be identical unless the hadoop fsck reporting format has changed in the new version.
dfs-old-lsr-1.log versus dfs-new-lsr-1.log. The files should be identical unless the format of hadoop fs -lsr reporting or the data structures have changed in the new version.
dfs-old-report-1.log versus dfs-new-report-1.log.
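One simple way to compare each pair is with diff, which prints nothing when the files are identical. This sketch assumes you run it from the directory in which the logs and reports were written:
diff dfs-old-fsck-1.log dfs-new-fsck-1.log
diff dfs-old-lsr-1.log dfs-new-lsr-1.log
diff dfs-old-report-1.log dfs-new-report-1.log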
Make sure that all the DataNodes that were in the cluster before the upgrade are up and running.
From the NameNode Web UI, determine whether all DataNodes are up and running:
http://<namenode>:<namenodeport>
If you are on a highly available HDFS cluster, go to the Standby NameNode web UI to see if all DataNodes are up and running:
http://<standbynamenode>:<namenodeport>
If you are not on a highly available HDFS cluster, go to the SecondaryNameNode web UI to see if the Secondary NameNode is up and running:
http://<secondarynamenode>:<secondarynamenodeport>
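As an alternative to the web UIs, you can list live and dead DataNodes from the command line with dfsadmin. For example, as the HDFS user:
su -l <HDFS_USER> -c "hdfs dfsadmin -report"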