Stop all services (including MapReduce) and client applications deployed on HDFS using the instructions provided here.
Run the fsck command as instructed below and fix any errors. (The resulting file will contain a complete block map of the file system.)

su $HDFS_USER
hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log

where $HDFS_USER is the HDFS Service user. For example, hdfs.

Use the following instructions to compare the status before and after the upgrade:
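Each fsck run ends with an overall status line for the path checked (for example, "The filesystem under path '/' is HEALTHY"). A quick way to gate on the saved log is a small check like the sketch below (the function name is chosen here for illustration):

```shell
# Return success only if the saved fsck log reports a healthy file
# system; fsck prints "... is HEALTHY" on success and "... is CORRUPT"
# when blocks are missing or damaged.
fsck_log_healthy() {
  grep -q 'is HEALTHY' "$1"
}
```

For example: `fsck_log_healthy dfs-old-fsck-1.log || echo "Fix fsck errors before upgrading"`.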
Note: The following commands must be executed by the user running the HDFS service (by default, the user is hdfs).

Capture the complete namespace of the file system. (The following command does a recursive listing of the root file system.)

su $HDFS_USER
hadoop dfs -lsr / > dfs-old-lsr-1.log

where $HDFS_USER is the HDFS Service user. For example, hdfs.
Run the report command to create a list of DataNodes in the cluster.

su $HDFS_USER
hadoop dfsadmin -report > dfs-old-report-1.log

where $HDFS_USER is the HDFS Service user. For example, hdfs.

Optionally, copy all unrecoverable data stored in HDFS to a local file system or to a backup instance of HDFS.
Optionally, repeat steps 3 (a) through 3 (c) and compare the results with the previous run to ensure that the state of the file system has remained unchanged.
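The repeat-and-compare step can be scripted with diff. A minimal sketch, assuming the second run wrote its logs with a -2 suffix (e.g. dfs-old-lsr-2.log, a naming convention chosen here for illustration):

```shell
# Diff each pre-upgrade log against its rerun counterpart; any
# difference indicates the file system state drifted between captures.
compare_hdfs_logs() {
  local dir="$1" name status=0
  for name in fsck lsr report; do
    [ -f "$dir/dfs-old-$name-1.log" ] && [ -f "$dir/dfs-old-$name-2.log" ] || continue
    if ! diff -q "$dir/dfs-old-$name-1.log" "$dir/dfs-old-$name-2.log" >/dev/null; then
      echo "state changed between runs: $name"
      status=1
    fi
  done
  return $status
}
```

A nonzero return (or any "state changed" line) means the logs disagree and the runs should be inspected with a full diff before proceeding.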
As the HDFS user, execute the following commands to save the namespace:

hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
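Before running -saveNamespace it is worth confirming that safe mode actually took effect: hadoop dfsadmin -safemode get prints a status line such as "Safe mode is ON". A small guard, written here to take the status command as its arguments so the check itself stays testable (the wrapper is illustrative, not part of the HDP tooling):

```shell
# Succeed only when the supplied status command reports safe mode ON;
# use it to gate the saveNamespace call.
safemode_is_on() {
  "$@" | grep -q 'Safe mode is ON'
}
# usage:
# safemode_is_on hadoop dfsadmin -safemode get && hadoop dfsadmin -saveNamespace
```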
Copy the following checkpoint files into a backup directory:
dfs.name.dir/edits
dfs.name.dir/image/fsimage
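Copying those two checkpoint files can be scripted. A minimal sketch, where the dfs.name.dir value and the backup location are placeholders you would substitute from your own hdfs-site.xml:

```shell
# Copy the NameNode checkpoint files (edits and image/fsimage) from
# dfs.name.dir into a backup directory, preserving the relative layout.
backup_checkpoint_files() {
  local name_dir="$1" backup_dir="$2"
  mkdir -p "$backup_dir/image" || return 1
  cp "$name_dir/edits" "$backup_dir/edits" &&
  cp "$name_dir/image/fsimage" "$backup_dir/image/fsimage"
}
# usage (paths are examples only):
# backup_checkpoint_files /hadoop/hdfs/namenode /root/pre-upgrade-backup
```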
Stop the HDFS service. Ensure all the HDP services in the cluster are completely stopped at this point.
If upgrading Hive, ensure that you back up the Hive database.
For SUSE, you must uninstall HDP before updating the repo file. The instructions to uninstall HDP are provided here.
For RHEL/CentOS, use one of the following options to upgrade HDP:
Configure the local repositories.
The standard HDP install fetches the software from a remote yum repository over the Internet. To use this option, you must set up access to the remote repository and have an available Internet connection for each of your hosts.
Note: If your cluster does not have access to the Internet, or you are creating a large cluster and want to conserve bandwidth, you can instead provide a local copy of the HDP repository that your hosts can access. For more information, see Deployment Strategies for Data Centers with Firewalls, a separate document in this set.
For each node in your cluster, download the yum repo configuration file hdp.repo. From a terminal window, type:

For RHEL and CentOS 5
wget http://public-repo-1.hortonworks.com/HDP/centos5/1.x/GA/1.3.0.0/hdp.repo -O /etc/yum.repos.d/hdp.repo
For RHEL and CentOS 6
wget http://public-repo-1.hortonworks.com/HDP/centos6/1.x/GA/1.3.0.0/hdp.repo -O /etc/yum.repos.d/hdp.repo
For SLES 11
wget http://public-repo-1.hortonworks.com/HDP/suse11/1.x/GA/1.3.0.0/hdp.repo -O /etc/zypp/repos.d/hdp.repo
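Before running the package manager you can sanity-check each downloaded file: a valid yum/zypper repo file is non-empty and contains a baseurl= line. A minimal sketch (the function name is illustrative):

```shell
# Fail fast if the downloaded hdp.repo is missing, empty, or lacks a
# baseurl entry (e.g. a proxy error page was saved in its place).
repo_file_ok() {
  [ -s "$1" ] && grep -q '^baseurl=' "$1"
}
# usage:
# repo_file_ok /etc/yum.repos.d/hdp.repo || echo "re-download hdp.repo"
```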
Confirm the HDP repository is configured by checking the repo list.
For RHEL/CentOS:
yum repolist
For SLES:
zypper repos