Getting Ready to Upgrade
The HDP Stack upgrade involves upgrading from HDP 2.1 to HDP 2.4.2 and adding the new HDP 2.4.2 services. These instructions change your configurations.
Note: You must use kinit before running the commands as any particular user.
Hardware recommendations
Although there is no single hardware requirement for installing HDP, there are some basic guidelines. The HDP packages for a complete installation of HDP-2.4.2 will take up about 2.5 GB of disk space.
The first step is to ensure you keep a backup copy of your HDP 2.1 configurations.
Back up the HDP directories for any Hadoop components you have installed.
The following is a list of all HDP directories:
/etc/hadoop/conf
/etc/hbase/conf
/etc/hive-hcatalog/conf
/etc/hive-webhcat/conf
/etc/accumulo/conf
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/hue/conf
/etc/zookeeper/conf
/etc/tez/conf
/etc/storm/conf
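The directory backup above can be sketched as a small helper that archives each configuration directory present on the host and skips components that are not installed. This is a minimal sketch, not part of the official procedure: the function name backup_conf_dirs and the backup location are assumptions, and the directories are expected to be absolute paths.

```shell
#!/usr/bin/env bash
# Sketch: archive every listed configuration directory that exists on this host.
# backup_conf_dirs and the backup directory argument are illustrative assumptions.

backup_conf_dirs() {
  local backup_dir="$1"; shift
  local dir name
  mkdir -p "$backup_dir" || return 1
  for dir in "$@"; do
    # Skip components that are not installed on this host.
    [ -d "$dir" ] || continue
    # Build an archive name from the path: /etc/hadoop/conf -> etc_hadoop_conf
    name="$(printf '%s' "${dir#/}" | tr '/' '_')"
    # -h follows symbolic links, so a conf symlink is stored as real files.
    tar -czhf "$backup_dir/$name.tar.gz" -C / "${dir#/}" || return 1
    printf 'backed up %s\n' "$dir"
  done
}
```

For example, backup_conf_dirs /tmp/hdp21-conf-backup /etc/hadoop/conf /etc/hbase/conf /etc/hive/conf would write one archive per installed component; /tmp/hdp21-conf-backup is a hypothetical location.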
Optional: Back up your userlogs directories, ${mapred.local.dir}/userlogs.
Navigate to the $HIVE_HOME/lib directory. Back up the JDBC JAR file for the type of Hive Metastore database you are using (PostgreSQL, MySQL, and so on).
Run the fsck command as the HDFS Service user and fix any errors. (The resulting file contains a complete block map of the file system.)
su - hdfs -c "hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log"
Use the following instructions to compare status before and after the upgrade.
The following commands must be executed by the user running the HDFS service (by default, the user is hdfs).
Capture the complete namespace of the file system. (The following command does a recursive listing of the root file system.)
Important: Make sure the NameNode is started.
su - hdfs -c "hdfs dfs -ls -R / > dfs-old-lsr-1.log"
Note: In secure mode, you must have Kerberos credentials for the hdfs user.
Run the report command to create a list of DataNodes in the cluster.
su - hdfs -c "hdfs dfsadmin -report > dfs-old-report-1.log"
Optional: You can copy all data, or only the unrecoverable data, stored in HDFS to a local file system or to a backup instance of HDFS.
Optional: You can also repeat the fsck, namespace listing, and DataNode report steps above and compare the results with the previous run to ensure that the state of the file system remained unchanged.
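Comparing a repeated run with the first one can be sketched with diff. The helper below is only an illustration: the function name compare_logs is an assumption, as are any file names for the second run (for example dfs-old-lsr-2.log). Lines that carry timestamps in the fsck and report output can be expected to differ between runs, so a reported change is a prompt to inspect the two files rather than proof of a problem.

```shell
#!/usr/bin/env bash
# Sketch: report whether two captured logs differ.
# compare_logs is an illustrative helper, not an HDP tool.

compare_logs() {
  local before="$1" after="$2"
  if diff -q "$before" "$after" >/dev/null; then
    printf 'unchanged: %s\n' "$before"
  else
    printf 'CHANGED: %s vs %s\n' "$before" "$after"
    return 1
  fi
}
```

For example, compare_logs dfs-old-lsr-1.log dfs-old-lsr-2.log checks the recursive namespace listing, which is the most stable of the three outputs.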
Save the namespace by executing the following commands:
su - hdfs
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
Back up your NameNode metadata.
Copy the following checkpoint files into a backup directory:
The NameNode metadata is stored in a directory specified in the hdfs-site.xml configuration file under the configuration value "dfs.namenode.name.dir".
For example, if the configuration value is:
<property> <name>dfs.namenode.name.dir</name> <value>/hadoop/hdfs/namenode</value> </property>
Then, the NameNode metadata files are all housed inside the directory /hadoop/hdfs/namenode.
Store the layoutVersion of the NameNode. It is recorded in:
${dfs.namenode.name.dir}/current/VERSION
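Recording the layoutVersion can be sketched as follows. The VERSION file is a plain key=value properties file, so the value can be extracted with sed; the function name read_layout_version is an assumption made for this example.

```shell
#!/usr/bin/env bash
# Sketch: print the layoutVersion recorded in a NameNode VERSION file.
# read_layout_version is an illustrative helper name.

read_layout_version() {
  # Print only the value of the line that starts with "layoutVersion=".
  sed -n 's/^layoutVersion=//p' "$1"
}
```

With the example configuration above, read_layout_version /hadoop/hdfs/namenode/current/VERSION prints the stored value, which can then be saved alongside the other backup files.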
Finalize any prior HDFS upgrade, if you have not done so already.
su - hdfs -c "hdfs dfsadmin -finalizeUpgrade"
If you have the Hive component installed, back up the Hive Metastore database.
The following instructions are provided for your convenience. For the latest backup instructions, see your database documentation.
Table 3.1. Hive Metastore Database Backup and Restore
Database Type: MySQL
Backup:
mysqldump $dbname > $outputfilename.sql
For example:
mysqldump hive > /tmp/mydir/backup_hive.sql
Restore:
mysql $dbname < $inputfilename.sql
For example:
mysql hive < /tmp/mydir/backup_hive.sql
Database Type: Postgres
Backup:
sudo -u $username pg_dump $databasename > $outputfilename.sql
For example:
sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql
Restore:
sudo -u $username psql $databasename < $inputfilename.sql
For example:
sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql
Database Type: Oracle
Backup (export the database):
exp username/password@database full=yes file=output_file.dmp
Restore (import the database):
imp username/password@database file=input_file.dmp
If you have the Oozie component installed, back up the Oozie metastore database.
These instructions are provided for your convenience. Please check your database documentation for the latest backup instructions.
Table 3.2. Oozie Metastore Database Backup and Restore
Database Type: MySQL
Backup:
mysqldump $dbname > $outputfilename.sql
For example:
mysqldump oozie > /tmp/mydir/backup_oozie.sql
Restore:
mysql $dbname < $inputfilename.sql
For example:
mysql oozie < /tmp/mydir/backup_oozie.sql
Database Type: Postgres
Backup:
sudo -u $username pg_dump $databasename > $outputfilename.sql
For example:
sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql
Restore:
sudo -u $username psql $databasename < $inputfilename.sql
For example:
sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql
Database Type: Oracle
Backup (export the database):
exp username/password@database full=yes file=output_file.dmp
Restore (import the database):
imp username/password@database file=input_file.dmp
Optional: Back up the Hue database.
The following instructions are provided for your convenience. For the latest backup instructions, please see your database documentation. For database types that are not listed below, follow your vendor-specific instructions.
Table 3.3. Hue Database Backup and Restore
Database Type: MySQL
Backup:
mysqldump $dbname > $outputfilename.sql
For example:
mysqldump hue > /tmp/mydir/backup_hue.sql
Restore:
mysql $dbname < $inputfilename.sql
For example:
mysql hue < /tmp/mydir/backup_hue.sql
Database Type: Postgres
Backup:
sudo -u $username pg_dump $databasename > $outputfilename.sql
For example:
sudo -u postgres pg_dump hue > /tmp/mydir/backup_hue.sql
Restore:
sudo -u $username psql $databasename < $inputfilename.sql
For example:
sudo -u postgres psql hue < /tmp/mydir/backup_hue.sql
Database Type: Oracle
Backup (connect to the Oracle database using sqlplus, then export the database). For example:
exp username/password@database full=yes file=output_file.dmp
Restore (import the database). For example:
imp username/password@database file=input_file.dmp
Database Type: SQLite
Backup:
/etc/init.d/hue stop
su $HUE_USER
mkdir ~/hue_backup
sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak
/etc/init.d/hue start
Restore:
/etc/init.d/hue stop
cd /var/lib/hue
mv desktop.db desktop.db.old
sqlite3 desktop.db < ~/hue_backup/desktop.bak
/etc/init.d/hue start
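Because the Hive, Oozie, and Hue backups follow the same pattern for MySQL and Postgres, the choice of command can be sketched as one helper. This is an illustration only: the function name metastore_backup_command is an assumption, it prints the command instead of running it so that it can be reviewed first, and the Postgres case assumes the postgres system user shown in the examples above.

```shell
#!/usr/bin/env bash
# Sketch: print the backup command for a metastore database without running it.
# metastore_backup_command is an illustrative helper name.

metastore_backup_command() {
  local db_type="$1" db_name="$2" out_file="$3"
  case "$db_type" in
    mysql)
      printf 'mysqldump %s > %s\n' "$db_name" "$out_file" ;;
    postgres)
      # Assumes the database is owned by the "postgres" system user.
      printf 'sudo -u postgres pg_dump %s > %s\n' "$db_name" "$out_file" ;;
    *)
      printf 'unsupported database type: %s\n' "$db_type" >&2
      return 1 ;;
  esac
}
```

For example, metastore_backup_command mysql oozie /tmp/mydir/backup_oozie.sql prints the same MySQL command given in the Oozie table.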
Stop all services (including MapReduce) and client applications deployed on HDFS:
Component
Command
Knox
cd /usr/lib/knox/
su knox -c "bin/gateway.sh stop"
Oozie
su $OOZIE_USER
/usr/lib/oozie/bin/oozied.sh stop
WebHCat
su - hcat -c "/usr/lib/hive-hcatalog/sbin/webhcat_server.sh stop"
Hive
Run this command on the Hive Metastore and Hive Server2 host machine:
ps aux | awk '{print $1,$2}' | grep hive | awk '{print $2}' | xargs kill >/dev/null 2>&1
HBase RegionServers
su - hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop regionserver"
HBase Master host machine
su - hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop master"
YARN
Run this command on all NodeManagers:
su - yarn -c "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf stop nodemanager"
Run this command on the History Server host machine:
su - mapred -c "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-mapreduce/sbin/mr-jobhistory-daemon.sh --config /etc/hadoop/conf stop historyserver"
Run this command on the ResourceManager host machine(s):
su - yarn -c "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf stop resourcemanager"
Run this command on the YARN Timeline Server node:
su -l yarn -c "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf stop timelineserver"
HDFS
On all DataNodes:
su - hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
If you are not running a highly available HDFS cluster, stop the Secondary NameNode by executing this command on the Secondary NameNode host machine:
su - hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop secondarynamenode"
On the NameNode host machine(s):
su - hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode"
If you are running NameNode HA, stop the ZooKeeper Failover Controllers (ZKFC) by executing this command on the NameNode host machine:
su - hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop zkfc"
If you are running NameNode HA, stop the JournalNodes by executing these commands on the JournalNode host machines:
su - hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop journalnode"
ZooKeeper Host machines
su - zookeeper -c "export ZOOCFGDIR=/etc/zookeeper/conf ; export ZOOCFG=zoo.cfg ;source /etc/zookeeper/conf/zookeeper-env.sh ; /usr/lib/zookeeper/bin/zkServer.sh stop"
Ranger (XA Secure)
service xapolicymgr stop
service uxugsync stop
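After running the stop commands, one way to confirm that no daemons remain is to filter the output of the JDK jps tool for well-known daemon class names. This check is a suggestion, not part of the documented procedure: the function name remaining_daemons is an assumption, the list of names is not exhaustive, and jps typically needs to run as root to see the JVMs of other users.

```shell
#!/usr/bin/env bash
# Sketch: read "pid ClassName" lines (as printed by jps) on standard input
# and print those that belong to common Hadoop, HBase, or ZooKeeper daemons.
# remaining_daemons is an illustrative helper name.

remaining_daemons() {
  awk '
    BEGIN {
      n = split("NameNode DataNode SecondaryNameNode JournalNode " \
                "DFSZKFailoverController ResourceManager NodeManager " \
                "JobHistoryServer ApplicationHistoryServer HMaster " \
                "HRegionServer QuorumPeerMain", names, " ")
      for (i = 1; i <= n; i++) known[names[i]] = 1
    }
    # The second field of a jps line is the main class name.
    ($2 in known) { print }
  '
}
```

For example, jps | remaining_daemons prints nothing on a host where all of the listed daemons have stopped.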
Verify that the edit logs in ${dfs.namenode.name.dir}/current/edits* are empty. Run:
hdfs oev -i ${dfs.namenode.name.dir}/current/edits_inprogress_* -o edits.out
Verify the edits.out file. It should contain only an OP_START_LOG_SEGMENT transaction. For example:
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-56</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>5749</TXID>
    </DATA>
  </RECORD>
</EDITS>
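The check on edits.out can be sketched as a search for any OPCODE element other than OP_START_LOG_SEGMENT. This is a minimal sketch; the function name edits_are_empty is an assumption, and it relies only on the OPCODE element shown in the example above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every OPCODE in the file is OP_START_LOG_SEGMENT.
# edits_are_empty is an illustrative helper name.

edits_are_empty() {
  local file="$1" other
  [ -r "$file" ] || return 2
  # Extract each OPCODE element, then count those that are not
  # OP_START_LOG_SEGMENT. "|| true" keeps a zero count from being
  # treated as a failure when the script runs with "set -e".
  other="$(grep -o '<OPCODE>[^<]*</OPCODE>' "$file" \
            | grep -vc 'OP_START_LOG_SEGMENT' || true)"
  [ "$other" -eq 0 ]
}
```

For example, edits_are_empty edits.out && echo "edit log is empty" prints the message only when no other transactions are present.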
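If edits.out has transactions other than OP_START_LOG_SEGMENT, run the following steps and then verify that the edit logs are empty.
Start the existing version NameNode.
Save the namespace so that a new FS image file is written. This command requires the NameNode to be in safe mode (hdfs dfsadmin -safemode enter):
hdfs dfsadmin -saveNamespace
Ensure there is a new FS image file.
Shut the NameNode down.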
Rename or delete any paths that are reserved in the new version of HDFS.
When upgrading to a new version of HDFS, it is necessary to rename or delete any paths that are reserved in the new version of HDFS. If the NameNode encounters a reserved path during upgrade, it will print an error such as the following:
/.reserved is a reserved path and .snapshot is a reserved path component in this version of HDFS. Please rollback and delete or rename this path, or upgrade with the -renameReserved key-value pairs option to automatically rename these paths during upgrade.
Specifying -upgrade -renameReserved with optional key-value pairs causes the NameNode to automatically rename any reserved paths found during startup.
For example, to rename all paths named .snapshot to .my-snapshot and change paths named .reserved to .my-reserved, specify:
-upgrade -renameReserved .snapshot=.my-snapshot,.reserved=.my-reserved
If no key-value pairs are specified with -renameReserved, the NameNode suffixes reserved paths with:
.<LAYOUT-VERSION>.UPGRADE_RENAMED
For example:
.snapshot.-51.UPGRADE_RENAMED
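The default renaming rule can be sketched as a simple string construction, which is useful for predicting where a reserved path will end up after the upgrade. The function name default_renamed_path is an assumption made for this example; the layout version is the value stored in the NameNode VERSION file.

```shell
#!/usr/bin/env bash
# Sketch: build the name a reserved path receives when -renameReserved
# is given without key-value pairs.
# default_renamed_path is an illustrative helper name.

default_renamed_path() {
  local path="$1" layout_version="$2"
  printf '%s.%s.UPGRADE_RENAMED\n' "$path" "$layout_version"
}
```

For example, default_renamed_path .snapshot -51 prints .snapshot.-51.UPGRADE_RENAMED, matching the example above.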
Note: We recommend that you perform a -saveNamespace before renaming paths (running -saveNamespace appears in a previous step in this procedure). This is because a data inconsistency can result if an edit log operation refers to the destination of an automatically renamed file.
Also note that running -renameReserved will rename all applicable existing files in the cluster. This may impact cluster applications.
If you are on JDK 1.6, upgrade the JDK on all nodes to JDK 1.7 or JDK 1.8 before upgrading HDP.