1. Getting Ready to Upgrade - Hortonworks Data Platform

Hardware recommendations

Although there is no single hardware requirement for installing HDP, there are some basic guidelines. The HDP packages for a complete installation of HDP 2.2 will take up about 2.5 GB of disk space.

Back up the following HDP 1.x directories:

/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf (Note: With HDP 2.2, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat/conf. You cannot use /etc/hcatalog/conf in HDP 2.2.)
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/hue/conf
/etc/zookeeper/conf
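
The backup of the directories listed above can be scripted. The following is a minimal sketch, not part of the HDP tooling: the helper name `backup_conf_dirs` and the backup location `/tmp/hdp1-conf-backup` are assumptions, and the helper expects absolute paths. It archives each directory that exists on the host and skips the rest, because not every component is installed on every node.

```shell
# Sketch: archive each HDP 1.x configuration directory that exists on this host.
# backup_conf_dirs is a hypothetical helper; directories must be absolute paths.
backup_conf_dirs() {
    dest="$1"; shift
    mkdir -p "$dest" || return 1
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # /etc/hadoop/conf becomes etc_hadoop_conf.tar.gz
            name=$(printf '%s' "$dir" | sed 's|^/||; s|/|_|g')
            # -h follows symbolic links, since conf directories are often links
            tar -czhf "$dest/$name.tar.gz" -C / "${dir#/}" || return 1
            echo "backed up $dir"
        else
            echo "skipped $dir (not present on this host)"
        fi
    done
}
```

For example: `backup_conf_dirs /tmp/hdp1-conf-backup /etc/hadoop/conf /etc/hbase/conf /etc/hive/conf`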

Optional: Back up your userlogs directories, ${mapred.local.dir}/userlogs.

Run the fsck command as the HDFS Service user and fix any errors. (The resulting file contains a complete block map of the file system.) For example, in a situation where clusters are unsecure and Kerberos credentials are not required for the HDFS user:

su -l <HDFS_USER>

hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log

where <HDFS_USER> is the HDFS Service user. For example, hdfs.
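
To check the captured log quickly, the following sketch looks for the status line that fsck writes at the end of its report. The helper name `fsck_summary` is hypothetical, and the sketch assumes the report ends with a line of the form "The filesystem under path '/' is HEALTHY".

```shell
# Sketch: summarize a saved fsck report.
# Prints HEALTHY, or NOT HEALTHY followed by up to 20 problem lines.
fsck_summary() {
    log="$1"
    if grep -q "is HEALTHY" "$log"; then
        echo "HEALTHY"
    else
        echo "NOT HEALTHY"
        grep -E "CORRUPT|MISSING" "$log" | head -n 20
        return 1
    fi
}
```

For example: `fsck_summary dfs-old-fsck-1.log`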

As the user running the HDFS service (by default, the user is hdfs), run the following commands:

Capture the complete namespace of the file system. (The following command does a recursive listing of the root file system.)
```
su -l <HDFS_USER>
hadoop dfs -lsr / > dfs-old-lsr-1.log
```
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
Run the report command to create a list of DataNodes in the cluster.
```
su -l <HDFS_USER> 
hadoop dfsadmin -report > dfs-old-report-1.log
```
where <HDFS_USER> is the HDFS Service user. For example, hdfs.
Optional: Copy all of the data stored in HDFS, or only the data that cannot be recovered by other means, to a local file system or to a backup instance of HDFS.
Optional: Repeat the fsck, namespace listing, and report steps above and compare the results with the previous run to ensure that the state of the file system has not changed.
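
When repeating the capture, the namespace listing is the easiest output to compare, because the fsck and report logs are likely to differ in timestamps even when nothing has changed. The following sketch compares two listings. The helper name `compare_namespace` and the second file name `dfs-old-lsr-2.log` are assumptions.

```shell
# Sketch: compare two saved "hadoop dfs -lsr /" listings.
# Returns 0 if they are identical, 1 (and shows up to 20 diff lines) otherwise.
compare_namespace() {
    if diff "$1" "$2" > /dev/null; then
        echo "namespace unchanged"
    else
        echo "namespace changed:"
        diff "$1" "$2" | head -n 20
        return 1
    fi
}
```

For example: `compare_namespace dfs-old-lsr-1.log dfs-old-lsr-2.log`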

HBase 0.96.0 and subsequent releases discontinue support for the HFileV1 file format, a common format prior to HBase 0.94. Before you upgrade, check for V1-format files as follows:

Download the Apache HBase 0.94.24+ tarball to a machine and run the binaries.
On the machine running the HBase 0.94 binaries, point the hbase-site.xml configuration file to a 0.94 cluster.
Check for HFiles in V1 format as follows:
./bin/hbase org.apache.hadoop.hbase.util.HFileV1Detector -p <hbase root data path>

When you run the upgrade check, if “Count of HFileV1” returns any files, start the HBase shell to use major compaction for regions that have HFileV1 format. For example, the following sample output indicates that you need to compact two regions, fa02dac1f38d03577bd0f7e666f12812 and ecdd3eaee2d2fcf8184ac025555bb2af:

Tables Processed:
hdfs://localhost:41020/myHBase/.META. 
hdfs://localhost:41020/myHBase/usertable 
hdfs://localhost:41020/myHBase/TestTable 
hdfs://localhost:41020/myHBase/t

Count of HFileV1: 2
HFileV1:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512

Count of corrupted files: 1
Corrupted Files:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
Count of Regions with HFileV1: 2 
Regions to Major Compact:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812 
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
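
If the detector output is saved to a file, the tables that own the listed regions can be extracted with a short script. This is a sketch under assumptions: the helper name `tables_to_compact` is hypothetical, and it expects the "Regions to Major Compact:" section to look like the sample above, with the table name as the parent directory of each region path.

```shell
# Sketch: list the tables that own regions flagged for major compaction.
# Reads a saved copy of the HFileV1Detector output.
tables_to_compact() {
    awk '
        /Regions to Major Compact:/ { grab = 1; next }
        grab {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[a-z]+:\/\//) {
                    # .../<table>/<region>: the table is the second-to-last part
                    n = split($i, parts, "/")
                    print parts[n - 1]
                }
            }
        }
    ' "$1" | sort -u
}
```

For the sample output, this prints usertable, which can then be compacted from the HBase shell with `major_compact 'usertable'`.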

Optional: If you are upgrading HBase on a secure cluster, flush the ACL table by running the following HBase shell command as the $HBase_User.

flush '_acl_'

Stop all HDP 1.3 services (including MapReduce) except HDFS:

Stop Nagios. On the Nagios host machine, run the following command:
service nagios stop
Stop Ganglia.
- Run this command on the Ganglia server host machine:
  /etc/init.d/hdp-gmetad stop
- Run this command on all the nodes in your Hadoop cluster:
  /etc/init.d/hdp-gmond stop
Stop Oozie. On the Oozie server host machine, run the following command:
sudo su -l $OOZIE_USER -c "cd $OOZIE_LOG_DIR; /usr/lib/oozie/bin/oozie-stop.sh"
where:
$OOZIE_USER is the Oozie Service user. For example, oozie
$OOZIE_LOG_DIR is the directory where Oozie log files are stored (for example: /var/log/oozie).
Stop WebHCat. On the WebHCat host machine, run the following command:
su -l $WEBHCAT_USER -c "/usr/lib/hcatalog/sbin/webhcat_server.sh stop"
where $WEBHCAT_USER is the WebHCat Service user. For example, hcat.
Stop Hive. On the Hive Metastore host machine and Hive Server2 host machine, run the following command:
ps aux | awk '{print $1,$2}' | grep hive | awk '{print $2}' | xargs kill >/dev/null 2>&1
This stops the Hive Metastore and HCatalog services.
Stop ZooKeeper. On the ZooKeeper host machine, run the following command:
su - $ZOOKEEPER_USER -c "export ZOOCFGDIR=/etc/zookeeper/conf ; export ZOOCFG=zoo.cfg ;source /etc/zookeeper/conf/zookeeper-env.sh ; /usr/lib/zookeeper-server/bin/zkServer.sh stop"
where $ZOOKEEPER_USER is the ZooKeeper Service user. For example, zookeeper.
Stop HBase.
- Run these commands on all RegionServers:
  su -l $HBASE_USER -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop regionserver"
- Run these commands on the HBase Master host machine:
  su -l $HBASE_USER -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop master"
  where $HBASE_USER is the HBase Service user. For example, hbase.
Stop MapReduce.
- Run this command on all TaskTracker slaves:
  su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop tasktracker"
- Run this command on the HistoryServer host machine:
  su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop historyserver"
- Run this command on the JobTracker host machine:
  su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop jobtracker"
where $MAPRED_USER is the MapReduce Service user. For example, mapred.

As the HDFS user, save the namespace by executing the following command:

su -l <HDFS_USER>

hadoop dfsadmin -safemode enter

hadoop dfsadmin -saveNamespace

Back up your NameNode metadata.

Copy the following checkpoint files into a backup directory:
- dfs.name.dir/edits
- dfs.name.dir/image/fsimage
- dfs.name.dir/current/fsimage
Store the layoutVersion of the namenode.
${dfs.name.dir}/current/VERSION
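
The copy and the layoutVersion lookup can be combined in one helper. The following is a sketch: the name `backup_nn_metadata` is hypothetical, the first argument stands for the value of dfs.name.dir, and files from the list above that do not exist in a given layout are reported and skipped rather than treated as errors.

```shell
# Sketch: copy the NameNode checkpoint files and print the stored layoutVersion.
# $1 is the dfs.name.dir directory, $2 is the backup directory.
backup_nn_metadata() {
    name_dir="$1"; backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    for rel in edits image/fsimage current/fsimage current/VERSION; do
        if [ -f "$name_dir/$rel" ]; then
            mkdir -p "$backup_dir/$(dirname "$rel")"
            cp -p "$name_dir/$rel" "$backup_dir/$rel" || return 1
        else
            echo "warning: $name_dir/$rel not found" >&2
        fi
    done
    # The VERSION file is a properties file that includes layoutVersion
    grep '^layoutVersion=' "$name_dir/current/VERSION"
}
```

For example, with placeholder paths: `backup_nn_metadata /path/to/dfs/name/dir /tmp/nn-backup`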

If you have a prior HDFS upgrade in progress, finalize it if you have not done so already.

su -l <HDFS_USER>

hadoop dfsadmin -finalizeUpgrade

Optional: Back up the Hive Metastore database.

The following instructions are provided for your convenience. For the latest backup instructions, please check your database documentation.

Table 3.1. Hive Metastore Database Backup and Restore

Database Type	Backup	Restore
MySQL	mysqldump $dbname > $outputfilename.sql For example: mysqldump hive > /tmp/mydir/backup_hive.sql	mysql $dbname < $inputfilename.sql For example: mysql hive < /tmp/mydir/backup_hive.sql
Postgres	sudo -u $username pg_dump $databasename > $outputfilename.sql For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql	sudo -u $username psql $databasename < $inputfilename.sql For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql
Oracle	Connect to the Oracle database using sqlplus. Export the database: exp username/password@database full=yes file=output_file.dmp	Import the database: imp username/password@database file=input_file.dmp
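
Before relying on a dump file, it can help to confirm that the dump finished. The following sketch checks that the file is non-empty and ends with the completion comment that mysqldump and pg_dump normally write in plain SQL output. The helper name `dump_looks_complete` is hypothetical, and the check assumes comments were not disabled when the dump was taken.

```shell
# Sketch: check that a plain SQL dump is non-empty and ends with the
# completion comment. $1 is mysql or postgres, $2 is the dump file.
dump_looks_complete() {
    kind="$1"; file="$2"
    [ -s "$file" ] || return 1
    case "$kind" in
        mysql)    marker="Dump completed" ;;
        postgres) marker="PostgreSQL database dump complete" ;;
        *) echo "unknown database type: $kind" >&2; return 2 ;;
    esac
    tail -n 5 "$file" | grep -q "$marker"
}
```

For example: `dump_looks_complete mysql /tmp/mydir/backup_hive.sql && echo "dump looks complete"`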

Optional: Back up the Oozie Metastore database.

The following instructions are provided for your convenience. For the latest backup instructions, please check your database documentation.

Table 3.2. Oozie Metastore Database Backup and Restore

Database Type	Backup	Restore
MySQL	`mysqldump $dbname > $outputfilename.sql` For example: `mysqldump oozie > /tmp/mydir/backup_oozie.sql`	`mysql $dbname < $inputfilename.sql` For example: `mysql oozie < /tmp/mydir/backup_oozie.sql`
Postgres	`sudo -u $username pg_dump $databasename > $outputfilename.sql` For example: `sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql`	`sudo -u $username psql $databasename < $inputfilename.sql` For example: `sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql`

Optional: Back up the Hue database.

The following instructions are provided for your convenience. For the latest backup instructions, please see your database documentation. For database types that are not listed below, follow your vendor-specific instructions.

Table 3.3. Hue Database Backup and Restore

Database Type	Backup	Restore
MySQL	mysqldump $dbname > $outputfilename.sql For example: mysqldump hue > /tmp/mydir/backup_hue.sql	mysql $dbname < $inputfilename.sql For example: mysql hue < /tmp/mydir/backup_hue.sql
Postgres	sudo -u $username pg_dump $databasename > $outputfilename.sql For example: sudo -u postgres pg_dump hue > /tmp/mydir/backup_hue.sql	sudo -u $username psql $databasename < $inputfilename.sql For example: sudo -u postgres psql hue < /tmp/mydir/backup_hue.sql
Oracle	Connect to the Oracle database using sqlplus. Export the database. For example: exp username/password@database full=yes file=output_file.dmp	Import the database. For example: imp username/password@database file=input_file.dmp
SQLite	/etc/init.d/hue stop su $HUE_USER mkdir ~/hue_backup sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak /etc/init.d/hue start	/etc/init.d/hue stop cd /var/lib/hue mv desktop.db desktop.db.old sqlite3 desktop.db < ~/hue_backup/desktop.bak /etc/init.d/hue start

Stop HDFS.

Run these commands on all DataNodes:
su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
su -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
Run these commands on the Secondary NameNode host machine:
su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop secondarynamenode"
Run these commands on the NameNode host machine:
su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode"
where $HDFS_USER is the HDFS Service user. For example, hdfs.

Verify that edit logs in ${dfs.namenode.name.dir}/current/edits* are empty.

Run the following command:
hdfs oev -i ${dfs.namenode.name.dir}/current/edits_inprogress_* -o edits.out

Verify the edits.out file. It should contain only the OP_START_LOG_SEGMENT transaction. For example:

<?xml version="1.0" encoding="UTF-8"?>
 <EDITS>
 <EDITS_VERSION>-56</EDITS_VERSION>
 <RECORD>
 <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
 <DATA>
 <TXID>5749</TXID>
 </DATA>
 </RECORD>

If edits.out has transactions other than OP_START_LOG_SEGMENT, run the following steps and then verify edit logs are empty.
- Start the existing version NameNode.
- Ensure there is a new FS image file.
- Shut the NameNode down.
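
The check on edits.out can be automated. The following sketch succeeds only when the file contains at least one OPCODE element and every OPCODE is OP_START_LOG_SEGMENT. The helper name `edits_log_is_empty` is hypothetical, and the sketch assumes one OPCODE element per line, as in the example above.

```shell
# Sketch: succeed only if edits.out holds nothing but OP_START_LOG_SEGMENT.
edits_log_is_empty() {
    total=$(grep -c '<OPCODE>' "$1")
    other=$(grep '<OPCODE>' "$1" | grep -vc 'OP_START_LOG_SEGMENT')
    [ "$total" -ge 1 ] && [ "$other" -eq 0 ]
}
```

For example: `edits_log_is_empty edits.out && echo "edit log is empty"`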

Rename or delete any paths that are reserved in the new version of HDFS.

When upgrading to a new version of HDFS, it is necessary to rename or delete any paths that are reserved in the new version of HDFS. If the NameNode encounters a reserved path during upgrade, it will print an error such as the following:

/.reserved is a reserved path and .snapshot is a reserved path component in this version of HDFS. Please rollback and delete or rename this path, or upgrade with the -renameReserved key-value pairs option to automatically rename these paths during upgrade.

Specifying -upgrade -renameReserved with optional key-value pairs causes the NameNode to automatically rename any reserved paths found during startup. For example, to rename all paths named .snapshot to .my-snapshot and all paths named .reserved to .my-reserved, specify:

-upgrade -renameReserved .snapshot=.my-snapshot,.reserved=.my-reserved

If no key-value pairs are specified with -renameReserved, the NameNode suffixes reserved paths with .<LAYOUT-VERSION>.UPGRADE_RENAMED, for example:

.snapshot.-51.UPGRADE_RENAMED
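
To find reserved paths before the upgrade, the namespace listing captured earlier (dfs-old-lsr-1.log) can be scanned for the two names mentioned in the error message. This is a sketch under assumptions: the helper name `find_reserved_paths` is hypothetical, it treats the last whitespace-separated field of each listing line as the path (so paths containing spaces are not handled), and it checks only for a top-level /.reserved and for any path component named .snapshot.

```shell
# Sketch: print paths from a saved "hadoop dfs -lsr /" listing that use
# names reserved in the new version of HDFS.
find_reserved_paths() {
    awk '{
        path = $NF
        n = split(path, parts, "/")
        hit = (parts[2] == ".reserved")
        for (i = 2; i <= n; i++) {
            if (parts[i] == ".snapshot") hit = 1
        }
        if (hit) print path
    }' "$1"
}
```

For example: `find_reserved_paths dfs-old-lsr-1.log`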

Note

We recommend that you perform a -saveNamespace before renaming paths (running -saveNamespace appears in a previous step in this procedure). This is because a data inconsistency can result if an edit log operation refers to the destination of an automatically renamed file.

Also note that running -renameReserved will rename all applicable existing files in the cluster. This may impact cluster applications.