1. Prepare the 2.1 Stack for Upgrade

To prepare for upgrading the HDP Stack, perform the following tasks:

  • Disable Security.

    Important:

    If your Stack has Kerberos Security turned on, disable Kerberos before performing the Stack upgrade. On Ambari Web UI > Admin > Security, click Disable Kerberos. You can re-enable Kerberos Security after performing the upgrade.

  • Checkpoint user metadata and capture the HDFS operational state.

    This step supports rollback and restore of the original state of HDFS data, if necessary.

  • Back up the Hive and Oozie metastore databases.

    This step supports rollback and restore of the original state of Hive and Oozie data, if necessary.

  • Stop all HDP and Ambari services.

  • Make sure that all jobs currently running on the system finish before you upgrade the stack.

Note:

Libraries will change during the upgrade. Any jobs that remain active and use the older libraries will probably fail during the upgrade.
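
One way to confirm that nothing is still active on a YARN cluster is to list running applications before you begin (a quick cross-check, not part of the official procedure; mapred job -list is the MapReduce equivalent):

yarn application -list

An empty application list means no YARN jobs are still active.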

  1. Using Ambari Web, browse to Services. For each service except HDFS and ZooKeeper, open the Service Actions menu and select Stop.

  2. Stop any client programs that access HDFS.

    Perform steps 3 through 8 on the NameNode host. In a highly available NameNode configuration, perform the following procedure on the primary NameNode.

    Note:

    To locate the primary NameNode in an Ambari-managed HDP cluster, browse to Ambari Web > Services > HDFS. In Summary, click NameNode; Hosts > Summary displays the host's FQDN.

  3. If HDFS is in a non-finalized state from a prior upgrade operation, you must finalize HDFS before upgrading further. Finalizing HDFS removes all links to the metadata of the prior HDFS version. Do this only if you do not want to roll back to that prior HDFS version.

    On the NameNode host, run the following commands as the HDFS user:

    su -l <HDFS_USER>

    hdfs dfsadmin -finalizeUpgrade

    where <HDFS_USER> is the HDFS Service user. For example, hdfs.

  4. Check the NameNode directory to ensure that there is no snapshot of any prior HDFS upgrade.

    Specifically, using Ambari Web > HDFS > Configs > NameNode, examine the dfs.namenode.name.dir (or dfs.name.dir) directory in the NameNode Directories property. Make sure that only a "/current" directory, and no "/previous" directory, exists on the NameNode host.
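
    For example, if /hadoop/hdfs/namenode is one of your configured NameNode directories (an illustrative path; substitute your own value):

    ls /hadoop/hdfs/namenode

    The listing should show only current; a previous entry means a prior upgrade was never finalized (see step 3).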

  5. Create the following logs and other files.

    Creating these logs allows you to check the integrity of the file system, post-upgrade.

    As the HDFS user, run su -l <HDFS_USER>, where <HDFS_USER> is the HDFS Service user. For example, hdfs.

    1. Run fsck with the following flags and send the results to a log.

      The resulting file contains a complete block map of the file system. You use this log later to confirm the upgrade.

      hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log
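
      After the upgrade, you can run the same fsck command again and compare the two logs to confirm file system integrity; a sketch of that later comparison (dfs-new-fsck-1.log is a name chosen here):

      hdfs fsck / -files -blocks -locations > dfs-new-fsck-1.log

      diff dfs-old-fsck-1.log dfs-new-fsck-1.log

      Minor header differences such as timestamps are expected; the file and block totals should match.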

    2. Optional: Capture the complete namespace of the file system.

      The following command does a recursive listing of the root file system:

      hdfs dfs -ls -R / > dfs-old-lsr-1.log

    3. Create a list of all the DataNodes in the cluster.

      hdfs dfsadmin -report > dfs-old-report-1.log

    4. Optional: Copy all unrecoverable data stored in HDFS to a local file system or to a backup instance of HDFS.

  6. Save the namespace.

    You must be the HDFS service user to do this and you must put the cluster in Safe Mode.

    hdfs dfsadmin -safemode enter

    hdfs dfsadmin -saveNamespace

    Note:

    In a highly available NameNode configuration, the command hdfs dfsadmin -saveNamespace sets a checkpoint in the first NameNode specified in the configuration, in dfs.ha.namenodes.[nameservice ID]. You can also use the dfsadmin -fs option to specify which NameNode to connect to.

    For example, to force a checkpoint in namenode 2:

    hdfs dfsadmin -fs hdfs://namenode2-hostname:namenode2-port -saveNamespace

  7. Copy the checkpoint files located in <dfs.name.dir>/current into a backup directory.

    Find the directory, using Ambari Web > HDFS > Configs > NameNode > NameNode Directories on your primary NameNode host.

    Note:

    In a highly available NameNode configuration, the location of the checkpoint depends on where the saveNamespace command is sent, as defined in the preceding step.
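
    For example, assuming /hadoop/hdfs/namenode is the NameNode directory and /root/namenode-backup is a backup location of your choosing (both paths are illustrative):

    mkdir -p /root/namenode-backup

    cp -r /hadoop/hdfs/namenode/current /root/namenode-backup/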

  8. Store the layoutVersion for the NameNode.

    Make a copy of the file at <dfs.name.dir>/current/VERSION, where <dfs.name.dir> is the value of the config parameter NameNode directories. This file will be used later to verify that the layout version is upgraded.
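
    For example, again assuming /hadoop/hdfs/namenode as the NameNode directory (an illustrative path):

    cp /hadoop/hdfs/namenode/current/VERSION /root/namenode-backup/VERSION.hdp21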

  9. Stop HDFS.

  10. Stop ZooKeeper.

  11. Using Ambari Web > Services > <service.name> > Summary, review each service and make sure that all services in the cluster are completely stopped.
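
    As a command-line cross-check, you can query the Ambari REST API for service states (a sketch; ambari.example.com, port 8080, and the admin credentials are placeholders for your own values):

    curl -u admin:<PASSWORD> "http://ambari.example.com:8080/api/v1/clusters/<CLUSTERNAME>/services?fields=ServiceInfo/state"

    Every service should report "state" : "INSTALLED", which indicates it is stopped.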

  12. At the Hive Metastore database host, stop the Hive metastore service, if you have not done so already.

    Note:

    Make sure that the Hive metastore database is running. For more information about administering the Hive metastore database, see the Hive Metastore Administrator documentation.

  13. If you are upgrading Hive and Oozie, back up the Hive and Oozie metastore databases on the Hive and Oozie database host machines, respectively.

    Important:

    Make sure that your Hive database is updated to the minimum recommended version.

    If you are using Hive with MySQL, we recommend upgrading your MySQL database to version 5.6.21 before upgrading the HDP Stack to v2.2.x. For specific information, see Database Requirements.
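
    To see which MySQL version is currently installed, run the following on the database host:

    mysql --version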

    1. Optional - Back up the Hive Metastore database.

      Note:

      These instructions are provided for your convenience. Check your database documentation for the latest backup instructions.

      Hive Metastore Database Backup and Restore

      MySQL

        Backup:

        mysqldump <dbname> > <outputfilename.sql>

        For example: mysqldump hive > /tmp/mydir/backup_hive.sql

        Restore:

        mysql <dbname> < <inputfilename.sql>

        For example: mysql hive < /tmp/mydir/backup_hive.sql

      Postgres

        Backup:

        sudo -u <username> pg_dump <databasename> > <outputfilename.sql>

        For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql

        Restore:

        sudo -u <username> psql <databasename> < <inputfilename.sql>

        For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql

      Oracle

        Backup: Connect to the Oracle database using sqlplus, then export the database:

        exp username/password@database full=yes file=output_file.dmp

        Restore: Import the database:

        imp username/password@database file=input_file.dmp

    2. Optional - Back up the Oozie Metastore database.

      Note:

      These instructions are provided for your convenience. Check your database documentation for the latest backup instructions.

      Oozie Metastore Database Backup and Restore

      MySQL

        Backup:

        mysqldump <dbname> > <outputfilename.sql>

        For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql

        Restore:

        mysql <dbname> < <inputfilename.sql>

        For example: mysql oozie < /tmp/mydir/backup_oozie.sql

      Postgres

        Backup:

        sudo -u <username> pg_dump <databasename> > <outputfilename.sql>

        For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql

        Restore:

        sudo -u <username> psql <databasename> < <inputfilename.sql>

        For example: sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql

  14. Back up Hue. If you are using the embedded SQLite database, you must back up the database before you upgrade Hue to prevent data loss. To make a backup copy of the database, stop Hue, then "dump" the database content to a file, as follows:

    /etc/init.d/hue stop

    su $HUE_USER

    mkdir ~/hue_backup

    cd /var/lib/hue

    sqlite3 desktop.db .dump > ~/hue_backup/desktop.bak

    For other databases, follow your vendor-specific instructions to create a backup.
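
    To restore this backup later, a minimal sketch (stop Hue first, move any existing desktop.db aside, and load the dump into a fresh database; the path matches the backup created above):

    sqlite3 desktop.db < ~/hue_backup/desktop.bak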

  15. Stage the upgrade script.

    • On the Ambari Server host, create an "Upgrade Folder", for example /work/upgrade_hdp_2:

      mkdir -p /work/upgrade_hdp_2

      cd /work/upgrade_hdp_2

    • Download the upgrade helper to the Upgrade Folder and make it executable. Use the raw file URL; the GitHub blob page returns HTML, not the script:

      curl -O https://raw.githubusercontent.com/apache/ambari/branch-2.0.maint/ambari-server/src/main/python/upgradeHelper.py

      chmod +x upgradeHelper.py

    • Download the upgrade catalog to the Upgrade Folder, again using the raw file URL:

      curl -O https://raw.githubusercontent.com/apache/ambari/branch-2.0.maint/ambari-server/src/main/resources/upgrade/catalog/UpgradeCatalog_2.1_to_2.2.4.json

      Note:

      2.2.4 in the above URL represents the HDP version you are upgrading to. For example, if you are using HDP 2.2.4, the UpgradeCatalog_2.1_to_2.2.4.json catalog is used. Use the HDP 2.2.4 catalog for HDP 2.2.6 as well.

      Note:

      Make sure that Python is available on the host and that the version is 2.6 or higher:

      python --version

      For RHEL/CentOS/Oracle Linux 5, you must use Python 2.6.

  16. Run the upgrade catalog to back up the cluster configuration settings.

    1. Go to the Upgrade Folder you just created in step 15.

    2. Execute the backup-configs action:

      python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME backup-configs

      Variable       Value

      $HOSTNAME      Ambari Server hostname. This should be the FQDN of the host running the Ambari Server.

      $USERNAME      Ambari Admin user.

      $PASSWORD      Password for the user.

      $CLUSTERNAME   Name of the cluster. This is the name you provided when you installed the cluster with Ambari. Log in to Ambari; the name appears in the upper left of the Ambari Web screen. It is case-sensitive.
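
      For example, with placeholder values (ambari.example.com, admin, and MyCluster stand in for your own host name, credentials, and cluster name):

      python upgradeHelper.py --hostname ambari.example.com --user admin --password admin --clustername MyCluster backup-configs
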
  17. On the Ambari Server host, stop Ambari Server and confirm that it is stopped.

    ambari-server stop

    ambari-server status

  18. Stop all Ambari Agents. On every host in your cluster known to Ambari,

    ambari-agent stop
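
    On a large cluster, a loop over ssh can save time; a minimal sketch, assuming passwordless ssh as root and a hosts.txt file listing every agent host (both are assumptions, not part of the official procedure):

    for h in $(cat hosts.txt); do ssh root@${h} "ambari-agent stop"; done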