This is the documentation for Cloudera Manager 5.0.x. Documentation for other versions is available at Cloudera Documentation.

Upgrading from CDH 4 Packages to CDH 5 Packages

If you originally used Cloudera Manager to install your CDH service using packages, you can upgrade to CDH 5 using either packages or parcels. Using parcels is the preferred and recommended way to upgrade, because the upgrade wizard provided for parcels handles the upgrade process almost completely automatically. However, if you wish to continue to use packages, you can perform an upgrade by following the instructions presented here.

The steps to upgrade a CDH installation managed by Cloudera Manager using packages are as follows.

  1. Before You Begin
  2. Stop All Services
  3. Back up the HDFS Metadata on the NameNode
  4. Uninstall CDH 4
  5. Remove CDH 4 Repository Files
  6. Install CDH 5 Components
  7. Run the Upgrade Wizard
  8. Import MapReduce Configuration to YARN
  9. Restart the Reports Manager Role
  10. Finalize the HDFS Metadata Upgrade

Before You Begin

  • Read the Cloudera Manager Release Notes.
  • Make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade will fail and you will have to reinstall CDH 4 to complete or kill those running workflows.
  • Plan downtime. If you are upgrading a cluster that is part of a production system, be sure to plan ahead. As with any operational work, reserve a maintenance window with enough extra time allotted in case of complications. The Hadoop upgrade process is well understood, but it is best to be cautious. For production clusters, Cloudera recommends allocating up to a full-day maintenance window to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
  • To avoid generating many alerts during the upgrade process, you can enable maintenance mode on your cluster before you start the upgrade. Be sure to exit maintenance mode when you have finished the upgrade, in order to re-enable Cloudera Manager alerts.
  • Upgrade unmanaged components. Cloudera Manager 5 manages most, but not all, of the components available in the CDH distribution. Components that you might have installed that are not managed by Cloudera Manager include:
    • Pig
    • Whirr
    • Mahout
    • Crunch
    Upgrade these unmanaged components before proceeding to upgrade managed components. For information on upgrading unmanaged components, see the CDH 5 Installation Guide.
  • Ensure Java 7 is installed across the cluster. CDH 5 requires Java 7, and some services may not start if it is not installed. For installation instructions and recommendations for CDH 5, see (CDH 5) Java Development Kit Installation.
  • Put the NameNode into safe mode. To upgrade CDH in multiple clusters, repeat this process for each cluster:
    1. In the Cloudera Manager Admin Console, go to the HDFS service, NameNode role instance.
    2. Select Actions > Enter Safemode... and confirm that you want to do this.
    3. After the NameNode has successfully entered safe mode, select Actions > Save Namespace... and confirm that you want to do this. This writes out a new fsimage with no edit log entries. Leave the NameNode in safe mode while you proceed with the upgrade instructions.
  • Back up important databases:
    • Cloudera Manager databases. For instructions, see Backing up Databases. You will need to indicate to the upgrade wizard that you have performed this step before the upgrade will proceed.
    • Hive Metastore database (which could be in the embedded database)
    • Hue database
    • Oozie database
    • Sqoop database
  • If you have just upgraded to Cloudera Manager 5, you must hard restart the Cloudera Manager Agents as described in the Hard Restart Cloudera Manager Agents task in Upgrading Cloudera Manager 4 to Cloudera Manager 5 in the Cloudera Manager Administration Guide.
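
The Java 7 requirement in the list above can be checked on each host before you start. The following is a minimal sketch, not a Cloudera-provided tool: the function names java_major_version and is_java7 are hypothetical, and the sketch assumes the legacy "1.x" version strings (for example 1.7.0_55) that Java 7 and earlier report.

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from a version string.
java_major_version() {
    version="$1"
    case "$version" in
        1.*)  # legacy scheme: 1.7.0_55 -> 7
            rest="${version#1.}"
            echo "${rest%%.*}"
            ;;
        *)    # newer scheme: 11.0.2 -> 11
            echo "${version%%.*}"
            ;;
    esac
}

# Hypothetical helper: succeed only if the version string is a Java 7 version.
is_java7() {
    [ "$(java_major_version "$1")" = "7" ]
}

# Example use on a host (shown as comments, not run here):
#   v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   is_java7 "$v" || echo "Java 7 not found on $(hostname): $v"
```

The parsing is kept separate from the call to java so that the check can be tried against sample strings without a JDK on the PATH.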

Stop All Services

  1. Stop each cluster.
    1. On the Home page, click to the right of the cluster name and select Stop.
    2. Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.

      When All services successfully stopped appears, the task is complete and you can close the Command Details window.

  2. Stop the Cloudera Management Service:
    1. Do one of the following:
      • Select Clusters > Cloudera Management Service > mgmt, then select Actions > Stop.
      • On the Home page, click to the right of mgmt and select Stop.
    2. Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
    3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.

Back up the HDFS Metadata on the NameNode

  1. Stop the NameNode you want to back up.
  2. Go to the HDFS service.
  3. Select Configuration > View and Edit.
  4. In the Search field, search for "NameNode Data Directories". This locates the NameNode Data Directories property.
  5. From the command line on the NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one is listed, then you only need to make a backup of one directory, since each directory is a complete copy. For example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
    # cd /mnt/hadoop/hdfs/name
    # tar -cvf /root/nn_backup_data.tar .

    You should see output like this:

    ./
    ./current/
    ./current/fsimage
    ./current/fstime
    ./current/VERSION
    ./current/edits
    ./image/
    ./image/fsimage
      Warning: If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the CDH services.
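
As a sanity check on the archive created above, you can list its contents and look for the two conditions this section describes: a lock file (a sign that the NameNode was still running) and the presence of current/VERSION. This is a minimal sketch; the function name check_nn_backup is hypothetical, and it only inspects file names, not file contents.

```shell
#!/bin/sh
# Hypothetical helper: inspect a NameNode metadata archive created with tar.
# Returns 0 if the archive lists current/VERSION and no file whose name
# contains "lock"; returns 1 otherwise.
check_nn_backup() {
    archive="$1"
    # tar -tf lists the archive members without extracting them.
    listing=$(tar -tf "$archive") || return 1
    if printf '%s\n' "$listing" | grep -q 'lock'; then
        echo "lock file found: the NameNode may still be running" >&2
        return 1
    fi
    if ! printf '%s\n' "$listing" | grep -q 'current/VERSION$'; then
        echo "current/VERSION missing from $archive" >&2
        return 1
    fi
    return 0
}

# Example use (shown as a comment, not run here):
#   check_nn_backup /root/nn_backup_data.tar && echo "backup looks complete"
```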

Uninstall CDH 4

Uninstall CDH 4 on each host as follows:

Operating System    Command
RHEL                $ sudo yum remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr
SLES                $ sudo zypper remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr
Ubuntu or Debian    $ sudo apt-get purge bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr
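
When the same script has to run on hosts with different operating systems, the three commands listed above can be selected by package manager. This is a minimal sketch under that assumption; the helper names cdh4_remove_command and detect_package_manager are hypothetical, and the sketch only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Packages named in the uninstall commands above.
CDH4_PACKAGES="bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr"

# Hypothetical helper: print the uninstall command for a package manager.
cdh4_remove_command() {
    case "$1" in
        yum)     echo "sudo yum remove $CDH4_PACKAGES" ;;
        zypper)  echo "sudo zypper remove $CDH4_PACKAGES" ;;
        apt-get) echo "sudo apt-get purge $CDH4_PACKAGES" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: name the first supported package manager on the PATH.
detect_package_manager() {
    for pm in yum zypper apt-get; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}

# Example use (shown as a comment, not run here):
#   cdh4_remove_command "$(detect_package_manager)"
```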

Remove CDH 4 Repository Files

Remove all Cloudera CDH 4 repository files. For example, on a Red Hat or similar system, remove all files in /etc/yum.repos.d that have cloudera as part of the name.
  Important:
  • Before removing the files, make sure you have not added any custom entries that you want to preserve. (To preserve custom entries, back up the files before removing them.)
  • Make sure you remove Impala and Search repository files, as well as the CDH repository file.
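
The back-up-then-remove advice above can be sketched as a small function. This is an illustration only: the function name remove_cloudera_repos and the backup directory in the usage comment are hypothetical, and the sketch matches any file whose name contains "cloudera", so review the matched files before using it on a real host.

```shell
#!/bin/sh
# Hypothetical helper: copy every regular file in repo_dir whose name
# contains "cloudera" into backup_dir, then remove the original.
remove_cloudera_repos() {
    repo_dir="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    for f in "$repo_dir"/*cloudera*; do
        [ -f "$f" ] || continue   # skip when the glob matches nothing
        # Remove the original only after the copy has succeeded.
        cp -p "$f" "$backup_dir/" && rm -f "$f" || return 1
    done
    return 0
}

# Example use as root on a Red Hat-like system (not run here; the backup
# path is an assumption):
#   remove_cloudera_repos /etc/yum.repos.d /root/cloudera-repo-backup
```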

Install CDH 5 Components

  • Red Hat
    1. Download and install the "1-click Install" package
      1. Download the CDH 5 "1-click Install" package.

        Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).

        OS Version                 Click this Link
        Red Hat/CentOS/Oracle 5    Red Hat/CentOS/Oracle 5 link
        Red Hat/CentOS/Oracle 6    Red Hat/CentOS/Oracle 6 link
      2. Install the RPM:
        • Red Hat/CentOS/Oracle 5
          $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm 
        • Red Hat/CentOS/Oracle 6
          $ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
    2. (Optionally) add a repository key:
      • Red Hat/CentOS/Oracle 5
        $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
      • Red Hat/CentOS/Oracle 6
        $ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo yum clean all
      $ sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
  • SLES
    1. Download and install the "1-click Install" package.
      1. Download the CDH 5 "1-click Install" package.

        Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).

      2. Install the RPM:
        $ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
      3. Update your system package index by running:
        $ sudo zypper refresh
    2. (Optionally) add a repository key:
      $ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
    3. Install the CDH packages:
      $ sudo zypper clean --all
      $ sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
  • Ubuntu and Debian
    1. Download and install the "1-click Install" package
      1. Download the CDH 5 "1-click Install" package:
        OS Version    Click this Link
        Wheezy        Wheezy link
        Precise       Precise link
      2. Install the package. Do one of the following:
        • Choose Open with in the download window to use the package manager.
        • Choose Save File, save the package to a directory to which you have write access (it can be your home directory) and install it from the command line, for example:
          $ sudo dpkg -i cdh5-repository_1.0_all.deb
    2. (Optionally) add a repository key:
      • Debian Wheezy
        $ curl -s http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -
      • Ubuntu Precise
        $ curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
    3. Install the CDH packages:
      $ sudo apt-get update
      $ sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
        Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
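
Because the upgrade wizard will not continue unless the CDH 5 packages are already installed, it can help to confirm on each host that the installed Hadoop reports a CDH 5 version. This is a minimal sketch; the function name is_cdh5_hadoop is hypothetical, and it assumes that the first line of hadoop version output has the form "Hadoop <version>-cdh5.<x>.<y>". The version strings in the comments are illustrative, not taken from this guide.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first line of the given text looks
# like "Hadoop <version>-cdh5.<x>.<y>".
is_cdh5_hadoop() {
    first_line=$(printf '%s\n' "$1" | head -n 1)
    case "$first_line" in
        Hadoop\ *-cdh5.*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Example use on a host (shown as comments, not run here):
#   is_cdh5_hadoop "$(hadoop version)" || echo "CDH 5 not detected on $(hostname)"
# Illustrative inputs:
#   "Hadoop 2.3.0-cdh5.0.0"  -> accepted
#   "Hadoop 2.0.0-cdh4.6.0"  -> rejected
```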

Run the Upgrade Wizard

  1. Log in to the Cloudera Manager Admin Console.
  2. From the Home tab Status page, click next to the cluster name and select Upgrade Cluster. The Upgrade Wizard starts.
  3. Click the checkbox to acknowledge that you have backed up all your databases and click Continue.
  4. The next step shows you the hosts that the Upgrade Wizard has detected as needing to be upgraded.
  5. Select Use Packages as your install method. This method assumes you have already installed your CDH 5 packages. If you have not done so, the upgrade wizard will not continue. Do not select Use Parcels. That option will cause Cloudera Manager to download and distribute parcels to your cluster, even though you have already installed from packages. If you want to upgrade to CDH 5 using parcels, see Upgrading from CDH 4 to CDH 5 Parcels.
  6. Click Continue.
  7. The next page notifies you that the services on your cluster will be shut down. Rolling upgrade is not available. You can select whether to have all your services restarted and client configurations deployed automatically after the upgrade has finished. Click Continue to proceed.
  8. The upgrade wizard proceeds to execute the various steps involved in upgrading your cluster, which include:
    • Waiting for the Cloudera Manager Agent to recognize the new CDH version
    • Converting your configuration parameters
    • Upgrading HDFS metadata, Sqoop server, Hive metastore, and various databases
    • Deploying client configuration and restarting services, if you elected those options
      Note: If you encounter errors during these steps:
      • If the converting configuration parameters step fails, Cloudera Manager rolls back all configurations to CDH 4. Fix any reported problems and retry the upgrade.
      • If the upgrade command fails at any point after the convert configuration step, there is no retry support in Cloudera Manager. You must first correct the error, then manually re-run the individual commands. You can view the remaining commands in the Recent Commands page.
      • If the HDFS upgrade metadata step fails, you cannot revert to CDH 4 unless you restore a backup of Cloudera Manager.
  9. When the upgrade has finished, the Host Inspector runs. This should now show that the hosts are running CDH 5. Click Continue to proceed.

    If your cluster name includes the string "CDH 4" the upgrade procedure changes the string to "CDH 5". Otherwise, it leaves the cluster name unchanged. If you want to rename the cluster, you can do so by clicking the cluster name, which displays a pop-up where you can change the name.

Import MapReduce Configuration to YARN

In CDH 5 and Cloudera Manager 5, YARN rather than MapReduce is the default MapReduce computation framework. If you had the MapReduce service configured in CDH 4, you can import the MapReduce configuration to YARN. This does not affect your MapReduce configuration.

  Warning: In addition to importing configuration settings, the import process:
  • Configures services to use YARN as the MapReduce computation framework instead of MapReduce.
  • Overwrites existing YARN configuration and role assignments.
  1. To import the existing configuration from your MapReduce service, select "OK, set up YARN" to add the YARN service and import the MapReduce settings. To skip the import, select "Skip this step now". If you choose to skip this step, you can perform it at a later time from the YARN service.
  2. Click Continue to proceed. Cloudera Manager stops the YARN service (if running) and its dependencies. When these commands complete, click Continue.
  3. The next page indicates some additional configuration required by YARN. Verify or modify these and click Continue.
  4. The Switch Cluster to MR2 step proceeds. When all steps have been completed, click Continue.
  5. When all steps have completed, click Continue.

Restart the Reports Manager Role

  1. Do one of the following:
    • Select Clusters > Cloudera Management Service > mgmt.
    • On the Status tab of the Home page, in the Cloudera Management Service table, click the mgmt link.
  2. Click the Instances tab.
  3. Check the checkbox next to reportsmanager.
  4. Select Actions for Selected > Restart and then Restart to confirm.

Finalize the HDFS Metadata Upgrade

After ensuring that the CDH 5 upgrade has succeeded and that everything is running smoothly, finalize the HDFS metadata upgrade. It is not unusual to wait days or even weeks before finalizing the upgrade.
  1. In the Cloudera Manager Admin Console, pull down the Clusters tab and go to the HDFS service.
  2. Go to the Instances tab and click on the NameNode instance.
  3. From the NameNode Status page, from the Actions menu click Finalize Metadata Upgrade.
  4. Click Finalize Metadata Upgrade to confirm you want to complete this process.

    Cloudera Manager finalizes the metadata upgrade.

Page generated September 3, 2015.