Upgrading from CDH 4 Packages to CDH 5 Packages
If you originally used Cloudera Manager to install your CDH service using packages, you can upgrade to CDH 5 using either packages or parcels. Parcels are the preferred and recommended way to upgrade, because the upgrade wizard provided for parcels handles the upgrade process almost completely automatically. However, if you want to continue to use packages, you can perform an upgrade by following the instructions presented here.
The steps to upgrade a CDH installation managed by Cloudera Manager using packages are as follows.
- Before You Begin
- Stop All Services
- Back up the HDFS Metadata on the NameNode
- Uninstall CDH 4
- Remove CDH 4 Repository Files
- Install CDH 5 Components
- Run the Upgrade Wizard
- Import MapReduce Configuration to YARN
- Restart the Reports Manager Role
- Recompile HBase Coprocessor and Custom JARs
- Finalize the HDFS Metadata Upgrade
Before You Begin
- Read the Cloudera Manager 5 Release Notes.
- Make sure there are no Oozie workflows in RUNNING or SUSPENDED status; otherwise the Oozie database upgrade will fail and you will have to reinstall CDH 4 to complete or kill those running workflows.
- Run the Host Inspector and fix every issue.
- If using security, run the Security Inspector.
- Run hdfs fsck / and hdfs dfsadmin -report and fix any issues.
- If using HBase:
- Run hbase hbck to make sure there are no inconsistencies.
- Before you can upgrade HBase from CDH 4 to CDH 5, your HFiles must be upgraded from HFile v1 format to HFile v2, because CDH 5 no longer supports HFile v1. The upgrade procedure differs depending on whether you use Cloudera Manager or the command line, but the results are the same. The first step is to check the HFiles for instances of HFile v1 and mark them to be upgraded to HFile v2, and to check for and report any corrupted files or files with unknown versions, which must be removed manually. The next step is to rewrite the HFiles during the next major compaction. After the HFiles are upgraded, you can continue the upgrade. To check and upgrade the files:
- In the Cloudera Manager Admin Console, go to the HBase service and run Actions > Check HFile Version.
- Check the output of the command in the stderr log.
Your output should be similar to the following:
Tables Processed:
hdfs://localhost:41020/myHBase/.META.
hdfs://localhost:41020/myHBase/usertable
hdfs://localhost:41020/myHBase/TestTable
hdfs://localhost:41020/myHBase/t
Count of HFileV1: 2
HFileV1:
hdfs://localhost:41020/myHBase/usertable /fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
hdfs://localhost:41020/myHBase/usertable /ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
Count of corrupted files: 1
Corrupted Files:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
Count of Regions with HFileV1: 2
Regions to Major Compact:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af
In the example above, you can see that the script has detected two HFile v1 files, one corrupt file, and the regions to major compact.
- Trigger a major compaction on each of the reported regions. This major compaction rewrites the files from HFile v1 to HFile v2 format. To run the major compaction, start HBase Shell and issue the major_compact command.
$ bin/hbase shell
hbase> major_compact 'usertable'
You can also do this in a single step by using the echo shell built-in command:
$ echo "major_compact 'usertable'" | bin/hbase shell
- Review the upgrade procedure and reserve a maintenance window with enough time allotted to perform all steps. For production clusters, Cloudera recommends allocating a maintenance window of up to a full day to perform the upgrade, depending on the number of hosts, the amount of experience you have with Hadoop and Linux, and the particular hardware you are using.
- To avoid generating many alerts during the upgrade process, you can enable maintenance mode on your cluster before you start the upgrade. Be sure to exit maintenance mode when you have finished the upgrade, in order to re-enable Cloudera Manager alerts.
- Upgrade unmanaged components. Cloudera Manager 5 manages most, but not all, of the components available in the CDH distribution. Components that you might have installed that are not managed by Cloudera Manager include:
- Pig
- Whirr
- Mahout
- Crunch
- Ensure Java 7 is installed across the cluster. CDH 5 requires Java 7, and some services may not start if it is not installed. For installation instructions and recommendations for CDH 5, see (CDH 5) Java Development Kit Installation.
- Put the NameNode into safe mode. To upgrade CDH in multiple clusters, repeat this process for each cluster:
- In the Cloudera Manager Admin Console, go to the HDFS service and open the NameNode role instance.
- Select Actions > Enter Safemode and confirm that you want to do this.
- After the NameNode has successfully entered safemode, select Actions > Save Namespace and confirm that you want to do this. This writes out a new fsimage with no edit log entries. Leave the NameNode in safe mode while you proceed with the upgrade instructions.
- Back up important databases:
- Cloudera Manager databases. For instructions, see Backing up Databases. You will need to indicate to the upgrade wizard that you have performed this step before the upgrade will proceed.
- Hive Metastore database (which could be in the embedded database)
- Hue database
- Oozie database
- Sqoop database
- If you have just upgraded to Cloudera Manager 5, you must hard restart the Cloudera Manager Agents as described in the Hard Restart Cloudera Manager Agents task in Upgrading Cloudera Manager 4 to Cloudera Manager 5 in Cloudera Manager Administration Guide.
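The HFile v1 check above lists the regions that still hold HFile v1 files, and the example then compacts the table that owns them. When many regions are reported, the commands can be generated from the saved stderr log. The following is a minimal shell sketch, not a Cloudera-provided tool: the function name hfile_v1_compact_cmds is hypothetical, and it assumes the output is formatted like the sample above, with each region path ending in <table>/<encoded-region-name>.

```shell
#!/bin/sh
# Hypothetical helper: read the HFile v1 check output on stdin and print one
# HBase Shell "major_compact '<table>'" command per table that owns a region
# listed under "Regions to Major Compact:".
# Assumption: region paths look like hdfs://host:port/<root>/<table>/<region>,
# as in the sample output, so the table is the second-to-last path component.
hfile_v1_compact_cmds() {
  awk -v q="'" '
    # Start collecting after the section header.
    /Regions to Major Compact:/ { in_regions = 1; next }
    # A blank line ends the section.
    in_regions && NF == 0 { in_regions = 0; next }
    # Only lines that look like HDFS URIs or absolute paths are region paths.
    in_regions && $1 ~ /^(hdfs:\/\/|\/)/ {
      path = $1
      sub(/\/+$/, "", path)          # drop any trailing slash
      n = split(path, parts, "/")
      table = parts[n - 1]           # <table> in .../<table>/<region>
      if (!(table in seen)) {        # emit each table only once
        seen[table] = 1
        printf "major_compact %s%s%s\n", q, table, q
      }
    }
  '
}
```

For example, hfile_v1_compact_cmds < stderr.log prints the commands for review, and piping that output into bin/hbase shell runs them, in the same way as the echo example above. Compacting per table mirrors the usertable example; it may rewrite more data than strictly necessary, but it avoids needing full region names, which the check output does not include.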
Stop All Services
- Stop each cluster.
- On the Home page, click the drop-down menu to the right of the cluster name and select Stop.
- Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.
When All services successfully stopped appears, the task is complete and you can close the Command Details window.
- Stop the Cloudera Management Service:
- Do one of the following:
- Select Clusters > Cloudera Management Service > Cloudera Management Service, and then select Actions > Stop.
- On the Home page, click the drop-down menu to the right of Cloudera Management Service and select Stop.
- Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
- When Command completed with n/n successful subcommands appears, the task is complete. Click Close.
Back up the HDFS Metadata on the NameNode
- Stop the NameNode you want to back up.
- Go to the HDFS service.
- Click the Configuration tab.
- In the Search field, search for "NameNode Data Directories". This locates the NameNode Data Directories property.
- From the command line on the NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one directory is listed, you only need to back up one of them, since each directory is a complete copy. For example, if the data directory is /mnt/hadoop/hdfs/name, do the following as root:
# cd /mnt/hadoop/hdfs/name
# tar -cvf /root/nn_backup_data.tar .
You should see output like this:
./
./current/
./current/fsimage
./current/fstime
./current/VERSION
./current/edits
./image/
./image/fsimage
Warning: If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the CDH services.
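The lock-file check described in the warning above can be made before the archive is written. This is a minimal sketch, assuming a running NameNode leaves a file such as in_use.lock in its data directory (the warning itself only says the name contains lock); the function name backup_namenode_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: archive one NameNode data directory, but refuse if it
# contains a file whose name includes "lock" (for example in_use.lock), which
# suggests the NameNode still has the directory open.
# Usage: backup_namenode_dir DATA_DIR ABSOLUTE_ARCHIVE_PATH
backup_namenode_dir() {
  nn_dir=$1
  nn_archive=$2
  # The archive path must be absolute because tar runs after cd.
  case "$nn_archive" in
    /*) ;;
    *) echo "archive path must be absolute: $nn_archive" >&2; return 1 ;;
  esac
  if [ ! -d "$nn_dir" ]; then
    echo "not a directory: $nn_dir" >&2
    return 1
  fi
  lock_file=$(find "$nn_dir" -name '*lock*' | head -n 1)
  if [ -n "$lock_file" ]; then
    echo "found $lock_file; is the NameNode still running?" >&2
    return 1
  fi
  # Archive relative to the data directory, like "tar -cvf ... ." above.
  (cd "$nn_dir" && tar -cf "$nn_archive" .)
}
```

For the example directory, running backup_namenode_dir /mnt/hadoop/hdfs/name /root/nn_backup_data.tar as root produces the same archive as the commands above, without the verbose file listing.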
Uninstall CDH 4
Uninstall CDH 4 on each host as follows:
Operating System | Command |
---|---|
RHEL | $ sudo yum remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr |
SLES | $ sudo zypper remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr |
Ubuntu or Debian | $ sudo apt-get purge bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr |
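The three commands in the table differ only in the package manager. This is a minimal sketch that prints the right command for a host before you run it; the function names and the rhel/sles/debian family labels are hypothetical, and detection simply checks which package manager is on the PATH.

```shell
#!/bin/sh
# CDH 4 packages to remove, taken from the table above.
CDH4_PKGS="bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client solr"

# Hypothetical helper: print the removal command for an OS family.
cdh4_uninstall_cmd() {
  case "$1" in
    rhel)   echo "sudo yum remove $CDH4_PKGS" ;;
    sles)   echo "sudo zypper remove $CDH4_PKGS" ;;
    debian) echo "sudo apt-get purge $CDH4_PKGS" ;;
    *)      echo "unsupported OS family: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: guess the OS family from the available package manager.
detect_os_family() {
  if command -v yum >/dev/null 2>&1; then
    echo rhel
  elif command -v zypper >/dev/null 2>&1; then
    echo sles
  elif command -v apt-get >/dev/null 2>&1; then
    echo debian
  else
    return 1
  fi
}
```

Running cdh4_uninstall_cmd "$(detect_os_family)" on a host prints the command to run there. The detection is only a heuristic: it checks for yum first, so on a host with several package managers installed, pass the family explicitly.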
Remove CDH 4 Repository Files
- Before removing the files, make sure you have not added any custom entries that you want to preserve. (To preserve custom entries, back up the files before removing them.)
- Make sure you remove Impala and Search repository files, as well as the CDH repository file.
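Moving the repository files aside, instead of deleting them, keeps a backup of any custom entries. This is a minimal sketch under two assumptions: that the files live in the standard directory for your package manager (for example /etc/yum.repos.d on Red Hat, /etc/zypp/repos.d on SLES, /etc/apt/sources.list.d on Ubuntu and Debian), and that the CDH 4, Impala, and Search repositories point at archive.cloudera.com/cdh4/, /impala/, and /search/. Files that point at a local mirror will not match, and other Cloudera repository files, such as the Cloudera Manager one, are left in place. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: move CDH 4, Impala, and Search repository files out of
# REPO_DIR into BACKUP_DIR so that any custom entries are preserved.
# Assumption: those repositories reference archive.cloudera.com/{cdh4,impala,search}/.
# Usage: stash_cdh4_repo_files REPO_DIR BACKUP_DIR
stash_cdh4_repo_files() {
  repo_dir=$1
  backup_dir=$2
  mkdir -p "$backup_dir" || return 1
  status=0
  for repo_file in "$repo_dir"/*; do
    # Skip directories, and the literal pattern when the glob matches nothing.
    [ -f "$repo_file" ] || continue
    if grep -qE 'archive\.cloudera\.com/(cdh4|impala|search)/' "$repo_file"; then
      echo "moving $repo_file to $backup_dir/"
      mv "$repo_file" "$backup_dir/" || status=1
    fi
  done
  return "$status"
}
```

For example, on a Red Hat host, run stash_cdh4_repo_files /etc/yum.repos.d /root/cdh4-repo-backup as root and check the printed list before continuing.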
Install CDH 5 Components
- Red Hat
- Download and install the "1-click Install" package
- Download the CDH 5 "1-click Install" package.
Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
OS Version | Click this Link |
---|---|
Red Hat/CentOS/Oracle 5 | Red Hat/CentOS/Oracle 5 link |
Red Hat/CentOS/Oracle 6 | Red Hat/CentOS/Oracle 6 link |
- Install the RPM:
- Red Hat/CentOS/Oracle 5
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
- Red Hat/CentOS/Oracle 6
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
- (Optionally) add a repository key:
- Red Hat/CentOS/Oracle 5
$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
- Red Hat/CentOS/Oracle 6
$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
- Install the CDH packages:
$ sudo yum clean all
$ sudo yum install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
- SLES
- Download and install the "1-click Install" package.
- Download the CDH 5 "1-click Install" package.
Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).
- Install the RPM:
$ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
- Update your system package index by running:
$ sudo zypper refresh
- (Optionally) add a repository key:
$ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
- Install the CDH packages:
$ sudo zypper clean --all
$ sudo zypper install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
- Ubuntu and Debian
- Download and install the "1-click Install" package
- Download the CDH 5 "1-click Install" package:
OS Version | Click this Link |
---|---|
Wheezy | Wheezy link |
Precise | Precise link |
- Install the package. Do one of the following:
- Choose Open with in the download window to use the package manager.
- Choose Save File, save the package to a directory to which you have write access (it can be your home directory) and install it from the command line, for example:
$ sudo dpkg -i cdh5-repository_1.0_all.deb
- (Optionally) add a repository key:
- Debian Wheezy
$ curl -s http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -
- Ubuntu Precise
$ curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
- Install the CDH packages:
$ sudo apt-get update
$ sudo apt-get install avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry solr-mapreduce spark-python sqoop sqoop2 whirr
Note: Installing these packages will also install all the other CDH packages that are needed for a full CDH 5 installation.
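The per-platform steps above differ only in the repository key URL and the package tool, so they can be generated from one list. This is a minimal sketch that prints, rather than runs, the commands shown in this section; the function names and the platform labels (rhel5, rhel6, sles11, wheezy, precise, and the rhel/sles/debian families) are hypothetical, while the URLs and package names are copied from the steps above.

```shell
#!/bin/sh
# Base URL of the CDH 5 archive, as used in the steps above.
CDH5_ARCHIVE="http://archive.cloudera.com/cdh5"

# Packages listed in the "Install the CDH packages" steps above.
CDH5_PKGS="avro-tools crunch flume-ng hadoop-hdfs-fuse hadoop-hdfs-nfs3 hadoop-httpfs \
hbase-solr hive-hbase hive-webhcat hue-beeswax hue-hbase hue-impala hue-pig \
hue-plugins hue-rdbms hue-search hue-spark hue-sqoop hue-zookeeper impala \
impala-shell kite llama mahout oozie pig pig-udf-datafu search sentry \
solr-mapreduce spark-python sqoop sqoop2 whirr"

# Hypothetical helper: print the optional repository-key import command.
cdh5_key_import_cmd() {
  case "$1" in
    rhel5)   echo "sudo rpm --import $CDH5_ARCHIVE/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    rhel6)   echo "sudo rpm --import $CDH5_ARCHIVE/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    sles11)  echo "sudo rpm --import $CDH5_ARCHIVE/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    wheezy)  echo "curl -s $CDH5_ARCHIVE/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -" ;;
    precise) echo "curl -s $CDH5_ARCHIVE/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -" ;;
    *)       echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print the refresh and install commands for a family.
cdh5_install_cmds() {
  case "$1" in
    rhel)   printf '%s\n' "sudo yum clean all" "sudo yum install $CDH5_PKGS" ;;
    sles)   printf '%s\n' "sudo zypper clean --all" "sudo zypper install $CDH5_PKGS" ;;
    debian) printf '%s\n' "sudo apt-get update" "sudo apt-get install $CDH5_PKGS" ;;
    *)      echo "unsupported OS family: $1" >&2; return 1 ;;
  esac
}
```

For example, cdh5_key_import_cmd rhel6 followed by cdh5_install_cmds rhel prints the optional key import and the two install commands for Red Hat 6. Printing instead of executing leaves room to review the long package list; the "1-click Install" package still has to be installed first.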
Run the Upgrade Wizard
- Log into the Cloudera Manager Admin console.
- From the Home tab Status page, click the drop-down menu next to the cluster name and select Upgrade Cluster. The Upgrade Wizard starts.
- Click the checkbox to acknowledge that you have backed up all your databases and click Continue.
- The next step shows you the hosts that the Upgrade Wizard has detected as needing to be upgraded.
- Select Use Packages as your install method. This method assumes you have already installed your CDH 5 packages. If you have not done so, the upgrade wizard will not continue. Do not select Use Parcels. That option will cause Cloudera Manager to download and distribute parcels to your cluster, even though you have already installed from packages. If you want to upgrade to CDH 5 using parcels, see Upgrading from CDH 4 to CDH 5 Parcels.
- Click Continue.
- The next page notifies you that the services on your cluster will be shut down. Rolling upgrade is not available. You can select whether to have all your services restarted and client configurations deployed automatically after the upgrade has finished. Click Continue to proceed.
- The upgrade wizard proceeds to execute the various steps involved in upgrading your cluster, which include:
- Waiting for the Cloudera Manager Agent to recognize the new CDH version
- Converting your configuration parameters
- Upgrading HDFS metadata, Sqoop server, Hive metastore, and various databases
- Deploying client configuration and restarting services, if you elected those options
Note: If you encounter errors during these steps:
- If the upgrade reports the error "Could not find a healthy host with CDH5 on it to create HiveServer2", wait 30 seconds and retry the upgrade.
- If the converting configuration parameters step fails, Cloudera Manager rolls back all configurations to CDH 4. Fix any reported problems and retry the upgrade.
- If the upgrade command fails at any point after the convert configuration step, there is no retry support in Cloudera Manager. You must first correct the error, then manually re-run the individual commands. You can view the remaining commands in the Recent Commands page.
- If the HDFS upgrade metadata step fails, you cannot revert to CDH 4 unless you restore a backup of Cloudera Manager.
- When the upgrade has finished, the Host Inspector runs. This should now show that the hosts are running CDH 5. Click Continue to proceed.
If your cluster name includes the string "CDH 4" the upgrade procedure changes the string to "CDH 5". Otherwise, it leaves the cluster name unchanged. If you want to rename the cluster, you can do so by clicking the cluster name, which displays a pop-up where you can change the name.
Import MapReduce Configuration to YARN
In CDH 5 and Cloudera Manager 5, YARN rather than MapReduce is the default MapReduce computation framework. If you had the MapReduce service configured in CDH 4, you can import the MapReduce configuration to YARN. This does not affect your MapReduce configuration.
Importing the MapReduce configuration:
- Configures services to use YARN as the MapReduce computation framework instead of MapReduce.
- Overwrites existing YARN configuration and role assignments.
- To import the existing configuration from your MapReduce service, select OK, set up YARN to add the YARN service and import the MapReduce settings. To skip the import, select Skip this step now. If you choose to skip this step, you can perform it at a later time from the YARN service.
- Click Continue to proceed. Cloudera Manager stops the YARN service (if running) and its dependencies. When these commands complete, click Continue.
- The next page indicates some additional configuration required by YARN. Verify or modify these and click Continue.
- The Switch Cluster to MR2 step proceeds. When all steps have been completed, click Continue.
- When all steps have completed, click Continue.
Restart the Reports Manager Role
- Do one of the following:
- Select Clusters > Cloudera Management Service > Cloudera Management Service.
- On the Status tab of the Home page, in Cloudera Management Service table, click the Cloudera Management Service link.
- Click the Instances tab.
- Check the checkbox next to Reports Manager.
- Select Actions for Selected > Restart, and then click Restart to confirm.
Recompile HBase Coprocessor and Custom JARs
Before using any HBase applications that use coprocessor or custom JARs, you must recompile the JARs.
Finalize the HDFS Metadata Upgrade
- In the Cloudera Manager Admin Console, pull down the Clusters tab and go to the HDFS service.
- Go to the Instances tab and click on the NameNode instance.
- From the NameNode Status page, from the Actions menu click Finalize Metadata Upgrade.
- Click Finalize Metadata Upgrade to confirm you want to complete this process.
Cloudera Manager finalizes the metadata upgrade.