Upgrading CDH 4 Using Packages
If you originally used Cloudera Manager to install your CDH service using packages, you can upgrade to a new version of CDH 4 using either packages or parcels. Parcels are the preferred and recommended way to upgrade, because the upgrade wizard provided for parcels handles the upgrade process almost completely automatically. However, if you want to continue to use packages, you can upgrade by following the instructions here.
To upgrade CDH using packages, perform the following steps:
- Upgrading Unmanaged Components
- Stop All Services
- Back up the HDFS Metadata on the NameNode
- Upgrade Managed Components
- Upgrade the Hive Metastore
- (If Upgrading to CDH 4.2) Upgrade the Oozie Sharelib
- Start All Services
- Configure Cluster CDH Version for Package Installs
- Deploy Client Configurations
Upgrading Unmanaged Components
Upgrading unmanaged components is a process that is separate from upgrading managed components. Upgrade the unmanaged components before proceeding to upgrade managed components. Components that you might have installed that are not managed by Cloudera Manager include:
- Pig
- Whirr
- Mahout
For information on upgrading these unmanaged components, see the CDH 4 Installation Guide.
Stop All Services
- Stop each cluster:
  - On the Home page, click to the right of the cluster name and select Stop.
  - Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.
  - When "All services successfully stopped" appears, the task is complete and you can close the Command Details window.
- Stop the Cloudera Management Service:
  - Do one of the following:
    - Select .
      Select .
    - On the Home page, click to the right of mgmt and select Stop.
  - Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
  - When "Command completed with n/n successful subcommands" appears, the task is complete. Click Close.
Back up the HDFS Metadata on the NameNode
- Stop the NameNode you want to back up.
- Go to the HDFS service.
- Select .
- In the Search field, search for "NameNode Data Directories". This locates the NameNode Data Directories property.
- From the command line on the NameNode host, back up the directory listed in the NameNode Data Directories property. If more than one directory is listed, you only need to back up one of them, because each directory is a complete copy. For example, if the data directory is /mnt/hadoop/hdfs/name, run the following commands as root:
# cd /mnt/hadoop/hdfs/name
# tar -cvf /root/nn_backup_data.tar .
You should see output like this:
./
./current/
./current/fsimage
./current/fstime
./current/VERSION
./current/edits
./image/
./image/fsimage
Warning: If you see a file containing the word lock, the NameNode is probably still running. Repeat the preceding steps, starting by shutting down the CDH services.
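If you back up NameNode metadata on several clusters, the manual steps above can be wrapped in a shell function. This is a minimal sketch, not a Cloudera tool: the function name backup_nn_dir is hypothetical, and the lock check simply looks for any file whose name contains "lock", as the warning describes. Run it as root on the NameNode host after the NameNode is stopped.

```shell
# Hypothetical helper (not part of CDH or Cloudera Manager): archive one
# NameNode data directory with tar, refusing to run if a lock file is present.
# Usage: backup_nn_dir <data_dir> <absolute_archive_path>
backup_nn_dir() {
    nn_dir=$1
    archive=$2
    # Any file with "lock" in its name suggests the NameNode is still running.
    if [ -n "$(find "$nn_dir" -name '*lock*' -print)" ]; then
        echo "Lock file found under $nn_dir; stop the NameNode first." >&2
        return 1
    fi
    # Run tar from inside the directory so the archive holds ./current/...
    # paths, as in the manual steps. The subshell leaves the caller's
    # working directory unchanged; its exit status is the function's status.
    (cd "$nn_dir" && tar -cvf "$archive" .)
}
```

For the example directory, the call would be backup_nn_dir /mnt/hadoop/hdfs/name /root/nn_backup_data.tar. Pass an absolute archive path, because tar runs from inside the data directory.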
Upgrade Managed Components
Use one of the following strategies to upgrade CDH 4:
- Use your operating system's package management tools to update all packages to the latest version using standard repositories. This approach works well because it minimizes the amount of configuration required and uses the simplest commands. Be aware that this can take a considerable amount of time if you have not upgraded the system recently. To update all packages on your system, use the following command:
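Operating System    Command
RHEL                $ sudo yum update
SLES                $ sudo zypper up
Ubuntu or Debian    $ sudo apt-get update
                    $ sudo apt-get upgrade
On Ubuntu or Debian, run apt-get update first so that apt-get upgrade sees the newest package versions.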
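The choice of command in the table above can be automated with a short shell sketch. It assumes the release files present on the RHEL, SLES 11, and Debian/Ubuntu releases covered here (/etc/redhat-release, /etc/SuSE-release, /etc/debian_version). The function names are hypothetical, and the functions only print the command so that you can review it before running it.

```shell
# Hypothetical helpers: map the host's release file to an OS family, then
# print the full-system update command from the table above.
# The optional argument is a filesystem root, useful for testing; leave it
# empty on a real host.
detect_os_family() {
    root=${1:-}
    if [ -e "$root/etc/redhat-release" ]; then
        echo rhel
    elif [ -e "$root/etc/SuSE-release" ]; then
        echo sles
    elif [ -e "$root/etc/debian_version" ]; then
        echo debian   # Ubuntu also ships /etc/debian_version
    else
        echo unknown
        return 1
    fi
}

full_update_cmd() {
    case $1 in
        rhel)   echo "sudo yum update" ;;
        sles)   echo "sudo zypper up" ;;
        debian) echo "sudo apt-get update && sudo apt-get upgrade" ;;
        *)      echo "unsupported OS family: $1" >&2; return 1 ;;
    esac
}
```

On a host, full_update_cmd "$(detect_os_family)" prints the matching command; run it yourself once you have checked it.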
- Use the cloudera.com repository that is added during a typical installation, updating only Cloudera components. This limits the scope of the update, so the process takes less time; however, this approach does not work if you created and used a custom repository. To install the new version, you can upgrade from Cloudera's repository by adding an entry to your operating system's package management configuration file. The repository location varies by operating system:
Operating System    Repository Entry
Red Hat             http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
SLES                http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4/
Debian Squeeze      [arch=amd64] http://archive.cloudera.com/cdh4/debian/squeeze squeeze-cdh4 contrib
Ubuntu Lucid        [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh lucid-cdh4 contrib
Ubuntu Precise      [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib
For example, under Red Hat, you can run commands such as the following on the CDH host to update only CDH from Cloudera's repository:
$ sudo yum clean all
$ sudo yum update --disablerepo='*' --enablerepo=cloudera-cdh4
Note:
- cloudera-cdh4 is the name of the repository on your system; the name is usually in square brackets on the first line of the repo file, in this example /etc/yum.repos.d/cloudera-cdh4.repo:
  [chris@ca727 yum.repos.d]$ more cloudera-cdh4.repo
  [cloudera-cdh4]
  ...
- yum clean all cleans up yum's cache directories, ensuring that you download and install the latest versions of the packages.
- If your system is not up to date, and any underlying system components need to be upgraded before this yum update can succeed, yum will tell you what those are.
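For reference, a minimal yum repository file for the Red Hat entry in the table, together with a way to read the repository ID from its first bracketed line as the note describes, might look like the sketch below. The function names and the name= value are illustrative assumptions. On a host where /etc/yum.repos.d/cloudera-cdh4.repo already exists you would only read that file, not rewrite it.

```shell
# Hypothetical helpers (names are illustrative, not Cloudera tools).
# write_cdh4_yum_repo writes a minimal .repo file for the Red Hat entry in
# the repository table. GPG settings are deliberately omitted here: add the
# gpgkey/gpgcheck lines Cloudera publishes for this repository before use.
write_cdh4_yum_repo() {
    cat > "$1" <<'EOF'
[cloudera-cdh4]
name=Cloudera CDH 4
baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
EOF
}

# repo_id prints the repository ID: the first [bracketed] name in the file.
repo_id() {
    sed -n 's/^\[\(.*\)][[:space:]]*$/\1/p' "$1" | head -n 1
}
```

For the file shown in the note, repo_id /etc/yum.repos.d/cloudera-cdh4.repo prints cloudera-cdh4, the name that identifies the repository to yum.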
On a SLES system, use commands like the following to clean cached repository information and then update only the CDH components:
$ sudo zypper clean --all
$ sudo zypper up -r http://archive.cloudera.com/cdh4/sles/11/x86_64/cdh/4
To verify the URL, open the Cloudera repo file in /etc/zypp/repos.d on your system (for example, /etc/zypp/repos.d/cloudera-cdh4.repo) and look at the line beginning with baseurl=. Use that URL in your sudo zypper up -r command.
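Reading the baseurl= value can also be scripted. This sketch assumes the usual one-key-per-line .repo format and uses only the first baseurl= line; repo_baseurl and sles_update_cdh are hypothetical names.

```shell
# Hypothetical helper: print the first baseurl= value from a zypper .repo file.
repo_baseurl() {
    sed -n 's/^baseurl[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical wrapper: clean zypper's caches, then update only packages
# from the repository whose URL is read from the given .repo file.
sles_update_cdh() {
    url=$(repo_baseurl "$1")
    if [ -z "$url" ]; then
        echo "no baseurl= line found in $1" >&2
        return 1
    fi
    sudo zypper clean --all && sudo zypper up -r "$url"
}
```

For example: sles_update_cdh /etc/zypp/repos.d/cloudera-cdh4.repo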
On a Debian/Ubuntu system, use commands like the following to clean the package cache, refresh the package lists, and then update only the CDH components. First:
$ sudo apt-get clean
$ sudo apt-get update
After cleaning the cache and refreshing the package lists, use one of the following commands to upgrade CDH.
Precise:
$ sudo apt-get upgrade -t precise-cdh4
Lucid:
$ sudo apt-get upgrade -t lucid-cdh4
Squeeze:
$ sudo apt-get upgrade -t squeeze-cdh4
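The per-release commands above differ only in the codename, so a sketch can derive them from it. The codename would typically come from lsb_release -cs (which may require the lsb-release package). The function names are hypothetical, the repository entries are copied from the repository table, and the functions print commands rather than run them.

```shell
# Hypothetical helpers for Debian/Ubuntu; the argument is a codename such as
# "precise". cdh4_apt_source prints the sources.list line for that codename,
# built from the repository table.
cdh4_apt_source() {
    case $1 in
        squeeze)
            echo "deb [arch=amd64] http://archive.cloudera.com/cdh4/debian/squeeze squeeze-cdh4 contrib" ;;
        lucid|precise)
            echo "deb [arch=amd64] http://archive.cloudera.com/cdh4/ubuntu/$1/amd64/cdh $1-cdh4 contrib" ;;
        *)
            echo "no CDH 4 repository entry listed for: $1" >&2
            return 1 ;;
    esac
}

# cdh4_upgrade_cmds prints the clean/update/upgrade sequence for a codename,
# refusing codenames that have no entry in the table.
cdh4_upgrade_cmds() {
    cdh4_apt_source "$1" >/dev/null || return 1
    echo "sudo apt-get clean"
    echo "sudo apt-get update"
    echo "sudo apt-get upgrade -t $1-cdh4"
}
```

On a host, cdh4_upgrade_cmds "$(lsb_release -cs)" prints the three commands to review and run.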
- Use a custom repository. This process can be more complicated, but enables updating CDH components for hosts that are not connected to the Internet. You can create your own repository, as described in Understanding Custom Installation Solutions. Creating your own repository is necessary if you are upgrading a cluster that does not have access to the Internet.
If you installed your current version from a custom repository and now want to upgrade from a custom repository, the details of the process vary. In general, begin by updating the custom repository you will use with the new installation files. This can be done in a variety of ways; for example, you might use wget to copy the necessary installation files. Once the installation files have been updated, use the custom repository you established for the initial installation to update CDH.
RHEL: Ensure you have a custom repo file that is configured to use your internal repository. For example, you could have a custom repo file in /etc/yum.repos.d/ called cdh_custom.repo in which you specified a local repository. In that case, you might use the following commands, where cdh_custom is the repository ID given in square brackets on the first line of that file:
$ sudo yum clean all
$ sudo yum update --disablerepo='*' --enablerepo=cdh_custom
SLES: Use commands such as the following to clean cached repository information and then update only the CDH components:
$ sudo zypper clean --all
$ sudo zypper up -r http://internalserver.example.com/path_to_cdh_repo
Ubuntu or Debian: Use a command that upgrades CDH from the custom repository listed in your apt sources. Repository entries are kept in /etc/apt/sources.list or in files under /etc/apt/sources.list.d/, and your custom repository must be listed there. The general form of an entry on Debian/Ubuntu is:
deb http://server.example.com/directory/ dist-name component
For example, a standard Ubuntu Precise entry is:
deb http://us.archive.ubuntu.com/ubuntu/ precise universe
On a Debian/Ubuntu system, use commands such as the following to clean the package cache, refresh the package lists, and then update only the CDH components, where your_cdh_repo is the distribution name (dist-name) of your custom repository:
$ sudo apt-get clean
$ sudo apt-get update
$ sudo apt-get upgrade -t your_cdh_repo
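One way to refresh a custom repository with wget, as suggested above, is to mirror the upstream directory into a local path that your internal web server publishes. This is a sketch under assumptions: the function name and the destination directory are hypothetical, and you should confirm that the mirrored tree includes the repository metadata (for example, the repodata/ directory of a yum repository) before pointing hosts at it.

```shell
# Hypothetical helper: mirror a package repository directory with wget.
# Usage: mirror_cdh_repo <source_url> <destination_dir>
mirror_cdh_repo() {
    src_url=$1
    dest_dir=$2
    # --mirror              recursive download with timestamping, so reruns
    #                       fetch only files that changed upstream
    # --no-parent           never ascend above src_url
    # --no-host-directories do not create an archive.cloudera.com/ directory
    # --directory-prefix    save everything under dest_dir
    wget --mirror --no-parent --no-host-directories \
         --directory-prefix="$dest_dir" "$src_url"
}
```

For example, mirror_cdh_repo http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/ /var/www/html (an assumed web root) places the files under /var/www/html/cdh4/redhat/6/x86_64/cdh/4/, which the baseurl of your custom repo file would then reference.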
Upgrade the Hive Metastore
If you are upgrading from CDH 4.2 to CDH 4.3 or later, you do not need to perform this step. If you are upgrading from an earlier version of CDH to CDH 4.2 or later, you must do this.
- (Strongly recommended) Make a backup copy of your Hive metastore database.
- Run the metastore upgrade script. The location of the script depends on whether you are upgrading to parcels or packages.
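- If you are upgrading to packages, the upgrade scripts are in /usr/lib/hive/scripts/metastore/upgrade/database, where database is the type of database you are running (that is, mysql, postgres, and so on).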
- If you are upgrading to parcels, the upgrade scripts are in /opt/cloudera/parcels/parcel_name/lib/hive/scripts/metastore/upgrade/database, where parcel_name is the name of the parcel to which you have upgraded and database is the type of database you are running (that is, mysql, postgres, and so on). For example, if you are installing a CDH 4.2.0 parcel using the default location for the local repository, and using the default database (PostgreSQL), the scripts are at: /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel/lib/hive/scripts/metastore/upgrade/postgres
- Change (cd) to the directory that contains the scripts.
- Execute the script in the appropriate DB command shell. There are multiple scripts in each directory. You must run the one that corresponds to the versions of Hive you are upgrading between. For example, if you are upgrading with MySQL from Hive 0.9 to 0.10, the command would be similar to:
mysql -u hive1 -phive1 hive1 < upgrade-0.9.0-to-0.10.0.mysql.sql
(substituting your own username, password, and database name).
If your upgrade spans multiple versions of Hive (for example, upgrading from Hive 0.8 to Hive 0.10) you must run all the relevant scripts in the proper order.
Important: You must know the password for the Hive metastore database. If you installed Cloudera Manager using the default (embedded PostgreSQL) database, the password was displayed on the Database Setup page of the Cloudera Manager installation wizard. If you do not know the password for your Hive metastore database, you can find it as follows:
- Run cat /etc/cloudera-scm-server/db.properties. This shows you Cloudera Manager's internal database credentials.
- Run the following command:
psql -p 7432 -U cm cm -c "select s.display_name as hive_service_name, s.name as hive_internal_name, c.value as metastore_password from CONFIGS c, SERVICES s where attr='hive_metastore_database_password' and c.service_id = s.service_id"
- When psql prompts for a password, use the value of com.cloudera.cmf.db.password. The command outputs the metastore password for each Hive service, as follows:
 hive_service_name | hive_internal_name | metastore_password
-------------------+--------------------+--------------------
 hive1             | hive1              | lF3Cv2zsvI
(1 row)
- If you have multiple instances of Hive, run the upgrade script(s) on each metastore database.
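The metastore steps above can be partly scripted. The sketch below assumes db.properties uses key=value lines and that you pass the upgrade scripts in the order they must run; cm_db_password and apply_metastore_upgrades are hypothetical helpers, not part of Hive or Cloudera Manager.

```shell
# Hypothetical helper: print com.cloudera.cmf.db.password from a
# db.properties file (normally /etc/cloudera-scm-server/db.properties).
cm_db_password() {
    sed -n 's/^[[:space:]]*com\.cloudera\.cmf\.db\.password[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: feed each upgrade script, in the order given, to a
# database client command, stopping at the first failure.
# db_cmd is left unquoted on purpose so that a string such as
# "mysql -u hive1 -phive1 hive1" splits into the command and its arguments.
apply_metastore_upgrades() {
    db_cmd=$1
    shift
    for script in "$@"; do
        echo "Applying $script"
        $db_cmd < "$script" || { echo "Failed on $script" >&2; return 1; }
    done
}
```

For example, after changing to /usr/lib/hive/scripts/metastore/upgrade/mysql, an upgrade from Hive 0.8 to 0.10 might be run as apply_metastore_upgrades "mysql -u hive1 -phive1 hive1" upgrade-0.8.0-to-0.9.0.mysql.sql upgrade-0.9.0-to-0.10.0.mysql.sql. The first file name assumes the same naming pattern as the script shown above, so check the directory listing. To skip the interactive prompt when querying Cloudera Manager's embedded database, psql reads the PGPASSWORD environment variable: PGPASSWORD="$(cm_db_password /etc/cloudera-scm-server/db.properties)" psql -p 7432 -U cm cm -c "select ...", using the query shown above.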
(If Upgrading to CDH 4.2) Upgrade the Oozie Sharelib
- In the Cloudera Manager Admin Console, select Oozie from the Services tab. The service should already be stopped.
- From the Actions button, choose Install Oozie Sharelib. Cloudera Manager runs the commands that install the sharelib.
Start All Services
Configure Cluster CDH Version for Package Installs
Because Cloudera Manager does not manage service software installed as packages, in certain upgrade scenarios Cloudera Manager assigns a default CDH version to a cluster. You must manually configure the cluster CDH version to match the package CDH version by following the procedure in Configuring the CDH Version for a Cluster in Managing Clusters with Cloudera Manager. If you do not set the cluster CDH version to the package CDH version, Cloudera Manager will incorrectly enable and disable service features based on the configured CDH version.
Deploy Client Configurations
- Click the Actions button at the top that corresponds to the cluster and choose Deploy Client Configuration.
- Click the Deploy Client Configuration button in the confirmation pop-up that appears.