Installing CDH 5
To install the latest CDH 5 release, use the following topics.
Ways To Install CDH 5
You can install CDH 5 in any of the following ways:
- Automated method using Cloudera Manager (see the Cloudera Manager installation instructions). Cloudera Manager automates the installation and configuration of CDH 5 on an entire cluster if you have root or password-less sudo SSH access to your cluster's machines.
Note: Cloudera recommends that you use the automated method if possible.
- Manual methods described below:
- Download and install the CDH 5 "1-click Install" package
- Add the CDH 5 repository
- Build your own CDH 5 repository
If you use one of these methods rather than Cloudera Manager, the first of these methods (downloading and installing the "1-click Install" package) is recommended in most cases because it is simpler than building or adding a repository.
- Install from a CDH 5 tarball; see the next topic, "How Packaging Affects CDH 5 Deployment".
How Packaging Affects CDH 5 Deployment
Installing from Packages
- To install and deploy YARN, follow the directions on this page and proceed with Deploying MapReduce v2 (YARN) on a Cluster.
- To install and deploy MRv1, follow the directions on this page and then proceed with Deploying MapReduce v1 (MRv1) on a Cluster.
Installing from a Tarball
- If you install CDH 5 from a tarball, you will install YARN.
- In CDH 5, there is no separate tarball for MRv1. Instead, the MRv1 binaries, examples, etc., are delivered in the Hadoop tarball itself. The scripts for running MRv1 are in the bin-mapreduce1 directory in the tarball, and the MRv1 examples are in the examples-mapreduce1 directory.
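After you unpack the tarball, a small shell sketch like the one below can locate those MRv1 pieces. It assumes only that the tarball has already been extracted to a directory you pass in. The function name find_mrv1_dirs is hypothetical, and the text above does not say exactly how deeply the directories are nested, so the function searches the whole tree.

```shell
# Hypothetical helper: list the MRv1 script and example directories inside an
# already-extracted CDH 5 Hadoop tarball. It searches the whole tree because
# the exact nesting of the directories is not assumed.
find_mrv1_dirs() {
  hadoop_dir="$1"
  # -type d restricts matches to directories; sort keeps the output order stable.
  find "$hadoop_dir" -type d \( -name bin-mapreduce1 -o -name examples-mapreduce1 \) | sort
}
```

For example, find_mrv1_dirs "$HOME/hadoop" would list the two directories. Here $HOME/hadoop stands in for wherever you extracted the tarball.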
Before You Begin Installing CDH 5 Manually
- The instructions on this page are for new installations. If you need to upgrade from an earlier release, see Upgrading from CDH 4 to CDH 5.
- For a list of supported operating systems, see CDH 5 Requirements and Supported Versions.
- These instructions assume that the sudo command is configured on the hosts where you will be doing the installation. If this is not the case, you will need the root user (superuser) to configure it.
If you are migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN), see Migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN) for important information and instructions.
When starting, stopping and restarting CDH components, always use the service(8) command rather than running scripts in /etc/init.d directly. This is important because service sets the current working directory to / and removes most environment variables (passing only LANG and TERM) so as to create a predictable environment in which to administer the service. If you run the scripts in /etc/init.d, any environment variables you have set remain in force, and could produce unpredictable results. (If you install CDH from packages, service will be installed as part of the Linux Standard Base (LSB).)
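The shell sketch below imitates the clean environment described above: it changes to / and passes only LANG and TERM to the command. It is an illustration, not the actual service script, and it makes one extra assumption: it sets a minimal PATH so that the command can be found. The function name is hypothetical.

```shell
# Hypothetical illustration of what service(8) does before running an init script:
# start in / and drop the caller's environment except LANG and TERM.
# Assumption: a minimal PATH is also set so the command can be located.
run_like_service() {
  (
    cd / || exit 1
    env -i LANG="${LANG:-}" TERM="${TERM:-}" PATH=/sbin:/usr/sbin:/bin:/usr/bin "$@"
  )
}
```

In day-to-day use you simply run a command such as sudo service hadoop-hdfs-namenode start, not sudo /etc/init.d/hadoop-hdfs-namenode start. The sketch only shows why variables exported in your shell do not leak into the daemon.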
- Java Development Kit: if you have not already done so, install the Oracle Java Development Kit (JDK); see Java Development Kit Installation.
- Scheduler defaults: note the following differences between MRv1 and MRv2 (YARN):
- MRv1:
- Cloudera Manager sets the default to FIFO.
- CDH 5 sets the default to FIFO, with FIFO, Fair Scheduler, and Capacity Scheduler on the classpath by default.
- MRv2 (YARN):
- Cloudera Manager sets the default to Fair Scheduler.
- CDH 5 sets the default to Fair Scheduler, with FIFO and Fair Scheduler on the classpath by default.
- YARN does not support Capacity Scheduler.
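The CDH 5 defaults above apply when nothing is configured. To pin the YARN scheduler explicitly, set the yarn-site.xml property yarn.resourcemanager.scheduler.class. The shell sketch below prints that property block for either the Fair Scheduler or the FIFO scheduler, using the upstream Apache Hadoop 2 class names. The function name is hypothetical. Capacity Scheduler is left out, in line with the note above. Where you put the block depends on your configuration directory.

```shell
# Hypothetical helper: print a yarn-site.xml property that selects the YARN scheduler.
# Only fair and fifo are accepted; Capacity Scheduler is omitted on purpose.
yarn_scheduler_property() {
  case "$1" in
    fair) class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler ;;
    fifo) class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler ;;
    *) echo "usage: yarn_scheduler_property fair|fifo" >&2; return 1 ;;
  esac
  # Unquoted EOF so that $class is expanded inside the here-document.
  cat <<EOF
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>$class</value>
</property>
EOF
}
```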
High Availability
- For more information and instructions on setting up a new HA configuration, see the CDH 5 High Availability Guide.
Important: If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HDFS HA software configuration, follow the installation instructions under Deploying HDFS High Availability.
- To upgrade an existing configuration, follow the instructions under Upgrading to CDH 5.
Steps to Install CDH 5 Manually
Step 1: Add or Build the CDH 5 Repository or Download the "1-click Install" package.
- If you are installing CDH 5 on a Red Hat system, you can download Cloudera packages using yum or your web browser.
- If you are installing CDH 5 on a SLES system, you can download the Cloudera packages using zypper or YaST or your web browser.
- If you are installing CDH 5 on an Ubuntu or Debian system, you can download the Cloudera packages using apt or your web browser.
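The three bullets above amount to choosing one package tool per OS family. The minimal shell sketch below makes that choice explicit, assuming you already know which family a host belongs to. The function name and the OS labels it accepts are hypothetical and are not part of CDH.

```shell
# Hypothetical helper: map an OS family label to the package tool used in these steps.
pkg_tool_for() {
  case "$1" in
    redhat|centos|oracle) echo yum ;;
    sles) echo zypper ;;
    ubuntu|debian) echo apt-get ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}
```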
On Red Hat-compatible Systems
Use only one of the three methods.
- Download and install the CDH 5 "1-click Install" package OR
- Add the CDH 5 repository OR
- Build a Yum Repository
Do this on all the systems in the cluster.
To download and install the CDH 5 "1-click Install" package:
- Click the entry in the table below that matches your Red Hat or CentOS system, choose Save File, and save the file to a directory to which you have write access (it can be your home directory).
OS Version | Click this Link |
---|---|
Red Hat/CentOS/Oracle 5 | Red Hat/CentOS/Oracle 5 link |
Red Hat/CentOS/Oracle 6 | Red Hat/CentOS/Oracle 6 link |
- Install the RPM. For Red Hat/CentOS/Oracle 5:
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
For Red Hat/CentOS/Oracle 6 (64-bit):
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To add the CDH 5 repository:
Click the entry in the table below that matches your Red Hat or CentOS system, navigate to the repo file for your system and save it in the /etc/yum.repos.d/ directory.
OS Version | Click this Link |
---|---|
Red Hat/CentOS/Oracle 5 | Red Hat/CentOS/Oracle 5 link |
Red Hat/CentOS/Oracle 6 (64-bit) | Red Hat/CentOS/Oracle 6 link |
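If you prefer the command line to a browser, you can build the repo file location from the release number. This sketch rests on an assumption: that the file is named cloudera-cdh5.repo (the name used for SLES below) and sits in the same archive.cloudera.com directory as the RPM-GPG-KEY-cloudera file listed in Step 2. Check it against the link in the table before relying on it. The function name is hypothetical.

```shell
# Hypothetical helper: print the assumed URL of the CDH 5 yum repo file
# for a Red Hat-compatible major release (5 or 6).
cdh5_yum_repo_url() {
  case "$1" in
    5|6) echo "http://archive.cloudera.com/cdh5/redhat/$1/x86_64/cdh/cloudera-cdh5.repo" ;;
    *) echo "unsupported release: $1" >&2; return 1 ;;
  esac
}
```

For example, sudo curl -o /etc/yum.repos.d/cloudera-cdh5.repo "$(cdh5_yum_repo_url 6)" would fetch the file straight into place.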
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To build a Yum repository:
If you want to create your own yum repository, download the appropriate repo file, create the repo, distribute the repo file and set up a web server, as described under Creating a Local Yum Repository.
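Before installing packages, clean the yum cache so that your system picks up current repository metadata:
sudo yum clean all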
On SLES Systems
Use only one of the three methods.
- Download and install the CDH 5 "1-click Install" Package OR
- Add the CDH 5 repository OR
- Build a SLES Repository
To download and install the CDH 5 "1-click Install" package:
- Download the CDH 5 "1-click Install" package.
Click this link, choose Save File, and save it to a directory to which you have write access (it can be your home directory).
- Install the RPM:
$ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
- Update your system package index by running:
$ sudo zypper refresh
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To add the CDH 5 repository:
- Run the following command:
$ sudo zypper addrepo -f http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/cloudera-cdh5.repo
- Update your system package index by running:
$ sudo zypper refresh
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To build a SLES repository:
If you want to create your own SLES repository, create a mirror of the CDH SLES directory and then create a SLES repository from the mirror, as described in these instructions.
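Before installing packages, clean the zypper cache so that your system picks up current repository metadata:
sudo zypper clean --all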
On Ubuntu or Debian Systems
Use only one of the three methods.
- Download and install the CDH 5 "1-click Install" Package OR
- Add the CDH 5 repository OR
- Build a Debian Repository
To download and install the CDH 5 "1-click Install" package:
- Download the CDH 5 "1-click Install" package:
OS Version | Click this Link |
---|---|
Wheezy | Wheezy link |
Precise | Precise link |
- Install the package. Do one of the following:
- Choose Open with in the download window to use the package manager.
- Choose Save File, save the package to a directory to which you have write access (it can be your home directory) and install it from the command line, for example:
sudo dpkg -i cdh5-repository_1.0_all.deb
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To add the CDH 5 repository:
Create a new file /etc/apt/sources.list.d/cloudera.list with the following contents:
- For Ubuntu systems:
deb [arch=amd64] http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
deb-src http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
- For Debian systems:
deb http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
deb-src http://archive.cloudera.com/cdh5/<OS-release-arch> <RELEASE>-cdh5 contrib
where <OS-release-arch> is debian/wheezy/amd64/cdh or ubuntu/precise/amd64/cdh, and <RELEASE> is the codename of your distribution (for example, precise or wheezy), which you can find by running lsb_release -c.
For example, to install CDH 5 for 64-bit Ubuntu Precise:
deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh precise-cdh5 contrib
deb-src http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh precise-cdh5 contrib
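The two lines of cloudera.list follow the template above, so you can generate them rather than typing them. The minimal sketch below takes the distribution and its codename and adds the [arch=amd64] qualifier only for Ubuntu, matching the lines shown for each system. The function name is hypothetical.

```shell
# Hypothetical helper: print the contents of /etc/apt/sources.list.d/cloudera.list.
# Arguments: distro (ubuntu or debian) and codename (e.g. precise or wheezy).
cloudera_list() {
  distro="$1"; codename="$2"
  base="http://archive.cloudera.com/cdh5/$distro/$codename/amd64/cdh"
  case "$distro" in
    ubuntu) arch="[arch=amd64] " ;;   # Ubuntu deb line carries the arch qualifier
    debian) arch="" ;;
    *) echo "unsupported distro: $distro" >&2; return 1 ;;
  esac
  printf 'deb %s%s %s-cdh5 contrib\n' "$arch" "$base" "$codename"
  printf 'deb-src %s %s-cdh5 contrib\n' "$base" "$codename"
}
```

For example, cloudera_list ubuntu "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list writes the file for the current Ubuntu codename.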
Now continue with Step 2: Optionally Add a Repository Key, and then choose Step 3: Install CDH 5 with YARN, or Step 4: Install CDH 5 with MRv1; or do both steps if you want to install both implementations.
OR: To build a Debian repository:
If you want to create your own apt repository, create a mirror of the CDH Debian directory and then create an apt repository from the mirror.
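Before installing packages, update your package index so that your system picks up current repository metadata:
sudo apt-get update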
Step 2: Optionally Add a Repository Key
Before installing YARN or MRv1, you can optionally add a repository key on each system in the cluster. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:
- For Red Hat/CentOS/Oracle 5 systems:
$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
- For Red Hat/CentOS/Oracle 6 systems:
$ sudo rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
- For all SLES systems:
$ sudo rpm --import http://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
- For Ubuntu Precise systems:
$ curl -s http://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
- For Debian Wheezy systems:
$ curl -s http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -
This key enables you to verify that you are downloading genuine packages.
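The right key-import command depends only on the OS release, so you can also express the list above as a lookup. The sketch below prints the command instead of running it, so you can review it first. The function name and the short OS labels (rhel5, rhel6, sles, precise, wheezy) are hypothetical; the URLs are copied from the commands above.

```shell
# Hypothetical helper: print (not run) the Cloudera GPG key import command for an OS label.
cloudera_key_cmd() {
  base=http://archive.cloudera.com/cdh5
  case "$1" in
    rhel5) echo "sudo rpm --import $base/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    rhel6) echo "sudo rpm --import $base/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    sles) echo "sudo rpm --import $base/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera" ;;
    precise) echo "curl -s $base/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -" ;;
    wheezy) echo "curl -s $base/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -" ;;
    *) echo "unknown OS label: $1" >&2; return 1 ;;
  esac
}
```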
Step 3: Install CDH 5 with YARN
Skip this step if you intend to use only MRv1. Directions for installing MRv1 are in Step 4.
To install CDH 5 with YARN:
If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.
- Install and deploy ZooKeeper.
Important: Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode.
Follow instructions under ZooKeeper Installation.
- Install each type of daemon package on the appropriate system(s), as follows:
Where to install
Install commands
Resource Manager host (analogous to MRv1 JobTracker) running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager
SLES
sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager
Ubuntu or Debian
sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager
NameNode host running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-hdfs-namenode
SLES
sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
Ubuntu or Debian
sudo apt-get install hadoop-hdfs-namenode
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
SLES
sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
Ubuntu or Debian
sudo apt-get install hadoop-hdfs-secondarynamenode
All cluster hosts except the Resource Manager running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
SLES
sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
Ubuntu or Debian
sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
One host in the cluster running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
SLES
sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
Ubuntu or Debian
sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
All client hosts running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-client
SLES
sudo zypper clean --all; sudo zypper install hadoop-client
Ubuntu or Debian
sudo apt-get install hadoop-client
The hadoop-yarn and hadoop-hdfs packages are installed on each system automatically as dependencies of the other packages.
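When you script the installation across many hosts, it can help to keep the table above as a role-to-package lookup. The minimal sketch below does that. The role labels are hypothetical names for the rows of the table, and a host that plays several roles needs the packages for each of them.

```shell
# Hypothetical helper: print the YARN-deployment packages for one host role
# (roles correspond to the rows of the table above).
yarn_packages() {
  case "$1" in
    resourcemanager) echo "hadoop-yarn-resourcemanager" ;;
    namenode) echo "hadoop-hdfs-namenode" ;;
    secondarynamenode) echo "hadoop-hdfs-secondarynamenode" ;;
    worker) echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
    historyserver) echo "hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
    client) echo "hadoop-client" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

For example, sudo yum install $(yarn_packages worker) installs the worker packages on a Red Hat-compatible host. The command substitution is left unquoted on purpose so that the package names split into separate arguments.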
Step 4: Install CDH 5 with MRv1
If you are also installing YARN, you can skip any packages you have already installed in Step 3: Install CDH 5 with YARN.
Skip this step and go to Step 3: Install CDH 5 with YARN if you intend to use only YARN.
Before proceeding, you need to decide:
- Whether to configure High Availability (HA) for the NameNode and/or JobTracker; see the CDH 5 High Availability Guide for more information and instructions.
- Where to deploy the NameNode, Secondary NameNode, and JobTracker
daemons. As a general rule:
- The NameNode and JobTracker run on the same "master" host unless the cluster is large (more than a few tens of nodes), and the master host (or hosts) should not run the Secondary NameNode (if used), DataNode or TaskTracker services.
- In a large cluster, it is especially important that the Secondary NameNode (if used) runs on a separate machine from the NameNode.
- Each node in the cluster except the master host(s) should run the DataNode and TaskTracker services.
If you decide to configure HA for the NameNode, do not install hadoop-hdfs-secondarynamenode. After completing the HA software configuration, follow the installation instructions under Deploying HDFS High Availability.
- Install and deploy ZooKeeper.
Important: Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode or JobTracker.
Follow instructions under ZooKeeper Installation.
- Install each type of daemon package on the appropriate system(s), as follows:
Where to install
Install commands
JobTracker host running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker
SLES
sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker
Ubuntu or Debian
sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker
NameNode host running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-hdfs-namenode
SLES
sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
Ubuntu or Debian
sudo apt-get install hadoop-hdfs-namenode
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
SLES
sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
Ubuntu or Debian
sudo apt-get install hadoop-hdfs-secondarynamenode
All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
SLES
sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
Ubuntu or Debian
sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
All client hosts running:
Red Hat/CentOS compatible
sudo yum clean all; sudo yum install hadoop-client
SLES
sudo zypper clean --all; sudo zypper install hadoop-client
Ubuntu or Debian
sudo apt-get install hadoop-client
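The MRv1 table follows a fixed pattern: refresh the package metadata, then install the packages for the host's role. The sketch below rebuilds that command for a given package tool and role so that you can review it or paste it. The role labels are hypothetical names for the table rows.

```shell
# Hypothetical helper: print the MRv1 install command for a package tool
# (yum, zypper, or apt-get) and a host role from the table above.
mrv1_install_cmd() {
  tool="$1"; role="$2"
  case "$role" in
    jobtracker) pkgs="hadoop-0.20-mapreduce-jobtracker" ;;
    namenode) pkgs="hadoop-hdfs-namenode" ;;
    secondarynamenode) pkgs="hadoop-hdfs-secondarynamenode" ;;
    worker) pkgs="hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode" ;;
    client) pkgs="hadoop-client" ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Refresh package metadata before installing (repeating apt-get update is harmless).
  case "$tool" in
    yum) echo "sudo yum clean all; sudo yum install $pkgs" ;;
    zypper) echo "sudo zypper clean --all; sudo zypper install $pkgs" ;;
    apt-get) echo "sudo apt-get update; sudo apt-get install $pkgs" ;;
    *) echo "unknown tool: $tool" >&2; return 1 ;;
  esac
}
```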
Step 5: (Optional) Install LZO
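If an earlier hadoop-lzo package is already installed, remove it first; for example, on a Red Hat-compatible system:
$ sudo yum remove hadoop-lzo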
- Add the repository on each host in the cluster.
Follow the instructions for your OS version:
- Red Hat/CentOS/Oracle 5: Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
- Red Hat/CentOS 6: Navigate to this link and save the file in the /etc/yum.repos.d/ directory.
- SLES: Run the following command:
$ sudo zypper addrepo -f http://archive.cloudera.com/gplextras5/sles/11/x86_64/gplextras/cloudera-gplextras5.repo
Then update your system package index by running:
$ sudo zypper refresh
- Ubuntu or Debian: Navigate to this link and save the file as /etc/apt/sources.list.d/gplextras.list.
Important: Make sure you do not let the file name default to cloudera.list, as that will overwrite your existing cloudera.list.
Then run the following command:
$ sudo apt-get update
- Install the package on each host as follows:
OS Version | Install Commands |
---|---|
Red Hat/CentOS compatible | sudo yum install hadoop-lzo |
SLES | sudo zypper install hadoop-lzo |
Ubuntu or Debian | sudo apt-get install hadoop-lzo |
- Continue with installing and deploying CDH. As part of the deployment, you will need to do some additional configuration for LZO, as shown under Configuring LZO.
Important: Make sure you do this configuration after you have copied the default configuration files to a custom location and set alternatives to point to it.
Step 6: Deploy CDH and Install Components
Now proceed with: