Creating a Local Yum Repository

This section explains how to set up a local yum repository to install CDH on the machines in your cluster. There are a number of reasons you might want to do this, for example:

  • The computers in your cluster may not have Internet access. You can still use yum to do an installation on those machines by creating a local yum repository.
  • You may want to keep a stable local repository to ensure that any new installations (or re-installations on existing cluster members) use exactly the same bits.
  • Using a local repository may be the most efficient way to distribute the software to the cluster members.

To set up your own internal mirror, follow the steps below. You need an internet connection for the steps that download packages and create the repository, and again whenever you download updated RPMs to your local repository.

  1. Click the entry in the table below that matches your RHEL or CentOS system, navigate to the repo file for your system and save it in the /etc/yum.repos.d/ directory.

    For OS Version                   Click this Link
    RHEL/CentOS/Oracle 5             RHEL/CentOS/Oracle 5 link
    RHEL/CentOS/Oracle 6 (64-bit)    RHEL/CentOS/Oracle 6 link

  2. Install a web server such as Apache or lighttpd on the machine that will serve the RPMs. The default configuration should work. Any firewalls between this server and the internet must allow HTTP traffic.
  3. On the machine running the web server, install the yum-utils and createrepo RPM packages if they are not already installed. The yum-utils package includes the reposync command, which is required to create the local yum repository.
    sudo yum install yum-utils createrepo
  4. On the same computer as in the previous steps, download the yum repository into a temporary location. On RHEL/CentOS 6, you can use a command such as:
    reposync -r cloudera-cdh5 

    You can replace cloudera-cdh5 with any alphanumeric string. It will be the name of your local repository, used in the header of the repo file that other systems will use to connect to your repository. You can now disconnect your server from the internet.

  5. Put all the RPMs into a directory served by your web server, such as /var/www/html/cdh/5/RPMS/noarch/ (or x86_64 or i386 instead of noarch). The directory structure 5/RPMS/noarch is required. Make sure you can access the files in the directory remotely over HTTP, using a URL similar to http://<yourwebserver>/cdh/5/RPMS/.
  6. On your web server, issue the following command from the 5/ subdirectory of your RPM directory:
    createrepo .

    This will create or update the metadata required by the yum command to recognize the directory as a repository. The command creates a new directory called repodata. If necessary, adjust the permissions of files and directories in your entire repository directory to be readable by the web server user.

  7. Edit the repo file you downloaded in step 1 and replace the line starting with baseurl= or mirrorlist= with baseurl=http://<yourwebserver>/cdh/5/, using the URL from step 5. Save the file back to /etc/yum.repos.d/.
  8. While disconnected from the internet, issue the following commands to install CDH from your local yum repository. For example:
    yum update
    yum install hadoop
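
The server-side work in steps 4 to 6 can be sketched as a small script. This is only an illustration under stated assumptions: the download location /tmp/cdh-sync is hypothetical, the helper names rpm_arch and stage_rpms are made up for this example, and the architecture is taken from each RPM file name, which may not match the noarch, x86_64, and i386 directories exactly on every system.

```shell
#!/usr/bin/env bash
# Sketch: sort RPMs fetched by reposync into <web root>/RPMS/<arch>/.
# rpm_arch and stage_rpms are hypothetical helpers, not part of yum-utils.

# Print the architecture embedded in an RPM file name,
# e.g. hadoop-2.3.0-1.el6.x86_64.rpm -> x86_64
rpm_arch() {
    local name="${1##*/}"        # strip any leading directory
    name="${name%.rpm}"          # strip the .rpm suffix
    printf '%s\n' "${name##*.}"  # keep what follows the last dot
}

# Copy every RPM found under $1 into $2/RPMS/<arch>/.
stage_rpms() {
    local src="$1" dest="$2" rpm arch
    find "$src" -type f -name '*.rpm' | while IFS= read -r rpm; do
        arch="$(rpm_arch "$rpm")"
        mkdir -p "$dest/RPMS/$arch"
        cp "$rpm" "$dest/RPMS/$arch/"
    done
}

# Run the real steps only when asked to, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    set -euo pipefail
    SRC_DIR="${SRC_DIR:-/tmp/cdh-sync}"          # assumed download location
    WEB_ROOT="${WEB_ROOT:-/var/www/html/cdh/5}"  # example path from step 5
    reposync -r cloudera-cdh5 -p "$SRC_DIR"      # step 4: download the repository
    stage_rpms "$SRC_DIR" "$WEB_ROOT"            # step 5: lay out RPMS/<arch>/
    (cd "$WEB_ROOT" && createrepo .)             # step 6: build the repodata directory
    chmod -R a+rX "$WEB_ROOT"                    # make files readable by the web server user
fi
```

Saved under a name of your choice and invoked with --run as a user who can write to the web root, the script performs the download, layout, and metadata steps in order. Without --run it only defines the two helper functions.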

Once you have confirmed that your internal mirror works, you can distribute this modified repo file to any system which can connect to your repository server. Those systems can now install CDH from your local repository without internet access. Follow the instructions under Installing the Latest CDH 5 Release, starting at Step 2 (you have already done Step 1).
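
The repo file edit from step 7 and the distribution described above could be scripted along the following lines. Treat this as a sketch, not a supported tool: the function name rewrite_baseurl, the file name cloudera-cdh5.repo, and all host names are assumptions made for the example, and copying with scp as root is only one of several ways to distribute the file.

```shell
#!/usr/bin/env bash
# Sketch: point a repo file at the internal mirror, then copy it to other hosts.
# rewrite_baseurl is a hypothetical helper written for this example.

# Read a repo file on stdin and write it to stdout with any baseurl= or
# mirrorlist= line replaced by the mirror URL. $1 is the web server host name.
rewrite_baseurl() {
    local url="http://$1/cdh/5/"
    sed -e "s#^baseurl=.*#baseurl=$url#" \
        -e "s#^mirrorlist=.*#baseurl=$url#"
}

# Copy the rewritten file to each host only when asked to.
if [ "${1:-}" = "--run" ]; then
    REPO_FILE="${REPO_FILE:-/etc/yum.repos.d/cloudera-cdh5.repo}"  # assumed file name
    SERVER="${SERVER:-repo.example.com}"                           # hypothetical web server
    rewrite_baseurl "$SERVER" < "$REPO_FILE" > /tmp/cdh-local.repo
    for host in node1.example.com node2.example.com; do           # hypothetical cluster hosts
        scp /tmp/cdh-local.repo "root@$host:/etc/yum.repos.d/"
    done
fi
```

Because rewrite_baseurl only filters text, you can pipe a repo file through it and inspect the result before copying anything to other machines.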