Creating a Local Yum Repository

This section explains how to set up a local yum repository to install CDH on the machines in your cluster. There are a number of reasons you might want to do this, for example:

  • The machines in your cluster do not have Internet access. You can still use yum to do an installation on those machines by creating a local yum repository.
  • You may want to keep a stable local repository to ensure that any new installations (or re-installations on existing cluster members) use exactly the same bits.
  • Using a local repository may be the most efficient way to distribute the software to the cluster members.

To set up your own internal mirror, follow the steps below. You need an Internet connection for the steps that require you to download packages and create the repository itself. You also need an Internet connection to download updated RPMs to your local repository.

  1. Download the repo file. Click the link for your RHEL or CentOS system in the table, find the appropriate repo file, and save it in /etc/yum.repos.d/.

    For OS Version          Link to CDH 5 Repository
    RHEL/CentOS/Oracle 5    RHEL/CentOS/Oracle 5 link
    RHEL/CentOS/Oracle 6    RHEL/CentOS/Oracle 6 link
    RHEL/CentOS/Oracle 7    RHEL/CentOS/Oracle 7 link

  2. Install a web server such as Apache or lighttpd on the machine that hosts the RPMs. The default configuration should work. HTTP access must be allowed to pass through any firewalls between this server and the Internet connection.
  3. On the machine running the web server, install the yum-utils and createrepo RPM packages if they are not already installed. The yum-utils package includes the reposync command, which is required to create the local yum repository.
    sudo yum install yum-utils createrepo
  4. On the same computer as in the previous steps, download the yum repository into a temporary location. On RHEL/CentOS 6, you can use a command such as:
    reposync -r cloudera-cdh5 

    You can replace with any alpha-numeric string. It will be the name of your local repository, used in the header of the repo file other systems use to connect to your repository. You can now disconnect your server from the Internet.

  5. Put all the RPMs into a directory served by your web server, such as /var/www/html/cdh/5/RPMS/noarch/ (or x86_64 or i386 instead of noarch). The directory structure 5/RPMS/noarch is required. Make sure you can remotely access the files in the directory over HTTP, using a URL similar to http://<yourwebserver>/cdh/5/RPMS/.
  6. On your web server, issue the following command from the 5/ subdirectory of your RPM directory:
    createrepo .

    This creates or updates the metadata that the yum command requires to recognize the directory as a repository. The command creates a new directory called repodata. If necessary, adjust the permissions of the files and directories in your entire repository directory so that they are readable by the web server user.

  7. Edit the repo file you downloaded in step 1 and replace the line starting with baseurl= or mirrorlist= with baseurl=http://<yourwebserver>/cdh/5/, using the URL from step 5. Save the file back to /etc/yum.repos.d/.
  8. While disconnected from the Internet, issue commands such as the following to install CDH from your local yum repository:
    yum update
    yum install hadoop

Once you have confirmed that your internal mirror works, you can distribute this modified repo file to any system which can connect to your repository server. Those systems can now install CDH from your local repository without Internet access. Follow the instructions under Installing the Latest CDH 5 Release, starting at Step 2 (you have already done Step 1).
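
Before distributing the repo file, it may help to confirm on the web server that the repository tree has the layout described in steps 5 and 6. The following is a rough sketch under stated assumptions: the function name check_repo_tree is hypothetical, and it only checks for RPMs under RPMS/<arch>/ and for the repodata/repomd.xml index that createrepo writes. It does not validate the metadata itself.

```shell
#!/bin/sh
# Hypothetical sanity check for a mirror tree such as /var/www/html/cdh/5.
# Returns 0 if the tree contains repodata/repomd.xml and at least one
# .rpm file under RPMS/<arch>/; otherwise prints a reason and returns 1.
check_repo_tree() {
    repo_root=$1
    if [ ! -f "$repo_root/repodata/repomd.xml" ]; then
        echo "missing $repo_root/repodata/repomd.xml (run createrepo first)" >&2
        return 1
    fi
    # If the glob matches nothing it stays literal, and the -f test fails.
    for rpm in "$repo_root"/RPMS/*/*.rpm; do
        if [ -f "$rpm" ]; then
            echo "ok: $repo_root"
            return 0
        fi
    done
    echo "no RPMs found under $repo_root/RPMS/<arch>/" >&2
    return 1
}
```

For example, run check_repo_tree /var/www/html/cdh/5 on the web server. From another machine, requesting http://<yourwebserver>/cdh/5/repodata/repomd.xml over HTTP is a comparable remote check.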