HDP uses Yum or Zypper to install software, which is obtained from two sources: the HDP Repositories and the Extra Packages for Enterprise Linux (EPEL) repository. The EPEL repository is used on RHEL/CentOS platforms.
If your firewall prevents Internet access, you must mirror and/or proxy both the HDP repository and the EPEL repository. Many Data Centers already mirror or proxy the EPEL repository, so check with your Data Center team whether EPEL is already available from within your firewall.
Mirroring a repository involves copying the entire repository and all its contents onto a local server and enabling an HTTPD service on that server to serve the repository locally. Once the local mirror server is set up, the *.repo configuration files on every repository client (that is, every cluster node) must be updated so that the given package names are associated with the local mirror server instead of the remote repository server.
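As a rough illustration of that client-side change, the Python sketch below renders a minimal *.repo file whose baseurl points at the local mirror. The function name, repository ID, display name, mirror hostname, and GPG key URL are all hypothetical placeholders; in practice you would likely keep the repository IDs from the *.repo files you were given and change only the URLs.

```python
import configparser
import io
from typing import Optional


def render_repo_file(repo_id: str, name: str, baseurl: str,
                     gpgkey: Optional[str] = None) -> str:
    """Return the text of a Yum *.repo file whose baseurl points at a local mirror.

    Hypothetical helper: repo_id, name, baseurl and gpgkey are placeholders.
    """
    # interpolation=None so URL-encoded "%" characters are stored literally.
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    section = {"name": name, "baseurl": baseurl, "enabled": "1"}
    if gpgkey:
        # Verify package signatures against the key served by the mirror.
        section["gpgcheck"] = "1"
        section["gpgkey"] = gpgkey
    else:
        # Signature checking disabled; only reasonable for a quick test.
        section["gpgcheck"] = "0"
    config[repo_id] = section
    buffer = io.StringIO()
    # Write "key=value" lines without spaces around "=".
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical mirror host and path; substitute your own.
    print(render_repo_file("HDP-local", "HDP (local mirror)",
                           "http://mirror.example.com/hdp/centos7"))
```

On RHEL/CentOS nodes, Yum reads *.repo files from /etc/yum.repos.d, so the generated text would be written there on each cluster node (which requires root).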
There are three options for creating a local mirror server. Each of these options is explained in detail in a later section.
Option I: Mirror server has no access to the Internet
Use a web browser on your workstation to download the HDP Repository Tarball, move the tarball to the selected mirror server using scp or a USB drive, and extract it to create the repository on the local mirror server.
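The extraction step in Option I could look roughly like the Python sketch below, which unpacks the tarball into the directory served by HTTPD. The function name and the target directory are assumptions: /var/www/html is Apache's default document root on RHEL/CentOS, but your server may be configured differently. The path check is a basic safeguard against entries such as "../x", not a complete hardening of tar extraction.

```python
import os
import tarfile


def extract_repo_tarball(tarball_path: str, web_root: str) -> str:
    """Extract a repository tarball under web_root and return the resolved directory.

    Hypothetical helper; web_root is assumed to be served by your HTTPD service.
    """
    os.makedirs(web_root, exist_ok=True)
    root = os.path.realpath(web_root)
    with tarfile.open(tarball_path, "r:*") as archive:  # auto-detect compression
        members = archive.getmembers()
        # Refuse entries that would land outside web_root (e.g. "../etc/passwd").
        for member in members:
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in tarball: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons ship extraction filters that also block symlink escapes.
            archive.extractall(root, members=members, filter="data")
        else:
            archive.extractall(root, members=members)
    return root
```

For example, extract_repo_tarball("hdp-repo.tar.gz", "/var/www/html/hdp") would place the repository where a default Apache install can serve it; the tarball name here is a placeholder.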
Option II: Mirror server has temporary access to the Internet
Temporarily configure a server to have Internet access, download a copy of the HDP Repository to this server using the reposync command, then reconfigure the server so that it is back behind the firewall.
Option III: Mirror server has permanent access to the Internet (modified form of Option II)
Establish a “trusted host” by permanently configuring a server to have Internet access while remaining accessible from within the firewall. Download a copy of the HDP Repository to this server using the reposync command.
Note: Option I probably requires the least effort and, in some respects, is the most secure deployment option.
Option III is best if you want to be able to update your Hadoop installation periodically from the Hortonworks Repositories.
However, if you are considering Option III, you should also consider the fourth option, which is to proxy the HDP Repositories through a trusted proxy server. If you have a network administrator who has expertise in setting up proxies, and if the proxy option is acceptable within your Data Center Security policies, this can be the easiest of all the options.
Option IV: Trusted proxy server
Proxying a repository involves setting up a standard HTTP proxy on a local server to forward repository access requests to the remote repository server and route responses back to the original requestor. By acting as an intermediary, the proxy server effectively makes the remote repository server accessible to all clients.
Once the proxy is configured, change the /etc/yum.conf file on every repository client (that is, every cluster node) so that, when the client accesses the repository during installation, the request goes through the local proxy server instead of going directly to the remote repository server.
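Yum reads the proxy address from the proxy option in the [main] section of /etc/yum.conf (proxy_username and proxy_password exist for authenticated proxies). The Python sketch below is a hypothetical helper, set_yum_proxy, that sets that option in the file's text while keeping comments and other sections unchanged. The proxy URL is a placeholder, and writing the result back to /etc/yum.conf requires root.

```python
def set_yum_proxy(conf_text: str, proxy_url: str) -> str:
    """Return conf_text with proxy=<proxy_url> set in the [main] section.

    Hypothetical helper: comments and other sections are kept as they are.
    """
    setting = f"proxy={proxy_url}"
    out = []
    in_main = False    # currently inside [main]
    seen_main = False  # [main] appeared somewhere in the file
    done = False       # proxy line already written
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_main and not done:
                out.append(setting)  # [main] ended without a proxy line
                done = True
            in_main = stripped == "[main]"
            seen_main = seen_main or in_main
        elif in_main and not done and stripped.partition("=")[0].strip() == "proxy":
            out.append(setting)  # overwrite the existing proxy line
            done = True
            continue
        out.append(line)
    if not done:
        if not seen_main:
            out.append("[main]")  # no [main] section at all: create one
        out.append(setting)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "[main]\ncachedir=/var/cache/yum\nkeepcache=0\n"
    # Placeholder proxy address; substitute your own proxy host and port.
    print(set_yum_proxy(sample, "http://proxy.example.com:3128"), end="")
```

Editing the text line by line, rather than round-tripping it through an INI parser, is a deliberate choice here so that the comments in the stock yum.conf survive the change.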