Understanding Custom Installation Solutions
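Cloudera hosts two types of software repositories that you can use to install products such as Cloudera Manager or CDH: parcel repositories, and package repositories (RPM packages for RHEL and SLES, and Debian/Ubuntu packages). A custom installation solution may be required in the following situations: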
- You need to install older product versions. For example, in a CDH cluster, all hosts must run the same CDH version. After completing an initial installation, you may want to add hosts, either to increase the size of your cluster to handle larger tasks or to replace older hardware. Those new hosts must run the CDH version that is already deployed, which may be older than the latest release.
- The hosts on which you want to install Cloudera products are not connected to the Internet, so they cannot reach the Cloudera repository. (For a parcel installation, only the Cloudera Manager Server needs Internet access; for a package installation, all cluster members need access to the Cloudera repository.) Some organizations partition parts of their network from outside access. Isolating network segments provides greater assurance that valuable data is not compromised, whether through malice or for personal gain. In such cases, the isolated computers cannot access Cloudera repositories for new installations or upgrades.
Understanding Parcels
Parcels are a packaging format that facilitates upgrading software from within Cloudera Manager. You can download, distribute, and activate a new software version entirely from within Cloudera Manager. Cloudera Manager downloads a parcel to a local directory. Once the parcel has been downloaded to the Cloudera Manager Server host, an Internet connection is no longer needed to deploy it. Parcels are available for CDH 4.1.3 and later. For detailed information about parcels, see Parcels.
If your Cloudera Manager Server does not have Internet access, you can obtain the required parcel files and put them into a parcel repository. See Creating and Using a Parcel Repository for Cloudera Manager.
Understanding Package Management
Setting up a custom package management solution relies on two concepts:
- Package management tools
- Package repositories
See Creating and Using a Package Repository for Cloudera Manager.
Package Management Tools
Packages (rpm or deb files) help ensure that installations complete successfully by encoding each package's dependencies. When you request the installation of a package, all of the packages it requires can be installed at the same time. For example, hadoop-0.20-hive depends on hadoop-0.20. Package management tools such as yum (RHEL), zypper (SLES), and apt-get (Debian/Ubuntu) can find and install any required packages. For example, on RHEL, you might enter yum install hadoop-0.20-hive. yum would inform you that the hive package requires hadoop-0.20 and offer to complete that installation for you. zypper and apt-get provide similar functionality.
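The effect of encoded dependencies can be sketched with the standard tsort utility, which orders names so that each dependency comes before the package that needs it. This is only an illustration, not how yum, zypper, or apt-get are implemented; install_order is a hypothetical helper name, and the single dependency pair is the hadoop-0.20-hive example from the text.

```shell
# Illustration only: derive an install order from dependency pairs.
# Each input line is "<dependency> <dependent>". tsort prints the names
# so that every dependency appears before the package that requires it.
# install_order is a hypothetical helper name, not part of any package tool.
install_order() {
    tsort
}

# hadoop-0.20-hive depends on hadoop-0.20, so hadoop-0.20 is listed first.
printf '%s\n' 'hadoop-0.20 hadoop-0.20-hive' | install_order
```

On a real host, commands such as yum deplist or apt-cache depends report the dependencies that a package declares.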
Package Repositories
Package management tools operate on package repositories.
Repository Configuration Files
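Information about package repositories is stored in configuration files, whose location depends on the package management tool:
- RHEL/CentOS yum - /etc/yum.repos.d
- SLES zypper - /etc/zypp/zypper.conf (Repositories are defined in *.repo files in the /etc/zypp/repos.d/ directory.)
- Debian/Ubuntu apt-get - /etc/apt/apt.conf (Additional repositories are specified using *.list files in the /etc/apt/sources.list.d/ directory.)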
For example, on a CentOS system, you might find:

    [user@localhost ~]$ ls -l /etc/yum.repos.d/
    total 24
    -rw-r--r-- 1 root root 2245 Apr 25 2010 CentOS-Base.repo
    -rw-r--r-- 1 root root  626 Apr 25 2010 CentOS-Media.repo
The .repo files contain pointers to the actual package repository locations. For example:

    # ...
    [base]
    name=CentOS-$releasever - Base
    mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
    #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
    gpgcheck=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5

    #released updates
    [updates]
    name=CentOS-$releasever - Updates
    mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
    #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
    gpgcheck=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
    # ...
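To see at a glance where each repository points, a small shell helper can pull the active location lines out of .repo files. The following is a minimal sketch; repo_locations is a hypothetical name, and it assumes the simple INI-style layout shown above, with section headers and baseurl or mirrorlist keys starting at the beginning of a line.

```shell
# Hypothetical helper: print "<repo id>: <location line>" for every
# uncommented baseurl= or mirrorlist= entry in the given .repo files.
# Assumes section headers such as [base] stand alone on their own line.
repo_locations() {
    awk '
        # Remember the current section name, without the brackets.
        /^\[.*\]$/ { id = substr($0, 2, length($0) - 2); next }
        # Commented-out entries (#baseurl=...) do not match and are skipped.
        /^(baseurl|mirrorlist)=/ { print id ": " $0 }
    ' "$@"
}

# Usage: repo_locations /etc/yum.repos.d/*.repo
```

Commented-out entries are ignored, so the output reflects only the locations that yum would actually consult.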
Listing Repositories
You can list the repositories that are configured on a host. The command varies by operating system:
- RHEL/CentOS - yum repolist
- SLES - zypper repos
- Debian/Ubuntu - apt-get does not include a command to display sources, but you can determine sources by reviewing the contents of /etc/apt/sources.list and any files contained in /etc/apt/sources.list.d/.
For example, on a CentOS system:

    [root@localhost yum.repos.d]$ yum repolist
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
     * addons: mirror.san.fastserv.com
     * base: centos.eecs.wsu.edu
     * extras: mirrors.ecvps.com
     * updates: mirror.5ninesolutions.com
    repo id      repo name             status
    addons       CentOS-5 - Addons     enabled:     0
    base         CentOS-5 - Base       enabled: 3,434
    extras       CentOS-5 - Extras     enabled:   296
    updates      CentOS-5 - Updates    enabled: 1,137
    repolist: 4,867
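Because apt-get has no equivalent listing command, the manual review of /etc/apt/sources.list and /etc/apt/sources.list.d/ can be scripted. The following is a minimal sketch that assumes the traditional one-line deb and deb-src entries in *.list files; list_apt_sources is a hypothetical helper, and its optional argument (an alternate configuration directory) exists only to make the function easy to try out.

```shell
# Hypothetical helper: print every active deb / deb-src entry found in
# sources.list and in the *.list files under sources.list.d/.
# The optional first argument overrides the default /etc/apt directory.
list_apt_sources() {
    apt_dir="${1:-/etc/apt}"
    # -h omits file names, -s hides errors for missing files,
    # -E enables the extended pattern; commented lines do not match.
    grep -hsE '^[[:space:]]*deb(-src)?[[:space:]]' \
        "$apt_dir/sources.list" "$apt_dir"/sources.list.d/*.list
}

# Usage: list_apt_sources
```

The sketch prints nothing for hosts that define their sources in another format, so an empty result does not by itself mean that no repositories are configured.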