Installation Path A - Automated Installation by Cloudera Manager
The steps in this topic first install Cloudera Manager and then use Cloudera Manager to install agents, CDH software, and managed service software, and to configure and start your cluster.
You can also install CDH, the agents and other software manually. See Installation Path B - Manual Installation Using Cloudera Manager Packages or Installation Path C - Manual Installation Using Cloudera Manager Tarballs.
Before proceeding with this path for a new installation, review Cloudera Manager Deployment. If you are upgrading an existing Cloudera Manager installation, see Upgrading Cloudera Manager.
To install the Cloudera Manager Server using packages, follow the instructions in this section. You can also use Puppet or Chef to install the packages. The general steps in the procedure for Installation Path A follow.
- Before You Begin
- Establish Your Cloudera Manager Repository Strategy
- Install the Oracle JDK
- Install the Cloudera Manager Server Packages
- Set up a Database for the Cloudera Manager Server
- Start the Cloudera Manager Server
- Start and Log into the Cloudera Manager Admin Console
- Choose Cloudera Manager Edition and Hosts
- Choose the Software Installation Type and Install Software
- Add Services
- Change the Default Administrator Password
- Configure Oozie Data Purge Settings
- Test the Installation
During Cloudera Manager installation you can choose to install CDH and managed services as parcels or packages. For packages, you can choose to have Cloudera Manager install the packages or install them yourself.
Before You Begin
Perform Configuration Required by Single User Mode
If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Single User Mode Requirements.
On RHEL 5-compatible systems, Python 2.6 can be installed from the EPEL repository, for example:
$ su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm'
...
$ yum install python26
Install and Configure Databases
Read Cloudera Manager and Managed Service Datastores. If you are using an external database, install and configure a database as described in MariaDB Database, MySQL Database, Oracle Database, or External PostgreSQL Database.
Establish Your Cloudera Manager Repository Strategy
Cloudera recommends installing products using package management tools such as yum for RHEL compatible systems, zypper for SLES, and apt-get for Debian/Ubuntu. These tools depend on access to repositories to install software. For example, Cloudera maintains Internet-accessible repositories for CDH and Cloudera Manager installation files. Strategies for installing Cloudera Manager include:
- Standard Cloudera repositories. For this method, ensure you have added the required repository information to your systems. For Cloudera Manager repository locations and client repository files, see Cloudera Manager Version and Download Information.
- Internally hosted repositories. You might use internal repositories for environments where hosts do not have access to the Internet. For information about preparing your environment, see Understanding Custom Installation Solutions. When using an internal repository, you must copy the repo or list file to the Cloudera Manager Server host and update the repository properties to point to internal repository URLs.
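For an internally hosted yum repository, updating the repository properties usually means changing the baseurl line of the copied repo file. The following is a minimal sketch, not a Cloudera-provided tool: set_repo_baseurl is a hypothetical helper, and it assumes the repo file contains a single line starting with baseurl= and that the new URL contains no | or & characters.

```shell
# Hypothetical helper: rewrite the baseurl line of a yum .repo file so that
# it points at an internal mirror. The body runs in a subshell, so its
# variables do not leak into the calling shell.
set_repo_baseurl() (
  repo_file=$1
  new_url=$2
  tmp=$(mktemp) || exit 1
  # "|" is used as the sed delimiter because URLs contain "/".
  # A temporary file is used instead of "sed -i", whose syntax differs
  # between GNU and BSD sed.
  sed "s|^baseurl=.*|baseurl=${new_url}|" "$repo_file" > "$tmp" &&
    cat "$tmp" > "$repo_file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

A call might look like set_repo_baseurl /etc/yum.repos.d/cloudera-manager.repo followed by the URL of your internal mirror, run as a user who can write to that file.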
RHEL-compatible
- Save the appropriate Cloudera Manager repo file (cloudera-manager.repo) for your system:
OS Version | Repo URL |
---|---|
RHEL/CentOS/Oracle 5 | https://archive.cloudera.com/cm5/redhat/5/x86_64/cm/cloudera-manager.repo |
RHEL/CentOS 6 | https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/cloudera-manager.repo |
RHEL/CentOS 7 | https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo |
- Copy the repo file to the /etc/yum.repos.d/ directory.
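The two RHEL-compatible steps above can be scripted. This is a sketch only: cm_repo_url is a hypothetical helper that rebuilds the URLs listed above from the OS major version, and the commented download line assumes curl is installed and the host can reach archive.cloudera.com.

```shell
# Hypothetical helper: print the cloudera-manager.repo URL for a
# RHEL-compatible major version (5, 6, or 7), matching the URLs listed above.
cm_repo_url() {
  case $1 in
    5|6|7)
      printf 'https://archive.cloudera.com/cm5/redhat/%s/x86_64/cm/cloudera-manager.repo\n' "$1"
      ;;
    *)
      echo "unsupported RHEL-compatible version: $1" >&2
      return 1
      ;;
  esac
}

# Example, left commented out because it needs network access and root:
# curl -fsSL "$(cm_repo_url 6)" | sudo tee /etc/yum.repos.d/cloudera-manager.repo > /dev/null
```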
SLES
- Run the following command:
$ sudo zypper addrepo -f https://archive.cloudera.com/cm5/sles/11/x86_64/cm/cloudera-manager.repo
- Update your system package index by running:
$ sudo zypper refresh
Ubuntu or Debian
- Save the appropriate Cloudera Manager list file (cloudera.list) for your system:
OS Version | Repo URL |
---|---|
Ubuntu Trusty (14.04) | https://archive.cloudera.com/cm5/ubuntu/trusty/amd64/cm/cloudera.list |
Ubuntu Precise (12.04) | https://archive.cloudera.com/cm5/ubuntu/precise/amd64/cm/cloudera.list |
Ubuntu Lucid (10.04) | https://archive.cloudera.com/cm5/ubuntu/lucid/amd64/cm/cloudera.list |
Debian Wheezy (7.0 and 7.1) | https://archive.cloudera.com/cm5/debian/wheezy/amd64/cm/cloudera.list |
Debian Squeeze (6.0) | http://archive.cloudera.com/cm5/debian/squeeze/amd64/cm/cloudera.list |
- Copy the content of that file to the cloudera-manager.list file in the /etc/apt/sources.list.d/ directory.
- Update your system package index by running:
$ sudo apt-get update
Install the Oracle JDK
Install the Oracle Java Development Kit (JDK) on the Cloudera Manager Server host.
OS | Command |
---|---|
RHEL | $ sudo yum install oracle-j2sdk1.7 |
SLES | $ sudo zypper install oracle-j2sdk1.7 |
Ubuntu or Debian | $ sudo apt-get install oracle-j2sdk1.7 |
Install the Cloudera Manager Server Packages
- Install the Cloudera Manager Server packages either on the host where the database is installed, or on a host that has access to the database. This host need not be a host in the cluster that you want to manage with Cloudera Manager. On the Cloudera Manager Server host, type the following commands to install the Cloudera Manager packages.
OS | Command |
---|---|
RHEL, if you have a yum repo configured | $ sudo yum install cloudera-manager-daemons cloudera-manager-server |
RHEL, if you're manually transferring RPMs | $ sudo yum --nogpgcheck localinstall cloudera-manager-daemons-*.rpm $ sudo yum --nogpgcheck localinstall cloudera-manager-server-*.rpm |
SLES | $ sudo zypper install cloudera-manager-daemons cloudera-manager-server |
Ubuntu or Debian | $ sudo apt-get install cloudera-manager-daemons cloudera-manager-server |
- If you choose an Oracle database for use with Cloudera Manager, edit the /etc/default/cloudera-scm-server file on the Cloudera Manager server host. Locate the line that begins with export CM_JAVA_OPTS and change the -Xmx2G option to -Xmx4G.
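The heap change for an Oracle database can also be made with sed rather than by hand. The following is a minimal sketch under stated assumptions: raise_cm_heap is a hypothetical helper, and it assumes the option appears literally as -Xmx2G on the line that begins with export CM_JAVA_OPTS. Keep a copy of the file before changing it.

```shell
# Hypothetical helper: change -Xmx2G to -Xmx4G, but only on the line that
# begins with "export CM_JAVA_OPTS". Other lines are left untouched.
# The body runs in a subshell, so its variables stay local.
raise_cm_heap() (
  conf=$1
  tmp=$(mktemp) || exit 1
  # The /.../ address restricts the substitution to the CM_JAVA_OPTS line.
  sed '/^export CM_JAVA_OPTS/ s/-Xmx2G/-Xmx4G/' "$conf" > "$tmp" &&
    cat "$tmp" > "$conf"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

Run as root, the call would be raise_cm_heap /etc/default/cloudera-scm-server. The change takes effect the next time the Cloudera Manager Server starts.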
Set up a Database for the Cloudera Manager Server
- External database - Prepare the Cloudera Manager Server database as described in Preparing a Cloudera Manager Server External Database.
- Embedded database - Install an embedded PostgreSQL database as described in Installing and Starting the Cloudera Manager Server Embedded Database.
Start the Cloudera Manager Server
- Run this command on the Cloudera Manager Server host:
$ sudo service cloudera-scm-server start
If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
Start and Log into the Cloudera Manager Admin Console
- Wait several minutes for the Cloudera Manager Server to complete its startup. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
- In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running. The login screen for Cloudera Manager Admin Console displays.
- Log into Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
- After logging in, the Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
- Click Continue.
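Because the server can take several minutes to start, a script that needs the Admin Console can poll port 7180 instead of sleeping for a fixed time. This is a sketch, not part of Cloudera Manager: retry is a hypothetical helper, cm-host.example.com is a placeholder hostname, and the commented example assumes curl is installed.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Succeeds as soon as the command succeeds and fails
# once the attempts are used up. The subshell body keeps variables local.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && exit 1
    n=$((n + 1))
    sleep "$delay"
  done
)

# Example, left commented out because it needs a running server:
# retry 30 10 curl -fsS -o /dev/null http://cm-host.example.com:7180/
```

With 30 attempts and a 10-second delay, the example waits up to about five minutes before giving up, at which point the server log is the next place to look.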
Choose Cloudera Manager Edition and Hosts
Choose which edition of Cloudera Manager you are using and which hosts will run CDH and managed services.
- When you start the Cloudera Manager Admin Console, the install wizard starts up. Click Continue to get started.
- Choose which edition to install:
- Cloudera Express, which does not require a license, but provides a limited set of features.
- Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
- Cloudera Enterprise with one of the following license types:
- Basic Edition
- Flex Edition
- Data Hub Edition
- If you elect Cloudera Enterprise, install a license:
- Click Upload License.
- Click the document icon to the left of the Select a License File text field.
- Go to the location of your license file, click the file, and click Open.
- Click Upload.
- Information is displayed indicating what the CDH installation includes. At this point, you can access online Help or the Support Portal. Click Continue to proceed with the installation.
- Do one of the following depending on whether you are using Cloudera Manager to install software:
- If you are using Cloudera Manager to install software, search for and choose hosts:
- To enable Cloudera Manager to automatically discover hosts on which to install CDH and managed services, enter the cluster hostnames or IP addresses. You can also specify hostname and IP address ranges. For example:
Range Definition | Matching Hosts |
---|---|
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |
You can specify multiple addresses and address ranges by separating them by commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default. If you don't know the IP addresses of all of the hosts, you can enter an address range that spans over unused addresses and then deselect the hosts that do not exist (and are not discovered) later in this procedure. However, keep in mind that wider ranges will require more time to scan.
- Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for services. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, click New Search, add the host names or IP addresses and click Search again. Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install services that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.
- Verify that the number of hosts shown matches the number of hosts where you want to install services. Deselect host entries that do not exist and deselect the hosts where you do not want to install services. Click Continue. The Select Repository screen displays.
- If you installed Cloudera Agent packages in Install Cloudera Manager Agent Packages, choose from among hosts with the packages installed:
- Click the Currently Managed Hosts tab.
- Choose the hosts to add to the cluster.
- Click Continue.
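To preview which hosts a range definition from the host search step would cover before entering it in the wizard, the bracket syntax can be expanded locally. This sketch only mimics the examples shown in this section and is not how Cloudera Manager itself parses ranges: expand_range is a hypothetical helper that handles a single [first-last] range per definition.

```shell
# Hypothetical helper: expand one range definition such as
# "host[07-10].company.com" into the names it matches, one per line.
# The body runs in a subshell, so its variables stay local.
expand_range() (
  pattern=$1
  case $pattern in
    *\[*-*\]*) ;;                            # contains a [first-last] range
    *) printf '%s\n' "$pattern"; exit 0 ;;   # plain hostname or address
  esac
  prefix=${pattern%%\[*}     # text before "["
  rest=${pattern#*\[}        # text after "["
  range=${rest%%\]*}         # "first-last"
  suffix=${rest#*\]}         # text after "]"
  first=${range%%-*}
  last=${range#*-}
  width=${#first}            # keep zero padding: "07" has width 2
  # Strip leading zeros so that "08" is not read as an invalid octal number.
  i=$(printf '%s' "$first" | sed 's/^0*\([0-9]\)/\1/')
  end=$(printf '%s' "$last" | sed 's/^0*\([0-9]\)/\1/')
  while [ "$i" -le "$end" ]; do
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
    i=$((i + 1))
  done
)
```

Calling expand_range with 10.1.1.[1-4] prints the four addresses 10.1.1.1 through 10.1.1.4, which makes it easy to see how many addresses a wide range would ask the wizard to scan.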
Choose the Software Installation Type and Install Software
Choose a software installation type (parcels or packages) and install the software if not previously installed.
- Choose the software installation type and CDH and managed service version:
- Use Parcels
- Choose the parcels to install. The choices depend on the repositories you have chosen; a repository can contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at https://archive.cloudera.com/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
- To specify the parcel directory, specify the local parcel repository, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
- Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host. If you change the default value for Parcel Directory and have already installed and started Cloudera Manager Agents, restart the Agents:
$ sudo service cloudera-scm-agent restart
- Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you see all the unique parcels contained in all your repositories.
- Proxy Server - Specify the properties of a proxy server.
- Click OK.
- Select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories. Click Continue.
- Use Packages - Do one of the following:
- If Cloudera Manager is installing the packages:
- Click the package version.
- Select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories. Click Continue.
- If you manually installed packages in Install CDH and Managed Service Packages, select the CDH version (CDH 4 or CDH 5) that matches the packages you installed manually.
- Select the Install Oracle Java SE Development Kit (JDK) checkbox to allow Cloudera Manager to install the JDK on each cluster host, or leave it deselected if you have already installed the JDK. If you select the checkbox, your local laws permit you to deploy unlimited strength encryption, and you are running a secure cluster, also select the Install Java Unlimited Strength Encryption Policy Files checkbox. Click Continue.
- (Optional) Select Single User Mode to configure the Cloudera Manager Agent and all service processes to run as the same user. This mode requires extra configuration steps that must be done manually on all hosts in the cluster. If you have not performed the steps, directory creation will fail in the installation wizard. In most cases, you can create the directories but the steps performed by the installation wizard may have to be continued manually. Click Continue.
- If you chose to have Cloudera Manager install software, specify host installation properties:
- Select root or enter the user name for an account that has password-less sudo permission.
- Select an authentication method:
- If you choose password authentication, enter and confirm the password.
- If you choose public-key authentication, provide a passphrase and path to the required key files.
- You can specify an alternate SSH port. The default value is 22.
- You can specify the maximum number of host installations to run at once. The default value is 10.
- Click Continue. If you chose to have Cloudera Manager install software, Cloudera Manager installs the Oracle JDK, the Cloudera Manager Agent packages, and the CDH and managed service parcels or packages. During parcel installation, separate progress bars show each phase of the installation process. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is complete.
- Click Continue. The Host Inspector runs to validate the installation and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish.
Add Services
Use the Cloudera Manager wizard to configure and start CDH and managed services.
- In the first page of the Add Services wizard, choose the combination of services to install and whether to install Cloudera Navigator:
- Click the radio button next to the combination of services to install:
CDH 4:
- Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
- Custom Services - Any combination of services.
CDH 5:
- Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- Core with Search
- Core with Spark
- All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
- Custom Services - Any combination of services.
- Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
- In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN, or use the Add Service functionality to add YARN after installation completes.
- In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce, or use the Add Service functionality to add MapReduce after installation completes.
- The Flume service can be added only after your cluster has been set up.
- If you have chosen Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See the Cloudera Navigator Documentation.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances if necessary.
Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the pageable hosts dialog.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion)
Range Definition | Matching Hosts |
---|---|
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |
- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
- When you are satisfied with the assignments, click Continue.
- On the Database Setup page, configure settings for required databases:
- Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
- Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise, check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.) The Review Changes screen displays.
- Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database. Click Continue. The wizard starts the services.
- When all of the services are started, click Continue. You see a success message indicating that your cluster has been successfully started.
- Click Finish to proceed to the Cloudera Manager Admin Console Home Page.
Change the Default Administrator Password
- Right-click the logged-in username at the far right of the top navigation bar and select Change Password.
- Enter the current password and a new password twice, and then click Update.
Configure Oozie Data Purge Settings
If you added an Oozie service, you can change your Oozie configuration to control when data is purged in order to improve performance, cut down on database disk usage, or to keep the history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.
Test the Installation
You can test the installation following the instructions in Testing the Installation.