Installation Path A - Automated Installation by Cloudera Manager (Non-Production Mode)
Before proceeding with this path for a new installation, review Cloudera Manager Deployment. If you are upgrading an existing Cloudera Manager installation, see Cloudera Upgrade.
In Installation Path A, Cloudera Manager automates the installation of the Oracle JDK, Cloudera Manager Server, embedded PostgreSQL database, Cloudera Manager Agent, CDH, and managed service software on cluster hosts. Cloudera Manager also configures databases for the Cloudera Manager Server and Hive Metastore and optionally for Cloudera Management Service roles. This path is recommended for demonstration and proof-of-concept deployments, but is not recommended for production deployments because it is not intended to scale and may require database migration as your cluster grows. To use this method, server and cluster hosts must satisfy the following requirements:
- Provide the ability to log in to the Cloudera Manager Server host using a root account or an account that has password-less sudo permission.
- Allow the Cloudera Manager Server host to have uniform SSH access on the same port to all hosts. See Networking and Security Requirements for further information.
- All hosts must have access to standard package repositories and either archive.cloudera.com or a local repository with the required installation files.
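The SSH and sudo requirements above can be spot-checked before you run the installer. The following is a minimal sketch, not part of the Cloudera tooling: `preflight_cmd` is a hypothetical helper that prints an `ssh` command for one host, and the host names are placeholders. It assumes key-based authentication, because `BatchMode=yes` makes `ssh` fail rather than prompt for a password.

```shell
#!/bin/sh
# Hypothetical helper (not a Cloudera tool): print the ssh command that checks
# key-based login and password-less sudo on one host.
# Arguments: host [port] [user]; port defaults to 22, user to root.
preflight_cmd() (
  host=$1
  port=${2:-22}
  user=${3:-root}
  # BatchMode=yes makes ssh fail instead of prompting for a password;
  # "sudo -n true" fails if sudo would need to ask for a password.
  printf 'ssh -o BatchMode=yes -o ConnectTimeout=5 -p %s %s@%s sudo -n true\n' \
    "$port" "$user" "$host"
)

# Print the check for each placeholder host so it can be reviewed before use.
for h in host1.company.com host2.company.com; do
  preflight_cmd "$h" 22 root
done
```

Printing the commands instead of running them keeps the sketch free of side effects. After reviewing the output, you could pipe it to `sh` from the Cloudera Manager Server host; a non-zero exit status for any host suggests that the SSH port, key, or sudo configuration on that host needs attention.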
The general steps in the procedure for Installation Path A follow.
- Before You Begin
- Download and Run the Cloudera Manager Server Installer
- Start and Log into the Cloudera Manager Admin Console
- Use the Cloudera Manager Wizard for Software Installation and Configuration
- Configure Database Settings
- Review Configuration Changes and Start Services
- Change the Default Administrator Password
- Configure Oozie Data Purge Settings
- Test the Installation
Before You Begin
Install and Configure Databases
By default, Installation Path A installs an embedded PostgreSQL database. You can also choose to configure an external database. Read Cloudera Manager and Managed Service Datastores. If you are using an external database for services or Cloudera Management Service roles, install and configure it following the instructions in External Databases for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
Perform Configuration Required by Single User Mode
If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Single User Mode Requirements.
On CentOS 5 and RHEL 5, Install Python 2.6/2.7 and psycopg2 for Hue
Hue in CDH 5 only works with the operating system's native version of Python when that version is 2.6 or higher.
CentOS/RHEL 5 ships with Python 2.4, so you must install Python 2.6 (or Python 2.7) and the Python-PostgreSQL Database Adapter, psycopg2 (not psycopg).
## Navigate to Hue within your specific CDH parcel version
cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/bin
./python2.6
>>> import psycopg2
or …
cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/lib/python2.6/site-packages/
ln -s /usr/lib64/python2.6/site-packages/psycopg2 psycopg2
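The Python version requirement described in this section can be checked with a small script before you install anything. This is a rough sketch under stated assumptions: `python_ok_for_hue` is a hypothetical helper, and it accepts only 2.6.x and 2.7.x version strings, which is an interpretation of the "Python 2.6/2.7" wording in the heading above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for Python 2.6.x or 2.7.x version strings.
# Assumption: anything outside the 2.x series with minor >= 6 is rejected.
python_ok_for_hue() (
  version=$1
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # second component
  [ "$major" = "2" ] && [ "$minor" -ge 6 ] 2>/dev/null
)

# Example version strings; CentOS/RHEL 5 ships Python 2.4.
for v in 2.4.3 2.6.6 2.7.5; do
  if python_ok_for_hue "$v"; then
    echo "$v: ok"
  else
    echo "$v: not supported"
  fi
done
```

To test the native interpreter on a host, one option is `python_ok_for_hue "$(python -V 2>&1 | awk '{print $2}')"`; the `2>&1` is needed because Python 2 prints its version to standard error.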
Configure an HTTP Proxy
The Cloudera Manager installer accesses archive.cloudera.com by using yum on RHEL systems, zypper on SLES systems, or apt-get on Debian/Ubuntu systems. If your hosts access the Internet through an HTTP proxy, you can configure yum, zypper, or apt-get, system-wide, to access archive.cloudera.com through a proxy. To do so, modify the system configuration on the Cloudera Manager Server host and on every cluster host as follows:
OS | File | Property |
---|---|---|
RHEL-compatible | /etc/yum.conf | proxy=http://server:port/ |
SLES | /root/.curlrc | --proxy=http://server:port/ |
Ubuntu or Debian | /etc/apt/apt.conf | Acquire::http::Proxy "http://server:port"; |
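The table above can be expressed as a small lookup, which may help when scripting the change across many hosts. This is a sketch only: `proxy_setting` is a hypothetical helper, the `file: property` output format is an illustration choice, and `proxy.example.com:3128` is a placeholder proxy address.

```shell
#!/bin/sh
# Hypothetical helper: print the configuration file and proxy property for an
# OS family, following the table above.
# Arguments: os-family proxy-host proxy-port.
proxy_setting() (
  os=$1 server=$2 port=$3
  case $os in
    rhel)
      printf '%s: proxy=http://%s:%s/\n' /etc/yum.conf "$server" "$port" ;;
    sles)
      printf '%s: --proxy=http://%s:%s/\n' /root/.curlrc "$server" "$port" ;;
    ubuntu|debian)
      printf '%s: Acquire::http::Proxy "http://%s:%s";\n' /etc/apt/apt.conf "$server" "$port" ;;
    *)
      echo "unknown OS family: $os" >&2
      exit 1 ;;
  esac
)

# Show the setting for each OS family using a placeholder proxy.
for family in rhel sles ubuntu; do
  proxy_setting "$family" proxy.example.com 3128
done
```

The helper only prints the setting; it does not edit any file. Appending the property to the named file on the Cloudera Manager Server host and on every cluster host remains a manual or separately scripted step.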
Install the Oracle JDK
If you choose not to have the Oracle JDK installed by Cloudera Manager, install the JDK on all hosts in the cluster according to the following instructions:
- CDH 5 - Java Development Kit Installation.
- CDH 4 - Java Development Kit Installation.
Download and Run the Cloudera Manager Server Installer
- Open Cloudera Manager Downloads in a web browser.
- In the Cloudera Manager box, click Download Now.
- Click Download Cloudera Manager to download the most recent version of the installer or click Select a Different Version to download an earlier version.
The product interest dialog box displays.
- Click Sign in and enter your email address and password or complete the product interest form and click Continue.
The Cloudera Standard License page displays.
- Accept the license agreement and click Submit.
The Automated Installation instructions display. You can also view system requirements and release notes, or go to the documentation.
- Download the installer:
$ wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
- Change cloudera-manager-installer.bin to have executable permission:
$ chmod u+x cloudera-manager-installer.bin
- Run the Cloudera Manager Server installer by doing one of the following:
- Install Cloudera Manager packages from the Internet:
$ sudo ./cloudera-manager-installer.bin
- Install Cloudera Manager packages from a local repository:
$ sudo ./cloudera-manager-installer.bin --skip_repo_package=1
- Read the Cloudera Manager README and then press Return or Enter to choose Next.
- Read the Cloudera Express License and then press Return or Enter to choose Next. Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the license.
- Read the Oracle Binary Code License Agreement and then press Return or Enter to choose Next.
- Use the arrow keys and press Return or Enter to choose Yes to confirm you accept the Oracle Binary Code License Agreement. The following occurs:
- The installer installs the Oracle JDK and the Cloudera Manager repository files.
- The installer installs the Cloudera Manager Server and embedded PostgreSQL packages.
- The installer starts the Cloudera Manager Server and embedded PostgreSQL database.
- When the installation completes, the complete URL for the Cloudera Manager Admin Console displays, including the port number, which is 7180 by default. Press Return or Enter to choose OK to continue.
- Press Return or Enter to choose OK to exit the installer.
Start and Log into the Cloudera Manager Admin Console
- Wait several minutes for the Cloudera Manager Server to start. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
- In a web browser, enter http://Server host:7180, where Server host is the fully qualified domain name or IP address of the host where the Cloudera Manager Server is running.
The login screen for Cloudera Manager Admin Console displays.
- Log into Cloudera Manager Admin Console. The default credentials are Username: admin and Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
- After logging in, the Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
- Click Continue.
The Welcome to Cloudera Manager page displays.
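Because the Cloudera Manager Server can take several minutes to start, it may be convenient to poll the Admin Console URL rather than retry in a browser. The following is a minimal sketch that assumes `curl` is installed; `cm_url` and `wait_for_cm` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Build the Admin Console URL; 7180 is the default port named in this procedure.
cm_url() (
  host=$1
  port=${2:-7180}
  printf 'http://%s:%s\n' "$host" "$port"
)

# Poll the URL until it answers or the attempts run out.
# Arguments: host [attempts] [delay-seconds]. Requires curl.
wait_for_cm() (
  url=$(cm_url "$1")
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output; -f turns HTTP error responses into a
    # non-zero exit status; --max-time bounds each request.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "Cloudera Manager Admin Console is answering at $url"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No answer from $url after $attempts attempts" >&2
  exit 1
)
```

For example, `wait_for_cm cm-server.company.com` (a placeholder host name) would poll for up to about five minutes with the defaults. If it gives up, the server log mentioned above is the place to look next.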
Use the Cloudera Manager Wizard for Software Installation and Configuration
The following instructions describe how to use the Cloudera Manager installation wizard to do an initial installation and configuration. The wizard lets you:
- Select the edition of Cloudera Manager to install
- Find the cluster hosts you specify using hostname and IP address ranges
- Connect to each host with SSH to install the Cloudera Manager Agent and other components
- Optionally install the Oracle JDK on the cluster hosts.
- Install CDH and managed service packages or parcels
- Configure CDH and managed services automatically and start the services
Choose Cloudera Manager Edition
From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install and, optionally, install a license:
- Choose which edition to install:
- Cloudera Express, which does not require a license, but provides a limited set of features.
- Cloudera Enterprise Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
- Cloudera Enterprise with one of the following license types:
- Basic Edition
- Flex Edition
- Enterprise Data Hub Edition
- If you elect Cloudera Enterprise, install a license:
- Click Upload License.
- Click the document icon to the left of the Select a License File text field.
- Go to the location of your license file, click the file, and click Open.
- Click Upload.
- Information is displayed indicating what the CDH installation includes. At this point, you can click the Support drop-down menu to access online Help or the Support Portal.
- Click Continue to proceed with the installation.
Choose Cloudera Manager Hosts
Choose which hosts will run CDH and managed services:
- To enable Cloudera Manager to automatically discover hosts on which to install CDH and managed services, enter the cluster hostnames or IP addresses.
You can also specify hostname and IP address ranges. For example:
Range Definition | Matching Hosts |
---|---|
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |
You can specify multiple addresses and address ranges by separating them with commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default. If you do not know the IP addresses of all of the hosts, you can enter an address range that spans over unused addresses and then clear the hosts that do not exist (and are not discovered) later in this procedure. However, keep in mind that wider ranges will require more time to scan.
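To preview which hosts a range definition covers before you paste it into the wizard, the expansion can be approximated in a few lines of shell. This is an illustrative sketch, not how Cloudera Manager implements the search: `expand_range` is a hypothetical helper that handles a single bracketed numeric range per pattern and preserves zero padding.

```shell
#!/bin/sh
# Hypothetical helper: expand one bracketed numeric range, for example
# host[07-10].company.com, keeping the zero padding of the start value.
# Patterns without a bracketed range are printed unchanged.
expand_range() (
  pattern=$1
  case $pattern in
    *\[*-*\]*)
      prefix=${pattern%%\[*}    # text before "["
      rest=${pattern#*\[}       # text after "["
      range=${rest%%\]*}        # "start-end"
      suffix=${rest#*\]}        # text after "]"
      start=${range%%-*}
      end=${range#*-}
      width=${#start}           # pad results to the width of the start value
      # Strip leading zeros so 08 and 09 are not read as invalid octal numbers.
      i=$(printf '%s' "$start" | sed 's/^0*//')
      i=${i:-0}
      last=$(printf '%s' "$end" | sed 's/^0*//')
      last=${last:-0}
      while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
      done
      ;;
    *)
      printf '%s\n' "$pattern"
      ;;
  esac
)

# Expand the three example range definitions from the table.
for p in '10.1.1.[1-4]' 'host[1-3].company.com' 'host[07-10].company.com'; do
  expand_range "$p"
done
```

Counting the output lines gives a quick sense of how wide a scan will be, which matters because wider ranges take longer to scan.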
- Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for services. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, click New Search, add the host names or IP addresses and click Search again. Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install services that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.
- Verify that the number of hosts shown matches the number of hosts where you want to install services. Clear host entries that do not exist and clear the hosts where you do not want to install services.
Choose Software Installation Method and Install Software
- Select the repository type to use for the installation. In the Choose Method section select one of the following:
- Use Parcels
- Choose the parcels to install. The choices you see depend on the repositories you have chosen – a repository may contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
You can add additional parcels for previous versions by specifying custom repositories. For example, you can find the locations of the previous CDH 4 parcels at https://archive.cloudera.com/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
- To specify the Parcel Directory or Local Parcel Repository Path, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
- Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host.
- Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you will see all the unique parcels contained in all your repositories.
- Proxy Server - Specify the properties of a proxy server.
- Click OK. Parcels available from the configured remote parcel repository URLs are displayed in the parcels list. If you specify a URL for a parcel version too new to be supported by the Cloudera Manager version, the parcel is not displayed in the parcel list.
- Use Packages
- Select the major release of CDH to install.
- Select the specific release of CDH to install. You can choose either the latest version, a specific version, or use a custom repository. If you specify a custom repository for a CDH version too new to be supported by the Cloudera Manager version, Cloudera Manager will install the packages but you will not be able to create services using those packages and will have to manually uninstall those packages and manually reinstall packages for a supported CDH version.
- Select the specific releases of other services to install. You can choose either the latest version or use a custom repository. Choose None if you do not want to install that service.
- If you are using Cloudera Manager to install software, select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories.
- Click Continue.
The Cluster Installation JDK Installation Options screen displays.
- Select Install Oracle Java SE Development Kit (JDK) to allow Cloudera Manager to install the JDK on each cluster host. If you have already installed the JDK, do not select this option. If your local laws permit you to deploy unlimited strength encryption, and you are running a secure cluster, select the Install Java Unlimited Strength Encryption Policy Files checkbox.
- Click Continue.
- (Optional) Select Single User Mode to configure the Cloudera Manager Agent and all service processes to run as the same user. This mode requires extra configuration steps that must be done manually on all hosts in the cluster. If you have not performed the steps, directory creation will fail in the installation wizard. In most cases, you can create the directories but the steps performed by the installation wizard may have to be continued manually. Click Continue.
- Specify host installation properties:
- Select root or enter the username for an account that has password-less sudo permission.
- Select an authentication method:
- If you choose password authentication, enter and confirm the password.
- If you choose public-key authentication, provide a passphrase and path to the required key files.
- You can specify an alternate SSH port. The default value is 22.
- You can specify the maximum number of host installations to run at once. The default value is 10.
- Click Continue. Cloudera Manager performs the following:
- Parcels - installs the Oracle JDK and the Cloudera Manager Agent packages and starts the Agent. Click Continue. During parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is completed.
- Packages - configures package repositories, installs the Oracle JDK, CDH, managed service, and Cloudera Manager Agent packages, and starts the Agent. When the Continue button at the bottom of the screen turns blue, the installation process is completed. If the installation has completed successfully on some hosts but failed on others, you can click Continue if you want to skip installation on the failed hosts and continue to the next screen to start configuring services on the successful hosts.
- Click Continue.
The Host Inspector runs to validate the installation and provides a summary of what it finds, including all the versions of the installed components. If the validation is successful, click Finish.
Add Services
- In the first page of the Add Services wizard, choose the combination of services to install and whether to install Cloudera Navigator:
- Select the combination of services to install:
CDH 4
- Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
- Custom Services - Any combination of services.
CDH 5
- Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- Core with Search
- Core with Spark
- All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
- Custom Services - Any combination of services.
- Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
- In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN, or use the Add Service functionality to add YARN after installation completes.
- In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce, or use the Add Service functionality to add MapReduce after installation completes.
- The Flume service can be added only after your cluster has been set up.
- If you have chosen Enterprise Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See Cloudera Navigator 2 Overview.
- Click Continue.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances if necessary.
Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the pageable hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion)
Range Definition | Matching Hosts |
---|---|
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |
- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
- When you are satisfied with the assignments, click Continue.
Configure Database Settings
- Choose the database type:
- Keep the default setting of Use Embedded Database to have Cloudera Manager create and configure required databases. Record the auto-generated passwords.
- Select Use Custom Databases to specify the external database host, and enter the database type, database name, username, and password for the database that you created when you set up the database.
- If you are adding the Oozie service, you can change your Oozie configuration to control when data is purged to improve performance, cut down on database disk usage, improve upgrade performance, or to keep the history for a longer period of time. See Configuring Oozie Data Purge Settings Using Cloudera Manager.
- Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise, check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.)
The Cluster Setup Review Changes screen displays.
Review Configuration Changes and Start Services
- Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database.
- Click Continue.
The wizard starts the services.
- When all of the services are started, click Continue. You see a success message indicating that your cluster has been successfully started.
- Click Finish to proceed to the Cloudera Manager Admin Console Home Page.
Change the Default Administrator Password
- Click the logged-in username at the far right of the top navigation bar and select Change Password.
- Enter the current password and a new password twice, and then click OK.
Configure Oozie Data Purge Settings
If you added an Oozie service, you can change your Oozie configuration to control when data is purged to improve performance, cut down on database disk usage, or to keep the history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.
Test the Installation
You can test the installation following the instructions in Testing the Installation.