Installing Cloudera Software Using Tarballs

Before proceeding with this path for a new installation, review Cloudera Installation Guide. If you are upgrading an existing Cloudera Manager installation, see Upgrading Cloudera Manager and CDH.

In this procedure, you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels. For a full discussion of deployment options, see Cloudera Installation Guide.

Before You Begin

Install the Oracle JDK

See Step 2: Install Java Development Kit.

Install and Configure External Databases

If you are using an external database for services or Cloudera Management Service roles, install and configure it following the instructions in Step 4: Install and Configure Databases.

Cloudera Manager also requires a database. Prepare the Cloudera Manager Server database as described in Preparing the Cloudera Manager Server Database.

On CentOS 5 and RHEL 5, Install Python 2.6/2.7 and psycopg2 for Hue

Hue in CDH 5 only works with the operating system's native version of Python when that version is 2.6 and higher.

CentOS/RHEL 5 ships with Python 2.4 so you must install Python 2.6 (or Python 2.7) and the Python-PostgreSQL Database Adapter, psycopg2 (not psycopg).

If the Hue server is already installed, you must import the psycopg2 connector into Hue's environment or create a symbolic link.
## Navigate to Hue within your specific CDH parcel version
cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/bin
./python2.6
>>>> import psycopg2

or …

cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/lib/python2.6/site-packages/
ln -s /usr/lib64/python2.6/site-packages/psycopg2 psycopg2

Install the Cloudera Manager Server and Agents

To install the Cloudera Manager Server and Agents, you download and extract tarballs, create users, and configure the server and agents.

Download and Extract Tarballs

Tarballs contain the Cloudera Manager Server and Cloudera Manager Agent in a single file.
  1. Download tarballs from the locations listed in Cloudera Manager Version and Download Information.
  2. Copy the tarballs and unpack them on all hosts on which you intend to install Cloudera Manager Server and Cloudera Manager Agents, in a directory you choose. You can create a new directory to accommodate the files you extract from the tarball. For example, if /opt/cloudera-manager does not exist, create it using a command similar to:
    $ sudo mkdir /opt/cloudera-manager
  3. Extract the contents of the tarball to the selected directory. For example, to copy a tar file to your home directory and extract the contents of all tar files to the /opt/ directory, use a command similar to the following:
    $ sudo tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager

The files are extracted to a subdirectory named according to the Cloudera Manager version being extracted. For example, files could be extracted to /opt/cloudera-manager/cm-5.0/. This full path is required later and is referred to as $CMF_DEFAULTS directory.

Perform Configuration Required by Single User Mode

If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Configuring Single User Mode.

Create Users

The Cloudera Manager Server and managed services require a user account. When installing Cloudera Manager from tarballs, you must create this user account on all hosts manually. Because Cloudera Manager Server and managed services are configured to use tcloudera-scm by default, creating a user with this name is the simplest approach. This created user is used automatically after installation is complete.

To create user cloudera-scm, use a command such as the following:
$ sudo useradd --system --home=/opt/cloudera-manager/cm-5.6.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Ensure that the --home argument path matches your environment. This argument varies according to where you place the tarball, and the version number varies among releases. For example, the --home location could be /opt/cm-5.6.0/run/cloudera-scm-server.

Create the Cloudera Manager Server Local Data Storage Directory

  1. Create the following directory: /var/lib/cloudera-scm-server.
  2. Change the owner of the directory so that the cloudera-scm user and group have ownership of the directory. For example:
    $ sudo mkdir /var/lib/cloudera-scm-server
    $ sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server

Configure Cloudera Manager Agents

On every Cloudera Manager Agent host, configure the Cloudera Manager Agent to point to the Cloudera Manager Server. Use the CMF_DEFAULTS environment variable in the environment of the user running Cloudera Manager Server and Agent. In $CMF_DEFAULTS/cloudera-scm-agent/config.ini, set the following environment variables:
Property Description
server_host Name of the host where Cloudera Manager Server is running.
server_port Port on the host where Cloudera Manager Server is running.
By default, a tarball installation has a var subdirectory where state is stored. In a non-tarball installation, state is stored in /var. Cloudera recommends that you reconfigure the tarball installation to use an external directory as the /var equivalent (/var or any other directory outside the tarball) so that when you upgrade Cloudera Manager, the new tarball installation can access this state. Configure the installation to use an external directory for storing state by editing $CMF_DEFAULTS/etc/default/cloudera-scm-agent and setting the CMF_VAR variable to the location of the /var equivalent. If you do not reuse the state directory between different tarball installations, duplicate Cloudera Manager Agent entries can occur in the Cloudera Manager database.

Configuring for a Custom Cloudera Manager User and Custom Directories

You can change the default username and directories used by Cloudera Manager. If you do not change the default, skip to Step 4: Install and Configure Databases. By default, Cloudera Manager creates the following directories in /var/log and /var/lib:
  • /var/log/cloudera-scm-headlamp
  • /var/log/cloudera-scm-firehose
  • /var/log/cloudera-scm-alertpublisher
  • /var/log/cloudera-scm-eventserver
  • /var/lib/cloudera-scm-headlamp
  • /var/lib/cloudera-scm-firehose
  • /var/lib/cloudera-scm-alertpublisher
  • /var/lib/cloudera-scm-eventserver
  • /var/lib/cloudera-scm-server
If you are using a custom username and custom directories for Cloudera Manager, you must create these directories on the Cloudera Manager Server host and assign ownership of these directories to the custom username. Cloudera Manager installer makes no changes to any directories that already exist. Cloudera Manager cannot write to any existing directories for which it does not have proper permissions, and if you do not change ownership, Cloudera Management Service roles may not perform as expected. To resolve these issues, do one of the following:
  • Change ownership of existing directories:
    Use the chown command to change ownership of all existing directories to the Cloudera Manager user. If the Cloudera Manager username and group are cloudera-scm, to change the ownership of the headlamp log directory, issue a command similar to the following:
    $ sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
  • Use alternate directories:
    1. If the directories you plan to use do not exist, create them. For example, to create /var/cm_logs/cloudera-scm-headlamp for use by the cloudera-scm user, run the following commands:
      mkdir /var/cm_logs/cloudera-scm-headlamp
      chown cloudera-scm /var/cm_logs/cloudera-scm-headlamp
    2. Connect to the Cloudera Manager Admin Console.
    3. Select Clusters > Cloudera Management Service
    4. Select Scope > role name.
    5. Click the Configuration tab.
    6. Enter a term in the Search field to find the settings to be changed. For example, you can enter /var or directory.
    7. Update each value with the new locations for Cloudera Manager to use.
    8. Click Save Changes to commit the changes.

Create Parcel Directories

  1. On the Cloudera Manager Server host, create a parcel repository directory:
    $ sudo mkdir -p /opt/cloudera/parcel-repo
  2. Change the directory ownership to be the username you are using to run Cloudera Manager:
    $ sudo chown username:groupname /opt/cloudera/parcel-repo
    where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
    $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
  3. On each cluster host, create a parcels directory:
    $ sudo mkdir -p /opt/cloudera/parcels
  4. Change the directory ownership to be the username you are using to run Cloudera Manager:
    $ sudo chown username:groupname /opt/cloudera/parcels
    where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
    $ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels

Start the Cloudera Manager Server

The way in which you start the Cloudera Manager Server varies according to which account you want the Server to run under:
  • As root:
    $ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start 
  • As another user. If you run as another user, ensure that the user you created for Cloudera Manager owns the location to which you extracted the tarball including the newly created database files. If you followed the earlier examples and created the directory /opt/cloudera-manager and the user cloudera-scm, you could use the following command to change ownership of the directory:
    $ sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager

    Once you have established ownership of directory locations, you can start Cloudera Manager Server using the user account you chose. For example, you might run the Cloudera Manager Server as cloudera-service. In this case, you have the following options:

    • Run the following command:
      $ sudo -u cloudera-service $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start 
    • Edit the configuration files so the script internally changes the user, and then run the script as root:
      1. Remove the following line from $CMF_DEFAULTS/etc/default/cloudera-scm-server:
        export CMF_SUDO_CMD=" "
      2. Change the user and group in $CMF_DEFAULTS/etc/init.d/cloudera-scm-server to the user you want the server to run as. For example, to run as cloudera-service, change the user and group as follows:
        USER=cloudera-service
        GROUP=cloudera-service
      3. Run the server script as root:
        $ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start 
  • To start the Cloudera Manager Server automatically after a reboot:
    1. Run the following commands on the Cloudera Manager Server host:
      • RHEL-compatible and SLES
        $ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
        $ chkconfig cloudera-scm-server on
      • Debian/Ubuntu
        $ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
        $ update-rc.d cloudera-scm-server defaults
    2. On the Cloudera Manager Server host, open the /etc/init.d/cloudera-scm-server file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to $CMF_DEFAULTS/etc/default.

If the Cloudera Manager Server does not start, see Troubleshooting Installation Problems.

Start the Cloudera Manager Agents

Start the Cloudera Manager Agent according to the account you want the Agent to run under:
  • To start the Cloudera Manager Agent, run this command on each Agent host:
    $ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start
    When the Agent starts, it contacts the Cloudera Manager Server.
  • If you are running single user mode, start Cloudera Manager Agent using the user account you chose. For example, to run the Cloudera Manager Agent as cloudera-scm, you have the following options:
    • Run the following command:
      $ sudo -u cloudera-scm $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start 
    • Edit the configuration files so the script internally changes the user, and then run the script as root:
      1. Remove the following line from $CMF_DEFAULTS/etc/default/cloudera-scm-agent:
        export CMF_SUDO_CMD=" "
      2. Change the user and group in $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent to the user you want the Agent to run as. For example, to run as cloudera-scm, change the user and group as follows:
        USER=cloudera-scm
        GROUP=cloudera-scm
      3. Run the Agent script as root:
        $ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start 
  • To start the Cloudera Manager Agents automatically after a reboot:
    1. Run the following commands on each Agent host:
      • RHEL-compatible and SLES
        $ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
        $ chkconfig cloudera-scm-agent on
      • Debian/Ubuntu
        $ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
        $ update-rc.d cloudera-scm-agent defaults
    2. On each Agent, open the $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to $CMF_DEFAULTS/etc/default.

Install Package Dependencies

When you install with tarballs and parcels, some services may require additional dependencies that are not provided by Cloudera. On each host, install the required packages:

When you install with tarballs and parcels, some services may require additional dependencies that are not provided by Cloudera. On each host, install the required packages, as determined by running the commands on this page: Package Dependencies.

Start and Log into the Cloudera Manager Admin Console

The Cloudera Manager Server URL takes the following form http://Server host:port, where Server host is the fully qualified domain name (FQDN) or IP address of the host where the Cloudera Manager Server is installed, and port is the port configured for the Cloudera Manager Server. The default port is 7180.
  1. Wait several minutes for the Cloudera Manager Server to start. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation Problems.
  2. In a web browser, enter http://Server host:7180, where Server host is the FQDN or IP address of the host where the Cloudera Manager Server is running.

    The login screen for Cloudera Manager Admin Console displays.

  3. Log into Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
  4. After you log in, the Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
  5. Click Continue.

    The Welcome to Cloudera Manager page displays.

Welcome to Cloudera Manager

From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install and, optionally, install a license:

  1. Choose which edition to install:
    • Cloudera Express, which does not require a license, but provides a limited set of features.
    • Cloudera Enterprise Cloudera Enterprise Trial, which does not require a license, but expires after 60 days and cannot be renewed.
    • Cloudera Enterprise with one of the following license types:
      • Basic Edition
      • Flex Edition
      • Cloudera Enterprise
    If you choose Cloudera Express or Cloudera Enterprise Cloudera Enterprise Trial, you can upgrade the license at a later time. For more information, see Managing Licenses.
  2. If you select Cloudera Enterprise, install a license:
    1. Click the Select License File text field.
    2. Browse to the location of your license file, select the file, and then click Open.
    3. Click the Upload button.
  3. Information is displayed indicating what the CDH installation includes. At this point, you can click the Support drop-down menu to access online Help or the Support Portal.
  4. Click Continue to proceed with the installation.

Thank you for choosing Cloudera Manager and CDH

The Thank you for choosing Cloudera Manager and CDH page lists the software that is available to be installed. Click Continue to proceed with the installation.

Specify hosts for your CDH cluster installation

Choose which hosts will run CDH and managed services

  1. Do one of the following depending on whether you are using Cloudera Manager to install software:
    • If you are using Cloudera Manager to install software, search for and choose hosts:
      1. To enable Cloudera Manager to automatically discover hosts on which to install CDH and managed services, enter the cluster hostnames or IP addresses. You can also specify hostname and IP address ranges. For example:
        Range Definition Matching Hosts
        10.1.1.[1-4] 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
        host[1-3].example.com host1.example.com, host2.example.com, host3.example.com
        host[07-10].example.com host07.example.com, host08.example.com, host09.example.com, host10.example.com

        You can specify multiple addresses and address ranges by separating them with commas, semicolons, tabs, or blank spaces, or by placing them on separate lines. Use this technique to make more specific searches instead of searching overly wide ranges. The scan results will include all addresses scanned, but only scans that reach hosts running SSH will be selected for inclusion in your cluster by default. If you do not know the IP addresses of all of the hosts, you can enter an address range that spans over unused addresses and then clear the hosts that do not exist (and are not discovered) later in this procedure. However, keep in mind that wider ranges will require more time to scan.

      2. Click Search. Cloudera Manager identifies the hosts on your cluster to allow you to configure them for services. If there are a large number of hosts on your cluster, wait a few moments to allow them to be discovered and shown in the wizard. If the search is taking too long, you can stop the scan by clicking Abort Scan. To find additional hosts, click New Search, add the host names or IP addresses and click Search again. Cloudera Manager scans hosts by checking for network connectivity. If there are some hosts where you want to install services that are not shown in the list, make sure you have network connectivity between the Cloudera Manager Server host and those hosts. Common causes of loss of connectivity are firewalls and interference from SELinux.
      3. Verify that the number of hosts shown matches the number of hosts where you want to install services. Clear host entries that do not exist and clear the hosts where you do not want to install services.
  2. Click Continue.

The Select Repository screen displays.

Select Repository

  1. Select the repository type to use for the installation. In the Choose Method section select one of the following:
    • Use Parcels (Recommended)

      A parcel is a binary distribution format containing the program files, along with additional metadata used by Cloudera Manager. Parcels are required for rolling upgrades. For more information, see Parcels.

    • Use Packages

      A package is a standard binary distribution format that contains compiled code and meta-information such as a package description, version, and dependencies. Packages are installed using your operating system package manager.

  2. Select the version of CDH to install.
    1. If you selected Use Parcels (Recommended) and you do not see the version you want to install, click the More Options button to add the repository URL for your version. Repository URLs for CDH 5 are documented in CDH Download Information. After adding the repository, click Save Changes and wait a few seconds for the version to appear. If your Cloudera Manager host uses an HTTP proxy, click the Proxy Settings button to configure your proxy.
    2. If you selected Use Packages, and the version you want to install is not listed, you can select Custom Repository to specify a repository that contains the desired version. Repository URLs for CDH 5 version are documented in CDH Download Information.
  3. If you selected Use Parcels (Recommended), specify any Additional Parcels you want to install.
  4. Select the specific release of the Cloudera Manager Agent you want to install on your hosts. You can choose either the version that matches the Cloudera Manager Server you are currently using (recommended) or specify a different version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories. Repository and GPG key URLs are documented in CDH Download Information.
  5. Click Continue.

The Accept JDK License screen displays.

Accept JDK License

To allow Cloudera Manager to automatically install the Oracle JDK on cluster hosts, read the JDK license and check the box labeled Install Oracle Java SE Development Kit (JDK 7) if you accept the terms. If you installed your own Oracle JDK version in Step 2: Install Java Development Kit, leave the box unchecked.

If you allow Cloudera Manager to install the JDK, a second checkbox appears, labeled Install Java Unlimited Strength Encryption Policy Files. These policy files are required to enable AES-256 encryption in JDK versions lower than 1.8u161. JDK 1.8u161 and higher enable unlimited strength encryption by default, and do not require policy files.

After reading the license terms and checking the applicable boxes, click Continue.

The Single User Mode page displays.

Single User Mode

The Single User Mode page allows you to enable single user mode. Single user mode is not recommended for most environments. If you need to enable single user mode, check the box labeled Enable Single User Mode. Otherwise, leave it unchecked, and click Continue. The Enter Login Credentials page displays.

Enter Login Credentials

  1. Select root for the root account, or select Another user and enter the username for an account that has password-less sudo privileges.
  2. Select an authentication method:
    • If you choose password authentication, enter and confirm the password.
    • If you choose public-key authentication, provide a passphrase and path to the required key files.

    You can modify the default SSH port if necessary.

  3. Specify the maximum number of host installations to run at once. The default and recommended value is 10. You can adjust this based on your network capacity.
  4. Click Continue.

Inspect hosts for correctness

The Inspect Hosts page runs the Host Inspector to search for common configuration problems. View the results and address any identified problems. Click the Run Again button to update the results after making any changes.

After addressing identified problems, click Finish.

This completes the Cluster Installation wizard and launches the Cluster Setup wizard.

Select Services

The Select Services page allows you to select the services you want to install and configure. Make sure that you have the appropriate license key for the services you want to use. You can choose from:

  • Core Hadoop

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, and Hue

  • Core with HBase

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and HBase

  • Core with Impala

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and Impala

  • Core with Search

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and Solr

  • Core with Spark

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, and Spark

  • All Services

    HDFS, YARN (MapReduce 2 Included), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store Indexer

  • Custom Services

    Choose your own services. Services required by chosen services will automatically be included. Flume can be added after your initial cluster has been set up.

To include Cloudera Navigator data management, check the box labeled Include Cloudera Navigator.

After selecting the services you want to add, click Continue. The Assign Roles page displays.

Assign Roles

The Assign Roles page suggests role assignments for the hosts in your cluster. You can click on the hostname for a role to select a different host. You can also click the View By Host button to see all the roles assigned to a host.

To review the recommended role assignments, see Recommended Cluster Hosts and Role Distribution.

After assigning all of the roles for your services, click Continue. The Setup Database page displays.

Setup Database

On the Setup Database page, you can enter the database names, usernames, and passwords you created in Step 4: Install and Configure Databases.

Select the database type and enter the database name, username, and password for each service. Click Test Connection to validate the settings. If the connection is successful, a green checkmark and the word Successful appears next to each service. If there are any problems, the error is reported next to the service that failed to connect.

After verifying that each connection is successful, click Continue. The Review Changes page displays.

Review Changes

The Review Changes page lists default and suggested settings for several configuration parameters, including data directories.

Review and make any necessary changes, and then click Continue. The First Run Command page displays.

First Run Command

The First Run Command page lists the details of the First Run command. You can expand the running commands to view the details of any step, including log files and command output. You can filter the view by selecting Show All Steps, Show Only Failed Steps, or Show Running Steps.

After the First Run command completes, click Continue.

Congratulations!

The Congratulations! page reports the success of the setup wizard. Click Finish to complete the wizard. The installation is complete.

Cloudera recommends that you change the default password as soon as possible by clicking the logged-in username at the top right of the home screen and clicking Change Password.

(Optional) Change the Cloudera Manager User

After configuring your services, the installation wizard automatically starts the Cloudera Management Service, assuming that it runs using cloudera-scm. If you configured this service to run using a user other than cloudera-scm, the Cloudera Management Service roles do not start automatically. To change the service configuration to use the user account that you selected:
  1. Connect to the Cloudera Manager Admin Console.
  2. Do one of the following:
    • Select Clusters > Cloudera Management Service.
    • On the Home > Status tab, in Cloudera Management Service table, click the Cloudera Management Service link.
  3. Click the Configuration tab.
  4. Use the search box to find the property to change. For example, you might enter "system" to find the System User and System Group properties.
  5. Make any changes required to the System User and System Group to ensure Cloudera Manager uses the proper user accounts.
  6. Click Save Changes.
  7. Start the Cloudera Management Service roles.

Change the Default Administrator Password

As soon as possible, change the default administrator password:
  1. Click the logged-in username at the far right of the top navigation bar and select Change Password.
  2. Enter the current password and a new password twice, and then click OK.

Configure Oozie Data Purge Settings

If you added an Oozie service, you can change your Oozie configuration to control when data is purged to improve performance, cut down on database disk usage, or to keep the history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.

Test the Installation

You can test the installation following the instructions in Testing the Installation.