Using Whirr to Launch Cloudera Manager
Cloudera Manager provides an installation wizard that installs Cloudera Manager, CDH, and Impala on a cluster of Amazon Web Services (AWS) EC2 instances; see Installing Cloudera Manager and CDH on EC2. Alternatively, follow the instructions on this page to use Whirr to start a cluster on Amazon Elastic Compute Cloud (EC2) running Cloudera Manager.
This method uses Whirr to start a cluster with:
- One host running the Cloudera Manager Admin Console
- A user-selectable number of hosts for the Hadoop cluster itself
Once Whirr has started the cluster, you use Cloudera Manager in the usual way.
Step 1: Set your AWS credentials as environment variables
Run the following commands from your local host:
$ export AWS_ACCESS_KEY_ID=...
$ export AWS_SECRET_ACCESS_KEY=...
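A mistyped or missing export usually shows up only later, as an authentication failure during launch. As a minimal sketch, the hypothetical helper below (check_aws_credentials is an illustrative name, not part of Whirr or the AWS tools) confirms that both variables are set and non-empty in the current shell before you continue:

```shell
# Hypothetical helper (not part of Whirr or the AWS tools): check that both
# credential variables from Step 1 are set and non-empty in this shell.
check_aws_credentials() {
    missing=""
    # ${VAR:-} expands to an empty string when VAR is unset, which keeps the
    # check safe in shells running with `set -u`.
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] || missing="$missing AWS_ACCESS_KEY_ID"
    [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="$missing AWS_SECRET_ACCESS_KEY"
    if [ -n "$missing" ]; then
        echo "Missing AWS credentials:$missing" >&2
        return 1
    fi
    return 0
}
```

For example, $ check_aws_credentials && echo "AWS credentials are set" prints the confirmation only when both values are present; otherwise it names the missing variables on standard error.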
Step 2: Install Whirr
Install the CDH repositories and the whirr package on your local host. For CDH 4, see the CDH 4 Installation Guide. For CDH 5, see the CDH 5 Installation Guide.
Set the following environment variables:
$ export WHIRR_HOME=/usr/lib/whirr
$ export PATH=$WHIRR_HOME/bin:$PATH
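Running these two exports more than once, for example from a shell startup file that is sourced repeatedly, adds $WHIRR_HOME/bin to PATH again each time. A minimal sketch of an idempotent alternative, using the hypothetical function name add_whirr_to_path and assuming the /usr/lib/whirr location above as the default:

```shell
# Hypothetical helper: export WHIRR_HOME (defaulting to /usr/lib/whirr, as
# above) and prepend $WHIRR_HOME/bin to PATH only if it is not already there.
add_whirr_to_path() {
    WHIRR_HOME="${WHIRR_HOME:-/usr/lib/whirr}"
    export WHIRR_HOME
    case ":$PATH:" in
        *":$WHIRR_HOME/bin:"*) ;;            # already on PATH; nothing to do
        *) PATH="$WHIRR_HOME/bin:$PATH" ;;
    esac
    export PATH
}
```

After $ add_whirr_to_path, running $ command -v whirr should print the path to the whirr script if the package is installed in that location.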
Step 3: Create a password-less SSH Key Pair
Create a password-less SSH key pair for Whirr to use:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_cm
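If ~/.ssh/id_rsa_cm already exists, ssh-keygen stops to ask whether to overwrite it, which can stall a scripted setup or, if you answer yes, replace a key that an existing cluster still uses. A minimal sketch that generates the key pair only when it is missing; the function name ensure_cm_keypair is hypothetical, and the default path is the one used above:

```shell
# Hypothetical helper: create the password-less RSA key pair for Whirr only if
# the private key does not exist yet (default path: ~/.ssh/id_rsa_cm), so that
# ssh-keygen never stops to ask about overwriting an existing key.
ensure_cm_keypair() {
    key_file="${1:-$HOME/.ssh/id_rsa_cm}"
    if [ -f "$key_file" ]; then
        echo "Key $key_file already exists; leaving it unchanged."
        return 0
    fi
    # Create the parent directory if needed, then generate the key pair with
    # the same options as the command above (RSA, empty passphrase).
    mkdir -p "$(dirname "$key_file")" &&
        ssh-keygen -t rsa -P '' -f "$key_file"
}
```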
Step 4: Get your Whirr-Cloudera-Manager Configuration
You can download a sample Whirr EC2 Cloudera Manager configuration as follows:
$ curl -O https://raw.github.com/cloudera/whirr-cm/master/cm-ec2.properties
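Before launching, it can help to check which settings the downloaded file actually uses. The sketch below prints the instance templates and the SSH key paths. whirr.instance-templates is the setting described in Step 5; whirr.private-key-file and whirr.public-key-file are standard Whirr properties, but whether and how the sample file sets them is an assumption to confirm against your copy (if set, they should point at the key pair from Step 3). The function name show_whirr_settings is hypothetical.

```shell
# Hypothetical helper: print the uncommented lines of a Whirr properties file
# that set the instance templates and the SSH key pair paths.
show_whirr_settings() {
    props="${1:-cm-ec2.properties}"
    grep -E '^[[:space:]]*whirr\.(instance-templates|private-key-file|public-key-file)[[:space:]]*=' "$props"
}
```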
To upload a Cloudera Manager license as part of the installation (Cloudera can provide a license if you do not have one), place the license in a file named cm-license.txt on the Whirr classpath (for example, in $WHIRR_HOME/conf), using a command such as the following:
$ mv -v eval_acme_20120925_cloudera_enterprise_license.txt $WHIRR_HOME/conf/cm-license.txt
To upload a Cloudera Manager configuration as part of the installation, place the configuration in a file named cm-config.json on the Whirr classpath (for example, in $WHIRR_HOME/conf). The file should use the same JSON format as a configuration downloaded from the Cloudera Manager UI. For example, to use the sample configuration:
$ curl -O https://raw.github.com/cloudera/whirr-cm/master/cm-config.json
$ mv -v cm-config.json $WHIRR_HOME/conf/cm-config.json
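Both uploads are optional and depend on the exact file names cm-license.txt and cm-config.json. A minimal sketch that reports which of the two files are present in a directory, assuming $WHIRR_HOME/conf is the classpath location you chose (other classpath directories are not checked). The function name check_cm_uploads is hypothetical, and the JSON check runs only if python3 happens to be installed.

```shell
# Hypothetical helper: report whether the optional license and configuration
# files are present (and non-empty) in a directory, by default
# $WHIRR_HOME/conf. Other classpath directories are not checked.
check_cm_uploads() {
    conf_dir="${1:-${WHIRR_HOME:-/usr/lib/whirr}/conf}"
    for f in cm-license.txt cm-config.json; do
        if [ -s "$conf_dir/$f" ]; then
            echo "found:   $conf_dir/$f"
        else
            echo "missing: $conf_dir/$f"
        fi
    done
    # If python3 is available, also check that the configuration parses as JSON.
    json="$conf_dir/cm-config.json"
    if [ -s "$json" ] && command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$json" >/dev/null 2>&1 ||
            echo "warning: $json is not valid JSON"
    fi
}
```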
Step 5: Launch a Cloudera Manager Cluster
Using the sample configuration, the following command starts a cluster with five Hadoop hosts:
$ whirr launch-cluster --config cm-ec2.properties
- To change the number of hosts, edit the whirr.instance-templates line in the cm-ec2.properties file. For example, to launch a cluster with 20 hosts: whirr.instance-templates=1 cmserver,20 cmagent
- To add a no-op host to use as a gateway host, append a noop template: whirr.instance-templates=1 cmserver,20 cmagent,1 noop
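Editing the file by hand works; for scripted launches, the edit can be automated. A minimal sketch that rewrites the whirr.instance-templates line, assuming the file contains a single uncommented whirr.instance-templates line and that only the cmserver, cmagent, and noop roles shown above are needed. The function name set_cm_instance_templates is hypothetical.

```shell
# Hypothetical helper: rewrite whirr.instance-templates in a Whirr properties
# file. Arguments: file, number of cmagent hosts, optional number of noop
# (gateway) hosts. Writes through a temp file, so it works with GNU and BSD sed.
set_cm_instance_templates() {
    props="$1"
    agents="$2"
    noops="${3:-0}"
    # Both counts must be non-negative integers.
    for n in "$agents" "$noops"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "usage: set_cm_instance_templates FILE AGENTS [NOOPS]" >&2
                return 1 ;;
        esac
    done
    templates="1 cmserver,$agents cmagent"
    if [ "$noops" -gt 0 ]; then
        templates="$templates,$noops noop"
    fi
    tmp="$props.tmp.$$"
    sed "s/^whirr\.instance-templates[[:space:]]*=.*/whirr.instance-templates=$templates/" \
        "$props" > "$tmp" && mv "$tmp" "$props"
}
```

For example, $ set_cm_instance_templates cm-ec2.properties 20 1 produces the gateway line shown above.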
Whirr reports progress to the console as it runs. The command exits when the cluster is ready to be used.
Using the Cluster
Once the Hadoop cluster is up and running you can run jobs from any Cloudera Manager Agent host, or from a Cloudera Manager gateway host.
Using a Gateway Host (Optional)
In most cases you will not need a gateway host, but you may want one if you want to run jobs on a host that does not also run the CDH TaskTracker and DataNode processes. In that case, edit whirr.instance-templates to add the noop template shown in the previous section and launch the cluster. Then add a gateway role on the no-op host by following the Cloudera Manager instructions in the documentation for your version of Cloudera Manager (for example, see Role Instances).
Then SSH to the gateway host. Now you can interact with the cluster; for example, to list files in HDFS:
$ hadoop fs -ls /tmp
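Beyond listing files, a quick write-and-read round trip confirms that HDFS accepts writes from the gateway host. A minimal sketch using the standard hadoop fs -put, -cat, and -rm subcommands; the function name hdfs_smoke_test and the default path /tmp/whirr-smoke-test.txt are hypothetical, and that path is assumed to be writable by your user and not to exist yet.

```shell
# Hypothetical helper: write a small file to HDFS, read it back, compare it
# with the local copy, and clean up. Run on a host with a configured client.
hdfs_smoke_test() {
    hdfs_path="${1:-/tmp/whirr-smoke-test.txt}"
    local_file=$(mktemp) || return 1
    echo "Whirr HDFS smoke test" > "$local_file"
    if hadoop fs -put "$local_file" "$hdfs_path"; then
        # Stream the HDFS copy back and compare it byte for byte.
        hadoop fs -cat "$hdfs_path" | cmp -s - "$local_file"
        status=$?
        # Remove only the file this function created.
        hadoop fs -rm "$hdfs_path" >/dev/null 2>&1
    else
        status=1
    fi
    rm -f "$local_file"
    return $status
}
```

The HDFS file is removed only when the -put succeeded, so a pre-existing file at the same path is never deleted by this sketch.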
Shutting Down the Cluster
When you want to shut down the cluster, run the following command, which terminates the cluster's EC2 instances:
$ whirr destroy-cluster --config cm-ec2.properties