This section describes how to start core Hadoop and run simple smoke tests. Use the following instructions to validate the core Hadoop installation:
Format and start HDFS.
Execute these commands on the NameNode:
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop namenode -format
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
Execute these commands on the Secondary NameNode:
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode
Execute these commands on all DataNodes:
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
where:
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
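After starting the daemons above, a quick sanity sketch (assuming the JDK's jps tool is on PATH) is to confirm each expected HDFS daemon appears in the Java process list on its host. The check_daemon helper below is hypothetical, not part of Hadoop:

```shell
# Hypothetical helper: report whether a named Hadoop daemon shows up in jps output.
check_daemon() {
  # $1 = expected daemon name, $2 = jps output to search
  if echo "$2" | grep -qw "$1"; then
    echo "$1: running"
  else
    echo "$1: NOT running"
  fi
}

JPS_OUT="$(jps 2>/dev/null || true)"
check_daemon NameNode "$JPS_OUT"            # run on the NameNode
check_daemon SecondaryNameNode "$JPS_OUT"   # run on the Secondary NameNode
check_daemon DataNode "$JPS_OUT"            # run on each DataNode
```

The -w flag keeps a search for NameNode from also matching the SecondaryNameNode process name.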
Smoke Test HDFS.
See if you can reach the NameNode server with your browser:
http://$namenode.full.hostname:50070
Try copying a file into HDFS and listing that file:
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop dfs -copyFromLocal /etc/passwd passwd-test
/usr/lib/hadoop/bin/hadoop dfs -ls
Test browsing HDFS:
http://$datanode.full.hostname:50075/browseDirectory.jsp?dir=/
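As a command-line alternative to the browser checks, the standard dfsadmin and fsck tools summarize cluster health; this sketch assumes the client is installed at /usr/lib/hadoop and skips otherwise:

```shell
# Sketch: command-line HDFS health checks (assumes the client path used above).
HADOOP=/usr/lib/hadoop/bin/hadoop
if [ -x "$HADOOP" ]; then
  "$HADOOP" dfsadmin -report   # capacity plus live/dead DataNode counts
  "$HADOOP" fsck /             # overall filesystem health
else
  echo "hadoop client not found at $HADOOP; skipping"
fi
```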
Start MapReduce.
Execute these commands from the JobTracker server:
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop fs -mkdir /mapred
/usr/lib/hadoop/bin/hadoop fs -chown -R mapred /mapred
su $MAPRED_USER
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
Execute these commands from the JobHistory server:
su $MAPRED_USER
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Execute these commands from all TaskTracker nodes:
su $MAPRED_USER
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
where:
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
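The same jps-based sanity sketch applies to the MapReduce daemons (assuming jps is on PATH; JobTracker and TaskTracker are the usual Hadoop 1.x process names, but names can vary by distribution). The mr_daemon_running helper is hypothetical:

```shell
# Hypothetical helper: report whether a named MapReduce daemon shows up in jps output.
mr_daemon_running() {
  # $1 = expected daemon name, $2 = jps output to search
  if echo "$2" | grep -qw "$1"; then
    echo "$1: running"
  else
    echo "$1: NOT running"
  fi
}

JPS_OUT="$(jps 2>/dev/null || true)"
mr_daemon_running JobTracker "$JPS_OUT"    # run on the JobTracker server
mr_daemon_running TaskTracker "$JPS_OUT"   # run on each TaskTracker node
```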
Smoke Test MapReduce.
Try browsing to the JobTracker:
http://$jobtracker.full.hostname:50030/
Smoke test by using teragen to generate 10 GB of data (100,000,000 rows of 100 bytes each) and then using terasort to sort that data.
su $HDFS_USER
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar teragen 100000000 /test/10gsort/input
/usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar terasort /test/10gsort/input /test/10gsort/output
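As an optional follow-up, the teravalidate tool in the same examples jar checks that the terasort output is globally sorted, writing error records to a report directory if any keys are out of order. This sketch mirrors the paths used above and skips if the client is not installed:

```shell
# Sketch: validate the terasort output with teravalidate (same examples jar).
HADOOP=/usr/lib/hadoop/bin/hadoop
EXAMPLES=/usr/lib/hadoop/hadoop-examples.jar
if [ -x "$HADOOP" ]; then
  su $HDFS_USER -c "$HADOOP jar $EXAMPLES teravalidate \
      /test/10gsort/output /test/10gsort/validate"
else
  echo "hadoop client not found at $HADOOP; skipping"
fi
```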