2. Add DataNodes or TaskTrackers - Hortonworks Data Platform

On each of the newly added slave nodes, add the HDP repository to yum:

wget -nv //docs.hortonworks.com/HDP/repos/centos6/hdp.repo -O /etc/yum.repos.d/hdp.repo
yum clean all

On each of the newly added slave nodes, install HDFS and MapReduce.

On RHEL and CentOS:

yum install hadoop hadoop-libhdfs hadoop-native
yum install hadoop-pipes hadoop-sbin openssl

On SLES:

zypper install hadoop hadoop-libhdfs hadoop-native
zypper install hadoop-pipes hadoop-sbin openssl
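Because the two variants differ only in the package tool, a provisioning script can select the tool at run time. The following is a minimal sketch rather than part of the HDP procedure: the function name `detect_pkg_tool` is hypothetical, and it assumes that finding `zypper` on the PATH identifies a SLES host.

```shell
#!/bin/sh
# Hypothetical helper: print the package tool available on this host.
# Assumption: zypper on the PATH means SLES; otherwise yum is tried.
detect_pkg_tool() {
    if command -v zypper >/dev/null 2>&1; then
        echo zypper
    elif command -v yum >/dev/null 2>&1; then
        echo yum
    else
        echo "no supported package tool found" >&2
        return 1
    fi
}

# Example use (commented out so that sourcing this file installs nothing):
# tool=$(detect_pkg_tool) || exit 1
# "$tool" install hadoop hadoop-libhdfs hadoop-native
# "$tool" install hadoop-pipes hadoop-sbin openssl
```

The same helper could be reused for the Snappy and LZO packages, since those steps also differ only in the tool name.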

On each of the newly added slave nodes, install the Snappy compression/decompression library:

Check if Snappy is already installed:
```
rpm -qa | grep snappy
```

Install Snappy on the new nodes:

For RHEL/CentOS:
```
yum install snappy snappy-devel 
```

For SLES:

zypper install snappy snappy-devel

ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/Linux-amd64-64/.
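After creating the link, it may be worth confirming that it resolves to a real file. The sketch below is one way to do that; the function name `snappy_linked` is hypothetical and not part of HDP.

```shell
#!/bin/sh
# Hypothetical check: succeed only if libsnappy.so in the given directory
# is a symbolic link (-L) whose target exists (-e follows the link).
snappy_linked() {
    native_dir=$1
    [ -L "$native_dir/libsnappy.so" ] && [ -e "$native_dir/libsnappy.so" ]
}

# Example use on a new node:
# snappy_linked /usr/lib/hadoop/lib/native/Linux-amd64-64 \
#     || echo "Snappy link is missing or broken" >&2
```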

Optional - Install the LZO compression library.

On RHEL and CentOS:
```
yum install lzo-devel hadoop-lzo-native
```

On SLES:

zypper install lzo-devel hadoop-lzo-native

Copy the Hadoop configurations to the newly added slave nodes and set appropriate permissions.

Option I: Copy Hadoop config files from an existing slave node.
1. On an existing slave node, make a copy of the current configurations:
```
tar zcvf hadoop_conf.tgz /etc/hadoop/conf
```
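2. Copy this file to each of the new nodes, then run the following commands on each new node to replace its existing configuration with the copied one: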
```
rm -rf /etc/hadoop/conf
cd /
tar zxvf $location_of_copied_conf_tar_file/hadoop_conf.tgz
chmod -R 755 /etc/hadoop/conf
```
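When several nodes are added at once, the copy and extract steps of Option I can be generated per node. The sketch below only prints the commands for review and does not run them. The function name `conf_push_cmds`, root SSH access to the new nodes, and `/tmp` as the staging directory are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that copy hadoop_conf.tgz to one
# new node and unpack it there. Nothing is executed; this is a dry run.
# Assumptions: root SSH access to the node, /tmp as the staging directory.
conf_push_cmds() {
    node=$1
    echo "scp hadoop_conf.tgz root@$node:/tmp/"
    echo "ssh root@$node 'rm -rf /etc/hadoop/conf && cd / && tar zxvf /tmp/hadoop_conf.tgz && chmod -R 755 /etc/hadoop/conf'"
}

# Example: preview the commands for two placeholder hostnames.
# for node in slave5.example.com slave6.example.com; do
#     conf_push_cmds "$node"
# done
```

Extracting from `/` matters here: GNU tar stores the archive members without the leading slash, so unpacking in the root directory restores them to `/etc/hadoop/conf`.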

Option II: Manually add Hadoop configuration files.

Download the core Hadoop configuration files from here, then extract the files under the configuration_files -> core_hadoop directory to a temporary location.

In the temporary directory, locate the following files and modify the properties based on your environment. Search the files for TODO to find the properties that need to be replaced.

Table 7.1. core-site.xml
Property	Example	Description
fs.default.name	`hdfs://{namenode.full.hostname}:8020`	Enter your NameNode hostname
fs.checkpoint.dir	`/grid/hadoop/hdfs/snn`	Comma separated list of paths. Use the list of directories from `$FS_CHECKPOINT_DIR`

Table 7.2. hdfs-site.xml
Property	Example	Description
dfs.name.dir	`/grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn`	Comma separated list of paths. Use the list of directories from `$DFS_NAME_DIR`
dfs.data.dir	`/grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn`	Comma separated list of paths. Use the list of directories from `$DFS_DATA_DIR`
dfs.http.address	`{namenode.full.hostname}:50070`	Enter your NameNode hostname for http access
dfs.secondary.http.address	`{secondary.namenode.full.hostname}:50090`	Enter your SecondaryNameNode hostname
dfs.https.address	`{namenode.full.hostname}:50470`	Enter your NameNode hostname for https access.

Table 7.3. mapred-site.xml
Property	Example	Description
mapred.job.tracker	`{jobtracker.full.hostname}:50300`	Enter your JobTracker hostname
mapred.job.tracker.http.address	`{jobtracker.full.hostname}:50030`	Enter your JobTracker hostname
mapred.local.dir	`/grid/hadoop/mapred,/grid1/hadoop/mapred`	Comma separated list of paths. Use the list of directories from `$MAPREDUCE_LOCAL_DIR`
mapreduce.tasktracker.group	`hadoop`	Enter your group. Use the value of `$HADOOP_GROUP`
mapreduce.history.server.http.address	`{jobtracker.full.hostname}:51111`	Enter your JobTracker hostname

Table 7.4. taskcontroller.cfg
Property	Example	Description
mapred.local.dir	`/grid/hadoop/mapred,/grid1/hadoop/mapred`	Comma separated list of paths. Use the list of directories from `$MAPREDUCE_LOCAL_DIR`
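
Once the properties in the tables above have been edited, a quick scan can confirm that no TODO placeholders remain before the files are copied into place. This is a minimal sketch: the function name `count_todos` is hypothetical, and the directory argument stands for whatever temporary location was used.

```shell
#!/bin/sh
# Hypothetical helper: count the lines that still contain TODO in the
# configuration files under the given directory.
count_todos() {
    conf_tmp=$1
    # grep -r prints one line per match; wc -l counts those lines and
    # tr strips the padding that some wc implementations add.
    grep -r "TODO" "$conf_tmp" 2>/dev/null | wc -l | tr -d ' '
}

# Example use (the path is a placeholder for your temporary directory):
# remaining=$(count_todos /tmp/core_hadoop)
# [ "$remaining" = "0" ] || echo "$remaining TODO entries still to replace" >&2
```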

Create the config directory on all hosts in your cluster, copy in all the configuration files, and set permissions.

rm -r $HADOOP_CONF_DIR
mkdir -p $HADOOP_CONF_DIR

<copy all the config files to $HADOOP_CONF_DIR>

chmod a+x $HADOOP_CONF_DIR/
chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
chmod -R 755 $HADOOP_CONF_DIR/../

where:

$HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
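
These variables must be set before the commands above are run. If `$HADOOP_CONF_DIR` is empty, `$HADOOP_CONF_DIR/../` expands to `/../`, which is the root directory, so the recursive `chown` and `chmod` would apply to the whole filesystem. The guard below is a minimal sketch of one way to catch that; the function name `require_vars` is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: fail if any of the named variables is unset or empty.
require_vars() {
    for name in "$@"; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "required variable $name is not set" >&2
            return 1
        fi
    done
}

# Example values taken from the definitions above:
# HADOOP_CONF_DIR=/etc/hadoop/conf
# HDFS_USER=hdfs
# HADOOP_GROUP=hadoop
# require_vars HADOOP_CONF_DIR HDFS_USER HADOOP_GROUP || exit 1
```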

On each of the newly added slave nodes, start HDFS:

su - hdfs
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode

On each of the newly added slave nodes, start MapReduce:

su - mapred
/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
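
To confirm that the daemons came up, the output of the JDK's `jps` tool can be checked for the `DataNode` and `TaskTracker` process names. The following is a sketch under assumptions: the function name `daemon_listed` is hypothetical, `jps` must be on the PATH, and it must be run by a user that can see the Hadoop JVMs.

```shell
#!/bin/sh
# Hypothetical helper: read jps output on stdin and succeed if a process
# with the given name is listed.
daemon_listed() {
    # jps prints "<pid> <name>" per line; compare the second field exactly
    # so that "NameNode" does not match "SecondaryNameNode".
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example use on a new slave node:
# jps | daemon_listed DataNode    || echo "DataNode is not running" >&2
# jps | daemon_listed TaskTracker || echo "TaskTracker is not running" >&2
```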

Add new slave nodes.

To add a new NameNode slave (DataNode):

On the NameNode host machine, edit the /etc/hadoop/conf/dfs.include file and add the hostnames of the new slave nodes, one per line.

	Important
	If the NameNode host machine does not already have a `dfs.include` file, create one.

On the NameNode host machine, execute the following command:
```
su - hdfs -c "hadoop dfsadmin -refreshNodes"
```

To add a new JobTracker slave (TaskTracker):

On the JobTracker host machine, edit the /etc/hadoop/conf/mapred.include file and add the hostnames of the new slave nodes, one per line.

	Important
	If the JobTracker host machine does not already have a `mapred.include` file, create one.

On the JobTracker host machine, execute the following command:
```
su - mapred -c "hadoop mradmin -refreshNodes"
```
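
The include-file edits in both procedures can be scripted so that a file is created when it is missing and a hostname is never listed twice. The helper below is a minimal sketch; the function name `add_host` and the hostnames in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a hostname to an include file unless it is
# already listed. The file is created if it does not exist.
add_host() {
    include_file=$1
    host=$2
    touch "$include_file"
    # -x matches whole lines only, -F treats the hostname as a literal string.
    if ! grep -qxF -e "$host" "$include_file"; then
        echo "$host" >> "$include_file"
    fi
}

# Example use (placeholder hostnames), followed by the refresh commands:
# add_host /etc/hadoop/conf/dfs.include    slave5.example.com
# add_host /etc/hadoop/conf/mapred.include slave5.example.com
```

The sketch assumes that an existing include file ends with a newline; otherwise the appended hostname would be joined to the last line.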

Optional - Enable monitoring on the newly added slave nodes using the instructions provided here.

Optional - Enable cluster alerting on the newly added slave nodes using the instructions provided here.