Use the following instructions to manually add DataNode or TaskTracker hosts:
On each of the newly added slave nodes, add the HDP repository to yum:
wget -nv http://public-repo-1.hortonworks.com/HDP/repos/centos6/hdp.repo -O /etc/yum.repos.d/hdp.repo
yum clean all
On each of the newly added slave nodes, install HDFS and MapReduce.
On RHEL and CentOS:
yum install hadoop hadoop-libhdfs hadoop-native
yum install hadoop-pipes hadoop-sbin openssl
On SLES:
zypper install hadoop hadoop-libhdfs hadoop-native
zypper install hadoop-pipes hadoop-sbin openssl
On each of the newly added slave nodes, install the Snappy compression/decompression library:
Check if Snappy is already installed:
rpm -qa | grep snappy
Install Snappy on the new nodes:
For RHEL/CentOS:
yum install snappy snappy-devel
For SLES:
zypper install snappy snappy-devel
Create a symlink so that Hadoop's native library loader can find the Snappy library:
ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/Linux-amd64-64/.
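The check-then-install step above can be folded into a small helper. The following is a sketch of my own (not part of the guide): it inspects an `rpm -qa` listing, passed in as text so the logic is testable offline, and reports whether Snappy still needs installing.

```shell
# Sketch: decide whether Snappy needs installing from an `rpm -qa` listing.
# Pass the listing in as an argument so the check runs without touching
# the package database. Package names assumed from this guide.
needs_snappy() {
  if echo "$1" | grep -q '^snappy'; then
    echo skip      # a snappy package is already installed
  else
    echo install   # run: yum install snappy snappy-devel (zypper on SLES)
  fi
}

needs_snappy "zlib-1.2.3-29.el6.x86_64"   # prints "install"
```

In real use you would call it as `needs_snappy "$(rpm -qa)"` on each new node before deciding to run the install command.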
Optional - Install the LZO compression library.
On RHEL and CentOS:
yum install lzo-devel hadoop-lzo-native
On SLES:
zypper install lzo-devel hadoop-lzo-native
Copy the Hadoop configurations to the newly added slave nodes and set appropriate permissions.
Option I: Copy Hadoop config files from an existing slave node.
On an existing slave node, make a copy of the current configurations:
tar zcvf hadoop_conf.tgz /etc/hadoop/conf
Copy this file to each of the new nodes:
rm -rf /etc/hadoop/conf
cd /
tar zxvf $location_of_copied_conf_tar_file/hadoop_conf.tgz
chmod -R 755 /etc/hadoop/conf
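When several slave nodes are being added at once, the copy-and-unpack step is usually scripted. This dry-run sketch (my own; the hostnames and tarball location are placeholders, not from the guide) prints the scp/ssh commands an operator would run for each new node:

```shell
# Dry run: print the commands that would distribute and unpack the config
# tarball on each new slave. Hostnames and /tmp path are assumptions.
push_conf_cmds() {
  echo "scp /tmp/hadoop_conf.tgz $1:/tmp/"
  echo "ssh $1 'rm -rf /etc/hadoop/conf; cd /; tar zxvf /tmp/hadoop_conf.tgz; chmod -R 755 /etc/hadoop/conf'"
}

for host in slave101.example.com slave102.example.com; do
  push_conf_cmds "$host"
done
```

Removing the `echo`s (or piping the output to `sh`) would execute the distribution for real once the host list is verified.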
Option II: Manually add Hadoop configuration files.
Download the core Hadoop configuration files from here and extract the files under the
configuration_files -> core_hadoop
directory to a temporary location.
In the temporary directory, locate the following files and modify the properties based on your environment. Search for TODO in the files for the properties to replace.
Table 6.1. core-site.xml
  fs.default.name: hdfs://{namenode.full.hostname}:8020
    Enter your NameNode hostname.
  fs.checkpoint.dir: /grid/hadoop/hdfs/snn
    A comma-separated list of paths. Use the list of directories from $FS_CHECKPOINT_DIR.
Table 6.2. hdfs-site.xml
  dfs.name.dir: /grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn
    Comma-separated list of paths. Use the list of directories from $DFS_NAME_DIR.
  dfs.data.dir: /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn
    Comma-separated list of paths. Use the list of directories from $DFS_DATA_DIR.
  dfs.http.address: {namenode.full.hostname}:50070
    Enter your NameNode hostname for HTTP access.
  dfs.secondary.http.address: {secondary.namenode.full.hostname}:50090
    Enter your SecondaryNameNode hostname.
  dfs.https.address: {namenode.full.hostname}:50470
    Enter your NameNode hostname for HTTPS access.

Table 6.3. mapred-site.xml
  mapred.job.tracker: {jobtracker.full.hostname}:50300
    Enter your JobTracker hostname.
  mapred.job.tracker.http.address: {jobtracker.full.hostname}:50030
    Enter your JobTracker hostname.
  mapred.local.dir: /grid/hadoop/mapred,/grid1/hadoop/mapred
    Comma-separated list of paths. Use the list of directories from $MAPREDUCE_LOCAL_DIR.
  mapreduce.tasktracker.group: hadoop
    Enter your group. Use the value of $HADOOP_GROUP.
  mapreduce.history.server.http.address: {jobtracker.full.hostname}:51111
    Enter your JobTracker hostname.

Table 6.4. taskcontroller.cfg
  mapred.local.dir: /grid/hadoop/mapred,/grid1/hadoop/mapred
    Comma-separated list of paths. Use the list of directories from $MAPREDUCE_LOCAL_DIR.
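Replacing the TODO placeholders by hand is error-prone. As one illustration (my own, not from the guide), a sed filter can substitute the {…hostname}-style placeholders shown in the tables above, assuming they appear literally in the template files; the example hostnames are placeholders:

```shell
# Sketch: substitute the hostname placeholders from Tables 6.1-6.3.
# NN and JT are assumed example hostnames; adjust for your cluster.
NN=namenode1.example.com
JT=jobtracker1.example.com

fill_placeholders() {
  sed -e "s/{namenode.full.hostname}/$NN/g" \
      -e "s/{secondary.namenode.full.hostname}/$NN/g" \
      -e "s/{jobtracker.full.hostname}/$JT/g"
}

echo '<value>hdfs://{namenode.full.hostname}:8020</value>' | fill_placeholders
# prints: <value>hdfs://namenode1.example.com:8020</value>
```

In practice you would run each template through the filter, e.g. `fill_placeholders < core-site.xml.template > core-site.xml`, then review the remaining TODO entries manually.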
Create the config directory on all hosts in your cluster, copy in all the configuration files, and set permissions.
rm -r $HADOOP_CONF_DIR
mkdir -p $HADOOP_CONF_DIR
<copy all the config files to $HADOOP_CONF_DIR>
chmod a+x $HADOOP_CONF_DIR/
chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
chmod -R 755 $HADOOP_CONF_DIR/../
where:
$HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
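With the example values above substituted in, the permission-setup sequence looks like the following. This is a dry-run sketch of my own that prints the commands rather than executing them, so the expansion can be checked before running it as root:

```shell
# Sketch using the example values from this section (dry run).
HADOOP_CONF_DIR=/etc/hadoop/conf
HDFS_USER=hdfs
HADOOP_GROUP=hadoop

print_conf_setup() {
  echo "rm -r $HADOOP_CONF_DIR"
  echo "mkdir -p $HADOOP_CONF_DIR"
  echo "# <copy all the config files to $HADOOP_CONF_DIR>"
  echo "chmod a+x $HADOOP_CONF_DIR/"
  echo "chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../"
  echo "chmod -R 755 $HADOOP_CONF_DIR/../"
}

print_conf_setup
```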
On each of the newly added slave nodes, start HDFS:
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode"
On each of the newly added slave nodes, start MapReduce:
su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker"
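After starting both daemons, it is worth confirming they are actually running on each new node. This sketch (my own; the process names match what `jps` reports for Hadoop 1.x slave daemons) scans a captured process listing, passed in as text so the check is testable offline, and names any daemon that is missing:

```shell
# Sketch: report which of the expected slave daemons are absent from a
# `jps`-style listing passed in as an argument. Empty output means both
# DataNode and TaskTracker are present.
daemons_missing() {
  missing=""
  echo "$1" | grep -q 'DataNode'    || missing="$missing DataNode"
  echo "$1" | grep -q 'TaskTracker' || missing="$missing TaskTracker"
  echo "$missing"
}

daemons_missing '1234 DataNode 5678 TaskTracker'   # empty output: both up
```

On a live node you would call `daemons_missing "$(jps)"` as the hdfs or mapred user.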
Add new slave nodes.
To add a new NameNode slave (DataNode):
On the NameNode host machine, edit the /etc/hadoop/conf/dfs.include file and add the list of slave nodes' hostnames (separated by newline characters).
Important: If the NameNode host machine does not have an existing copy of this file, ensure that you create a new dfs.include file.
On the NameNode host machine, execute the following command:
su - hdfs -c "hadoop dfsadmin -refreshNodes"
To add a new JobTracker slave (TaskTracker):
On the JobTracker host machine, edit the /etc/hadoop/conf/mapred.include file and add the list of slave nodes' hostnames (separated by newline characters).
Important: If the JobTracker host machine does not have an existing copy of this file, ensure that you create a new mapred.include file.
On the JobTracker host machine, execute the following command:
su - mapred -c "hadoop mradmin -refreshNodes"
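Appending hostnames to the include files should be idempotent so that re-running the procedure does not create duplicate entries. The following sketch is my own (the demo file name stands in for /etc/hadoop/conf/dfs.include; the refresh command is only printed, not executed):

```shell
# Sketch: append a hostname to an include file only if it is not already
# listed, then show the refresh command an operator would run as hdfs.
INCLUDE=./dfs.include.demo
rm -f "$INCLUDE"   # start fresh for the demonstration

add_to_include() {
  # $1 = hostname, $2 = include file; -x matches the whole line exactly
  grep -qx "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

add_to_include slave101.example.com "$INCLUDE"
add_to_include slave101.example.com "$INCLUDE"   # no-op: already present

echo 'su - hdfs -c "hadoop dfsadmin -refreshNodes"'
```

The same pattern applies to mapred.include, followed by the mradmin refresh as the mapred user.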
Optional - Enable monitoring on the newly added slave nodes using the instructions provided here.
Optional - Enable cluster alerting on the newly added slave nodes using the instructions provided here.