2. Add Slave Nodes

Use the following instructions to manually add slave nodes:

  1. On each new slave node, configure the remote repository as described in the installation guide.

  2. On each new slave node, install HDFS as described in the installation guide.

  3. On each new slave node, install the compression libraries as described in the installation guide.

  4. On each new slave node, create the DataNode and YARN NodeManager local directories as described in section 4.3 on this page.
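
    A minimal sketch, with illustrative paths only (the actual directories come from dfs.datanode.data.dir and yarn.nodemanager.local-dirs in your configuration):

      # DataNode local storage (path is an example)
      mkdir -p /grid/hadoop/hdfs/dn
      chown -R hdfs:hadoop /grid/hadoop/hdfs/dn
      chmod -R 750 /grid/hadoop/hdfs/dn

      # NodeManager local and log directories (paths are examples)
      mkdir -p /grid/hadoop/yarn/local /grid/hadoop/yarn/logs
      chown -R yarn:hadoop /grid/hadoop/yarn/local /grid/hadoop/yarn/logs
      chmod -R 755 /grid/hadoop/yarn/local /grid/hadoop/yarn/logs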

  5. Copy the Hadoop configurations to the new slave nodes and set appropriate permissions.

    • Option I: Copy Hadoop config files from an existing slave node.

      1. On an existing slave node, make a copy of the current configurations:

        tar zcvf hadoop_conf.tgz /etc/hadoop/conf              
      2. Copy this file to each of the new nodes (a sample scp command follows Option II below), then on each new node replace the existing configuration and reset permissions:

        rm -rf /etc/hadoop/conf
        cd /
        tar zxvf $location_of_copied_conf_tar_file/hadoop_conf.tgz
        chmod -R 755 /etc/hadoop/conf
    • Option II: Manually set up the Hadoop configuration files as described in the installation guide.
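
    For Option I, the copy in step 2 can be done with scp. A minimal sketch, assuming root SSH access to the new node and a placeholder hostname (new-slave-node):

      # Push the configuration archive to a new node; new-slave-node is a placeholder
      scp hadoop_conf.tgz root@new-slave-node:/tmp

    In that case, $location_of_copied_conf_tar_file would be /tmp on the new node.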

  6. On each of the new slave nodes, start the DataNode:

    su -l hdfs -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
  7. On each of the new slave nodes, start the NodeManager:

    su - yarn -c "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start nodemanager"
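
    To verify that the daemons started, check the Java process list on each new node (this assumes jps, which ships with the JDK, is on the PATH):

      # Each command should print one line if the daemon is running
      su -l hdfs -c "jps | grep DataNode"
      su - yarn -c "jps | grep NodeManager"

    Once a DataNode has registered with the NameNode, it also appears in the output of "hdfs dfsadmin -report".
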
  8. Optional: If you use an HDFS or YARN/ResourceManager .include file in your cluster, add the new slave nodes to the .include file, then run the applicable refreshNodes command.

    • To add new DataNodes to the dfs.include file:

      1. On the NameNode host machine, edit the /etc/hadoop/conf/dfs.include file and add the new slave node hostnames (one hostname per line).

        Note:

        If no dfs.include file is specified, all DataNodes are considered to be included in the cluster (unless excluded in the dfs.exclude file). The dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml are used to specify the dfs.include and dfs.exclude files.

      2. On the NameNode host machine, execute the following command:

        su -l hdfs -c "hdfs dfsadmin -refreshNodes"
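
        For reference, a minimal sketch of the hdfs-site.xml entries that point HDFS at these files (the paths shown are conventional, not required). The dfs.include file itself is a plain-text list of hostnames, one per line.

          <!-- Illustrative paths; match them to your cluster layout -->
          <property>
            <name>dfs.hosts</name>
            <value>/etc/hadoop/conf/dfs.include</value>
          </property>
          <property>
            <name>dfs.hosts.exclude</name>
            <value>/etc/hadoop/conf/dfs.exclude</value>
          </property>
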
    • To add new NodeManagers to the yarn.include file:

      1. On the ResourceManager host machine, edit the /etc/hadoop/conf/yarn.include file and add the new slave node hostnames (one hostname per line).

        Note:

        If no yarn.include file is specified, all NodeManagers are considered to be included in the cluster (unless excluded in the yarn.exclude file). The yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path properties in yarn-site.xml are used to specify the yarn.include and yarn.exclude files.

      2. On the ResourceManager host machine, execute the following command:

        su -l yarn -c "yarn rmadmin -refreshNodes"
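
        Similarly, a sketch of the corresponding yarn-site.xml entries (paths are illustrative):

          <!-- Illustrative paths; match them to your cluster layout -->
          <property>
            <name>yarn.resourcemanager.nodes.include-path</name>
            <value>/etc/hadoop/conf/yarn.include</value>
          </property>
          <property>
            <name>yarn.resourcemanager.nodes.exclude-path</name>
            <value>/etc/hadoop/conf/yarn.exclude</value>
          </property>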

