2. Add Slave Nodes

Use the following instructions to manually add a slave node:

  • On each new slave node, configure the remote repository as described in "Installing ZooKeeper", in Installing HDP Manually.

  • On each new slave node, install HDFS.

  • On each new slave node, install compression libraries.

  • On each new slave node, create the DataNode and YARN NodeManager local directories.

  • Copy the Hadoop configurations to the new slave nodes and set appropriate permissions.

    • Option I: Copy Hadoop config files from an existing slave node.

      • On an existing slave node, make a copy of the current configurations:

        tar zcvf hadoop_conf.tgz /etc/hadoop/conf 
      • Copy this file to each of the new nodes, then run the following commands on each new node to replace its existing configuration:

        rm -rf /etc/hadoop/conf
        cd /
        tar zxvf $location_of_copied_conf_tar_file/hadoop_conf.tgz
        chmod -R 755 /etc/hadoop/conf
  • On each of the new slave nodes, start the NodeManager:

    su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
  • Optional: If your cluster uses an HDFS or YARN ResourceManager .include file, add the new slave nodes to that .include file, then run the applicable refreshNodes command.

    • To add new DataNodes to the dfs.include file:

      • On the NameNode host machine, edit the /etc/hadoop/conf/dfs.include file and add the host names of the new slave nodes, one host name per line.

        Note

        If no dfs.include file is specified, all DataNodes are considered to be included in the cluster (unless excluded in the dfs.exclude file). The dfs.hosts and dfs.hosts.exclude properties in hdfs-site.xml specify the locations of the dfs.include and dfs.exclude files.

      • On the NameNode host machine, execute the following command:

        su -l hdfs -c "hdfs dfsadmin -refreshNodes"
    • To add new NodeManagers to the yarn.include file:

      • On the ResourceManager host machine, edit the /etc/hadoop/conf/yarn.include file and add the host names of the new slave nodes, one host name per line.

        Note

        If no yarn.include file is specified, all NodeManagers are considered to be included in the cluster (unless excluded in the yarn.exclude file). The yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path properties in yarn-site.xml are used to specify the yarn.include and yarn.exclude files.

      • On the ResourceManager host machine, execute the following command:

        su -l yarn -c "yarn rmadmin -refreshNodes"
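The Option I configuration copy can be sketched as two small shell functions. This is a minimal sketch, not part of the HDP tooling: the names pack_conf and unpack_conf are hypothetical, and the demo runs against scratch directories created with mktemp instead of /etc/hadoop/conf, so it can be tried without touching a live node. Passing tar -C with a root-relative path makes explicit what the original commands rely on: GNU tar stores /etc/hadoop/conf as etc/hadoop/conf, which is why the archive is unpacked from /.

```shell
#!/bin/sh
# Sketch of the Option I steps: pack the Hadoop configuration on an existing
# slave node, then replace the configuration on a new node with it.
# pack_conf and unpack_conf are hypothetical helper names.
set -eu

# pack_conf ROOT CONF_REL ARCHIVE
# Archive ROOT/CONF_REL, storing member names relative to ROOT.
pack_conf() {
  tar -czf "$3" -C "$1" "$2"
}

# unpack_conf ROOT CONF_REL ARCHIVE
# Remove the old ROOT/CONF_REL, extract the archive under ROOT,
# then apply the same 755 permissions as the manual steps.
unpack_conf() {
  [ -n "$2" ] || return 1        # refuse an empty CONF_REL (would rm -rf ROOT)
  rm -rf "${1%/}/$2"
  tar -xzf "$3" -C "$1"
  chmod -R 755 "${1%/}/$2"
}

# Demo on scratch directories instead of the real /etc/hadoop/conf.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/etc/hadoop/conf" "$work/dst/etc/hadoop/conf"
printf '<configuration/>\n' > "$work/src/etc/hadoop/conf/core-site.xml"
printf 'stale\n' > "$work/dst/etc/hadoop/conf/old-site.xml"

pack_conf "$work/src" etc/hadoop/conf "$work/hadoop_conf.tgz"
unpack_conf "$work/dst" etc/hadoop/conf "$work/hadoop_conf.tgz"
ls "$work/dst/etc/hadoop/conf"   # prints core-site.xml; old-site.xml is gone
```

On a real cluster the calls would be pack_conf / etc/hadoop/conf hadoop_conf.tgz on the existing slave node and, after copying the archive over (for example with scp), unpack_conf / etc/hadoop/conf $location_of_copied_conf_tar_file/hadoop_conf.tgz as root on each new node.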
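Editing dfs.include or yarn.include by hand is easy to get wrong when the step is repeated. The sketch below assumes plain include files with one host name per line; the add_hosts function and the dn*.example.com host names are hypothetical. It appends only hosts that are not already listed, so rerunning it does not create duplicate entries, and the demo writes to a temporary file rather than the real include file.

```shell
#!/bin/sh
# Hypothetical helper: append host names to an include file, one per line,
# skipping any host that is already listed as a whole line.
set -eu

# add_hosts INCLUDE_FILE HOST...
add_hosts() {
  file=$1
  shift
  touch "$file"
  # If the file does not end with a newline, add one so the next host
  # name starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for host in "$@"; do
    # -x: whole-line match; -F: fixed string, so dots are literal.
    if ! grep -qxF -e "$host" "$file"; then
      printf '%s\n' "$host" >> "$file"
    fi
  done
}

# Demo on a scratch file instead of /etc/hadoop/conf/dfs.include.
include=$(mktemp)
trap 'rm -f "$include"' EXIT
printf 'dn1.example.com' > "$include"   # existing entry, no trailing newline
add_hosts "$include" dn1.example.com dn2.example.com dn3.example.com
add_hosts "$include" dn3.example.com    # rerun: nothing is added
cat "$include"   # prints dn1.example.com, dn2.example.com, dn3.example.com, one per line
```

On the NameNode the same call would target /etc/hadoop/conf/dfs.include, followed by su -l hdfs -c "hdfs dfsadmin -refreshNodes"; on the ResourceManager it would target /etc/hadoop/conf/yarn.include, followed by su -l yarn -c "yarn rmadmin -refreshNodes".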