1. Create a Rack Topology Script

Hadoop uses a topology script to determine the rack location of each node. HDFS then uses this rack information to place block replicas on different racks, so that data survives the loss of a single rack.
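To make the contract concrete, the sketch below shows the minimum a topology script has to do. Hadoop calls the script with one or more hostnames or IP addresses as arguments and reads back one rack path per argument, in the same order, separated by whitespace. The function name `map_racks` and its hard-coded mappings are hypothetical, borrowed from the sample data file further down; this is an illustration of the input/output shape, not a replacement for the script itself.

```shell
#!/bin/bash
# Hypothetical sketch of the topology-script contract.
# Input:  one or more hosts/IPs as arguments.
# Output: one rack path per argument, in order, space-separated.
map_racks() {
  local host
  for host in "$@"; do
    case "$host" in
      192.168.2.10) printf '%s ' "/default/rack_01" ;;
      192.168.2.11) printf '%s ' "/default/rack_02" ;;
      # Any host without a mapping still gets a rack, so the number of
      # output paths always matches the number of arguments.
      *)            printf '%s ' "/default/rack" ;;
    esac
  done
}
```

For example, `map_racks 192.168.2.10 10.0.0.5` prints `/default/rack_01 /default/rack ` (with a trailing space). Always emitting a default rack for unknown hosts matters: if the script printed fewer paths than it received hosts, Hadoop could not match racks to nodes.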

  1. Create a topology script and data file. The topology script must be executable.

    Sample Topology Script Named rack-topology.sh

    #!/bin/bash

    # Set the property "net.topology.script.file.name" in core-site.xml
    # to the absolute path of this file. Ensure the file is executable.

    # Supply an appropriate rack prefix
    RACK_PREFIX=default

    # Host-to-rack data file and the directory that contains it
    CTL_FILE=${CTL_FILE:-"rack_topology.data"}
    HADOOP_CONF=${HADOOP_CONF:-"/etc/hadoop/conf"}

    # Hadoop passes one or more hostnames/IP addresses as arguments and
    # expects one rack path per argument. To test, run the script by hand
    # with one or more hosts as arguments.
    if [ $# -eq 0 ]; then
      echo -n "/$RACK_PREFIX/rack "
      exit 0
    fi

    while [ $# -gt 0 ]; do
      nodeArg=$1
      result=""
      # Look up the node in the data file, if it exists (format: <host> <rack>)
      if [ -f "${HADOOP_CONF}/${CTL_FILE}" ]; then
        while read -r host rack _; do
          if [ "$host" = "$nodeArg" ]; then
            result="$rack"
          fi
        done < "${HADOOP_CONF}/${CTL_FILE}"
      fi
      shift
      # Hosts not found in the data file fall back to the default rack
      if [ -z "$result" ]; then
        echo -n "/$RACK_PREFIX/rack "
      else
        echo -n "/$RACK_PREFIX/rack_$result "
      fi
    done

    Sample Topology Data File Named rack_topology.data

    # This file should be placed in the /etc/hadoop/conf directory on:
    # - The NameNode (and any standby/failover NameNodes, e.g. in an HA setup)
    # - The JobTracker or ResourceManager (and any failover JobTrackers/ResourceManagers)

    # Add one host per line in the format: <host IP> <rack_location>
    192.168.2.10 01
    192.168.2.11 02
    192.168.2.12 03

  2. Copy both of these files to the /etc/hadoop/conf directory on all cluster nodes.

  3. Run the rack-topology.sh script with each host as an argument to verify that it returns the correct rack path for that host.
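On a large cluster, checking each host by hand in step 3 gets tedious. The following is a sketch of a hypothetical helper, `verify_topology`, that runs the script once for every `<host> <rack>` line in the data file and compares the output with the rack path the data file implies. It assumes the data file format and the `default` rack prefix used in the samples above, and that the script under test reads the same data file (via its `HADOOP_CONF` and `CTL_FILE` settings).

```shell
#!/bin/bash
# Hypothetical helper: verify_topology SCRIPT DATA_FILE [PREFIX]
# Returns non-zero if any host in DATA_FILE maps to an unexpected rack.
verify_topology() {
  local script=$1 data=$2 prefix=${3:-default}
  local host rack expected actual failures=0
  while read -r host rack _; do
    # Skip blank lines and comment lines in the data file
    case "$host" in ''|'#'*) continue ;; esac
    expected="/$prefix/rack_$rack"
    # Run the script with stdin detached so it cannot consume the data
    # file this loop is reading, then drop the trailing space it emits.
    actual=$("$script" "$host" </dev/null)
    actual=${actual% }
    if [ "$actual" = "$expected" ]; then
      echo "OK   $host -> $actual"
    else
      echo "FAIL $host -> got '$actual', expected '$expected'"
      failures=$((failures + 1))
    fi
  done < "$data"
  [ "$failures" -eq 0 ]
}
```

A typical invocation would be `verify_topology /etc/hadoop/conf/rack-topology.sh /etc/hadoop/conf/rack_topology.data`. The helper only checks hosts that appear in the data file; hosts missing from it are expected to fall back to the default rack and are not exercised here.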