Scaling Namespaces and Optimizing Data Storage

Create a rack topology script

HDFS uses topology scripts to determine the rack location of each node, and uses this information to place block replicas on different racks so that data remains available if a rack fails.

  1. Create an executable topology script and a topology data file.
    Consider the following examples:
    The following is an example topology script named rack-topology.sh.
    
    #!/bin/bash
    # Add or adjust the property "net.topology.script.file.name"
    # in core-site.xml so that it holds the absolute path to this
    # file. Ensure the file is executable.
    
    # Supply appropriate rack prefix
    RACK_PREFIX=default
    
    # To test, supply a hostname as script input:
    if [ $# -gt 0 ]; then
    
    CTL_FILE=${CTL_FILE:-"rack_topology.data"}
    
    HADOOP_CONF=${HADOOP_CONF:-"/etc/hadoop/conf"} 
    
    if [ ! -f ${HADOOP_CONF}/${CTL_FILE} ]; then
     echo -n "/$RACK_PREFIX/rack "
     exit 0
    fi
    
    while [ $# -gt 0 ] ; do
      nodeArg=$1
      result=""
      # Scan the data file; the last line whose first field matches wins
      while read -r host rack _ ; do
        if [ "$host" = "$nodeArg" ] ; then
          result="$rack"
        fi
      done < "${HADOOP_CONF}/${CTL_FILE}"
      shift
      if [ -z "$result" ] ; then
        echo -n "/$RACK_PREFIX/rack "
      else
        echo -n "/$RACK_PREFIX/rack_$result "
      fi
    done
    
    else
     echo -n "/$RACK_PREFIX/rack "
    fi
    The following is an example topology data file named rack_topology.data.
    
    # This file should be:
    # - Placed in the /etc/hadoop/conf directory
    # - On the NameNode (and any backups, e.g. HA or failover NameNodes)
    # - On the JobTracker or ResourceManager (and any failover JTs/RMs)
    
    # Add hosts to this file. Format: <host ip> <rack_location>
    192.168.2.10 01
    192.168.2.11 02
    192.168.2.12 03
  2. Copy the topology script and the data file to the /etc/hadoop/conf directory on all cluster nodes.
  3. Run the topology script to ensure that it returns the correct rack information for each host.
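
The check in step 3 can be rehearsed away from a live cluster. The following is a minimal sketch, not part of Hadoop or of the script above: the temporary directory and the `resolve_rack` function are hypothetical stand-ins that mirror the lookup logic of rack-topology.sh, with the rack prefix assumed to be `default`. It maps one host that is listed in the data file and one that is not.

```shell
#!/bin/bash
# Illustrative smoke test. conf_dir and resolve_rack are hypothetical
# stand-ins for /etc/hadoop/conf and rack-topology.sh.
set -u

conf_dir=$(mktemp -d)
cat > "$conf_dir/rack_topology.data" <<'EOF'
# <host ip> <rack_location>
192.168.2.10 01
192.168.2.11 02
EOF

# Mirrors the lookup in rack-topology.sh: the last matching entry wins,
# and hosts that are not listed fall back to the default rack.
resolve_rack() {
  local node=$1 result="" host rack
  while read -r host rack _; do
    if [ "$host" = "$node" ]; then
      result=$rack
    fi
  done < "$conf_dir/rack_topology.data"
  if [ -z "$result" ]; then
    echo "/default/rack"
  else
    echo "/default/rack_$result"
  fi
}

known=$(resolve_rack 192.168.2.10)    # listed in the data file
unknown=$(resolve_rack 10.0.0.99)     # not listed
echo "$known $unknown"                # prints: /default/rack_01 /default/rack

rm -rf "$conf_dir"
```

On a cluster node, the equivalent check is to run the deployed script directly with one or more host addresses as arguments, for example `bash /etc/hadoop/conf/rack-topology.sh 192.168.2.10`, and confirm that each address resolves to the rack recorded in rack_topology.data.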