4. Create Directories

Create the directories and configure their ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend deleting and recreating them.

Use the following instructions to create appropriate directories:

  1. We strongly suggest that you edit and source the files included in the scripts.zip file (downloaded in Download Companion Files); a sketch of the kinds of variables these files define follows this list.

    Alternatively, you can copy their contents into your ~/.bash_profile to set up these environment variables.

  2. Create the NameNode Directories

  3. Create the SecondaryNameNode Directories

  4. Create the DataNode and MapReduce Local Directories

  5. Create the Log and PID Directories
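
The following is a minimal sketch of the kind of definitions those scripts set up, using the variable names and example values from this section; the file name directories.sh is hypothetical, and every path must be adapted to your own layout before sourcing:

# Hypothetical excerpt of a sourced environment file (e.g., directories.sh).
# Paths, users, and group below are the examples used throughout this section.
export DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
export FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn"
export DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn"
export MAPREDUCE_LOCAL_DIR="/grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred"
export HDFS_LOG_DIR="/var/log/hadoop/hdfs"
export MAPRED_LOG_DIR="/var/log/hadoop/mapred"
export HDFS_PID_DIR="/var/run/hadoop/hdfs"
export MAPRED_PID_DIR="/var/run/hadoop/mapred"
export HDFS_USER="hdfs"
export MAPRED_USER="mapred"
export HADOOP_GROUP="hadoop"

Load the definitions into your current shell with source directories.sh, or log in again if you appended them to ~/.bash_profile.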

 4.1. Create the NameNode Directories

On the node that hosts the NameNode service, execute the following commands:

mkdir -p $DFS_NAME_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR

where:

  • $DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.

  • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

  • $HADOOP_GROUP is a common group shared by services. For example, hadoop.
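
Because $DFS_NAME_DIR is an unquoted, space-separated list, the shell splits it into one argument per directory, so each listed directory is created and adjusted in turn. As an optional sanity check (shown here with the example paths; substitute your own):

ls -ld /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
# each entry should show drwxr-xr-x (755) with owner hdfs and group hadoop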

 4.2. Create the SecondaryNameNode Directories

On all the nodes that can potentially host the SecondaryNameNode service, execute the following commands:

mkdir -p $FS_CHECKPOINT_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR
chmod -R 755 $FS_CHECKPOINT_DIR

where:

  • $FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn.

  • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

  • $HADOOP_GROUP is a common group shared by services. For example, hadoop.
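
If you manage the candidate hosts from one machine, a convenient option is to push the same commands over ssh. The loop below is a sketch only: the hostnames are illustrative, the environment variables must be set in the local shell (the double quotes make the local shell expand them before the command line is sent), and the remote account must be allowed to run chown:

for host in snn1.example.com snn2.example.com; do
  ssh root@"$host" "mkdir -p $FS_CHECKPOINT_DIR && \
    chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR && \
    chmod -R 755 $FS_CHECKPOINT_DIR"
done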

 4.3. Create the DataNode and MapReduce Local Directories

On all DataNodes, execute the following commands:

mkdir -p $DFS_DATA_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR
chmod -R 750 $DFS_DATA_DIR

On the JobTracker node and on all DataNodes, execute the following commands:

mkdir -p $MAPREDUCE_LOCAL_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPREDUCE_LOCAL_DIR
chmod -R 755 $MAPREDUCE_LOCAL_DIR

where:

  • $DFS_DATA_DIR is the space-separated list of directories where the DataNodes should store the data blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn.

  • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

  • $MAPREDUCE_LOCAL_DIR is the space-separated list of directories where MapReduce should store temporary data. For example, /grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred.

  • $MAPRED_USER is the user owning the MapReduce services. For example, mapred.

  • $HADOOP_GROUP is a common group shared by services. For example, hadoop.
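
Note the stricter mode on the DataNode directories: 750 keeps HDFS block data readable only by the HDFS user and the hadoop group, whereas the MapReduce local directories use 755. You can confirm the resulting modes and owners with a quick check (example paths from above; GNU stat syntax):

stat -c '%a %U:%G %n' /grid/hadoop/hdfs/dn /grid/hadoop/mapred
# expected: 750 hdfs:hadoop /grid/hadoop/hdfs/dn
#           755 mapred:hadoop /grid/hadoop/mapred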

 4.4. Create the Log and PID Directories

On all nodes, execute the following commands:

mkdir -p $HDFS_LOG_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR
chmod -R 755 $HDFS_LOG_DIR
mkdir -p $MAPRED_LOG_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR
chmod -R 755 $MAPRED_LOG_DIR
mkdir -p $HDFS_PID_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR
chmod -R 755 $HDFS_PID_DIR
mkdir -p $MAPRED_PID_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR
chmod -R 755 $MAPRED_PID_DIR

where:

  • $HDFS_LOG_DIR is the directory for storing the HDFS logs.

    This directory name combines a base directory with the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.

  • $HDFS_PID_DIR is the directory for storing the HDFS process ID files.

    This directory name combines a base directory with the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.

  • $MAPRED_LOG_DIR is the directory for storing the MapReduce logs.

    This directory name combines a base directory with the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.

  • $MAPRED_PID_DIR is the directory for storing the MapReduce process ID files.

    This directory name combines a base directory with the $MAPRED_USER. For example, /var/run/hadoop/mapred, where mapred is the $MAPRED_USER.
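
The four mkdir/chown/chmod triples above all follow the same pattern, so under bash they can be condensed into an equivalent loop; this is a convenience sketch that assumes the variables defined earlier:

for pair in "$HDFS_LOG_DIR:$HDFS_USER" "$MAPRED_LOG_DIR:$MAPRED_USER" \
            "$HDFS_PID_DIR:$HDFS_USER" "$MAPRED_PID_DIR:$MAPRED_USER"; do
  dir=${pair%%:*}    # part before the colon: the directory
  user=${pair##*:}   # part after the colon: the owning user
  mkdir -p "$dir"
  chown -R "$user:$HADOOP_GROUP" "$dir"
  chmod -R 755 "$dir"
done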

