Create directories and configure ownership and permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend deleting and recreating them.
Use the following instructions to create appropriate directories:
We strongly suggest that you edit and source the files included in the scripts.zip file (downloaded in Download Companion Files). Alternatively, you can copy the contents to your ~/.bash_profile to set up these environment variables in your environment.
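As an illustration, the variables used in this section could be defined as in the following sketch. The values are the examples given in this section. The actual contents of the files in scripts.zip may differ, and the require_vars helper is a hypothetical addition that is not part of the companion files.

```shell
# Example values taken from this section; adjust them for your cluster.
HDFS_USER=hdfs
MAPRED_USER=mapred
HADOOP_GROUP=hadoop
DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn"
DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn"
MAPREDUCE_LOCAL_DIR="/grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred"
export HDFS_USER MAPRED_USER HADOOP_GROUP
export DFS_NAME_DIR FS_CHECKPOINT_DIR DFS_DATA_DIR MAPREDUCE_LOCAL_DIR

# Hypothetical helper (not part of scripts.zip): returns non-zero and names
# the first variable that is unset or empty.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    return 0
}
```

Calling require_vars HDFS_USER HADOOP_GROUP DFS_NAME_DIR before running any of the commands below would catch a variable that was never set, which otherwise leads to mkdir or chown being run with no directory argument.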
On the node that hosts the NameNode service, execute the following commands:
mkdir -p $DFS_NAME_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR
where:
$DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
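The three commands above follow a pattern that recurs throughout this section: create the directories, set the owner and group, then set the mode. A minimal sketch of a wrapper for that pattern is shown below. The function name make_hadoop_dirs is hypothetical and not part of the companion scripts. It assumes that the named user and group already exist and that it is run with sufficient privileges to change ownership.

```shell
# Hypothetical helper: make_hadoop_dirs OWNER GROUP MODE DIR...
# Creates each directory, then sets ownership and permissions recursively.
# Stops at the first command that fails.
make_hadoop_dirs() {
    owner=$1; group=$2; mode=$3
    shift 3
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        chown -R "$owner:$group" "$dir" || return 1
        chmod -R "$mode" "$dir" || return 1
    done
}

# Example call for the NameNode directories (left commented out because it
# needs root privileges). $DFS_NAME_DIR is deliberately unquoted so that the
# shell splits the space-separated list into separate arguments.
# make_hadoop_dirs "$HDFS_USER" "$HADOOP_GROUP" 755 $DFS_NAME_DIR
```

Because the directory list is passed as separate arguments, the same helper would work for any of the directory variables in this section, with the mode and owner changed as each step requires.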
On all the nodes that can potentially host the SecondaryNameNode service, execute the following commands:
mkdir -p $FS_CHECKPOINT_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR
chmod -R 755 $FS_CHECKPOINT_DIR
where:
$FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
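After running the commands, it can be useful to confirm that each directory exists with the expected permissions. The following is a minimal sketch. The check_dir_mode helper is hypothetical, and it reads the permission string from ls -ld rather than relying on a platform-specific stat option.

```shell
# Hypothetical helper: check_dir_mode EXPECTED DIR...
# EXPECTED is the symbolic form shown by ls, for example drwxr-xr-x for
# mode 755 or drwxr-x--- for mode 750. Reports every problem it finds and
# returns non-zero if any directory is missing or has a different mode.
check_dir_mode() {
    expected=$1
    shift
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
            continue
        fi
        # The first ten characters of ls -ld are the file type and mode bits.
        actual=$(ls -ld "$dir" | cut -c1-10)
        if [ "$actual" != "$expected" ]; then
            echo "wrong mode on $dir: $actual (expected $expected)"
            status=1
        fi
    done
    return $status
}

# Example: check_dir_mode drwxr-xr-x $FS_CHECKPOINT_DIR
```

This checks only the top-level directories passed to it, not their contents, and it does not verify ownership.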
On all DataNodes, execute the following commands:
mkdir -p $DFS_DATA_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR
chmod -R 750 $DFS_DATA_DIR
On the JobTracker and all DataNodes, execute the following commands:
mkdir -p $MAPREDUCE_LOCAL_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPREDUCE_LOCAL_DIR
chmod -R 755 $MAPREDUCE_LOCAL_DIR
where:
$DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$MAPREDUCE_LOCAL_DIR is the space-separated list of directories where MapReduce should store temporary data. For example, /grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
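The chown commands in this section fail if the service accounts do not exist on the host. A possible pre-flight check is sketched below. The users_exist function is a hypothetical helper, not part of the companion scripts. It checks user accounts only, because the tools for looking up a group by name vary between platforms.

```shell
# Hypothetical pre-flight check: verifies that each named user account
# exists on this host. Returns non-zero at the first account that is missing.
users_exist() {
    for user in "$@"; do
        if ! id -u "$user" >/dev/null 2>&1; then
            echo "no such user: $user"
            return 1
        fi
    done
    return 0
}

# Example:
# users_exist "$HDFS_USER" "$MAPRED_USER" || echo "create the accounts first"
```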
On all nodes, execute the following commands:
mkdir -p $HDFS_LOG_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR
chmod -R 755 $HDFS_LOG_DIR
mkdir -p $MAPRED_LOG_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR
chmod -R 755 $MAPRED_LOG_DIR
mkdir -p $HDFS_PID_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR
chmod -R 755 $HDFS_PID_DIR
mkdir -p $MAPRED_PID_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR
chmod -R 755 $MAPRED_PID_DIR
where:
$HDFS_LOG_DIR is the directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.
$HDFS_PID_DIR is the directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.
$MAPRED_LOG_DIR is the directory for storing the MapReduce logs. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.
$MAPRED_PID_DIR is the directory for storing the MapReduce process ID. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/run/hadoop/mapred, where mapred is the $MAPRED_USER.
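The way these four directory names are built from a base directory and a service user can be sketched as follows. The variable names HADOOP_LOG_BASE and HADOOP_PID_BASE are hypothetical and introduced only for this illustration. The base paths and default user names are the example values from this section.

```shell
# Assumed base directories, taken from the examples in this section.
# HADOOP_LOG_BASE and HADOOP_PID_BASE are illustrative names only.
HADOOP_LOG_BASE=/var/log/hadoop
HADOOP_PID_BASE=/var/run/hadoop

# Service users; the defaults are the example values from this section.
HDFS_USER=${HDFS_USER:-hdfs}
MAPRED_USER=${MAPRED_USER:-mapred}

# Each directory is the base directory followed by the service user name.
HDFS_LOG_DIR=$HADOOP_LOG_BASE/$HDFS_USER
HDFS_PID_DIR=$HADOOP_PID_BASE/$HDFS_USER
MAPRED_LOG_DIR=$HADOOP_LOG_BASE/$MAPRED_USER
MAPRED_PID_DIR=$HADOOP_PID_BASE/$MAPRED_USER
```

Deriving the paths this way keeps the log and PID directories consistent with the user names if those are changed later.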