Create directories and configure ownership + permissions on the appropriate hosts as described below.
If any of these directories already exist, we recommend deleting and recreating them.
Use the following instructions to create the appropriate directories:
We strongly suggest that you edit and source the files included in the scripts.zip file (downloaded in Download Companion Files). Alternatively, you can copy the contents to your ~/.bash_profile to set up these environment variables in your environment.
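If you prefer to define the variables by hand rather than sourcing the companion scripts, the commands in this section expect definitions along these lines. All values below are the illustrative examples used throughout this section, not required paths; adjust them to your cluster layout.

```shell
# Example environment variable definitions for the directory layout used in
# this section. Every value is illustrative; edit the paths, users, and
# group to match your cluster.
export DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
export FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn"
export DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn"
export YARN_LOCAL_DIR="/grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local"
export YARN_LOCAL_LOG_DIR="/grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs"
export HDFS_LOG_DIR="/var/log/hadoop/hdfs"
export YARN_LOG_DIR="/var/log/hadoop/yarn"
export MAPRED_LOG_DIR="/var/log/hadoop/mapred"
export HDFS_PID_DIR="/var/run/hadoop/hdfs"
export YARN_PID_DIR="/var/run/hadoop/yarn"
export MAPRED_PID_DIR="/var/run/hadoop/mapred"
export HDFS_USER="hdfs"
export YARN_USER="yarn"
export MAPRED_USER="mapred"
export HADOOP_GROUP="hadoop"
```

Note that the multi-directory variables hold space-separated lists, so they must not be quoted when passed to mkdir, chown, and chmod in the commands below.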
On the node that hosts the NameNode service, execute the following commands:
mkdir -p $DFS_NAME_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR; chmod -R 755 $DFS_NAME_DIR;
where:
$DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
On all the nodes that can potentially run the SecondaryNameNode service, execute the following commands:
mkdir -p $FS_CHECKPOINT_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR; chmod -R 755 $FS_CHECKPOINT_DIR;
where:
$FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
On all DataNodes, execute the following commands:
mkdir -p $DFS_DATA_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR; chmod -R 750 $DFS_DATA_DIR;
where:
$DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
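After creating the DataNode directories, it is worth confirming that each one exists with the expected owner, group, and mode before starting the services. A minimal check, assuming GNU coreutils stat and the same space-separated $DFS_DATA_DIR list used above:

```shell
# Print name, owner:group, and octal mode for each configured DataNode
# directory. $DFS_DATA_DIR is intentionally unquoted so the shell splits
# the space-separated list into individual directories.
for dir in $DFS_DATA_DIR; do
  stat -c '%n %U:%G %a' "$dir"
done
```

Each line of output should show $HDFS_USER:$HADOOP_GROUP and mode 750; any other owner or mode means the chown or chmod step was missed on that directory.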
On the ResourceManager and all DataNodes, execute the following commands:
mkdir -p $YARN_LOCAL_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_DIR; chmod -R 755 $YARN_LOCAL_DIR;
where:
$YARN_LOCAL_DIR is the space-separated list of directories where YARN should store temporary data. For example, /grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
On the ResourceManager and all DataNodes, execute the following commands:
mkdir -p $YARN_LOCAL_LOG_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_LOG_DIR; chmod -R 755 $YARN_LOCAL_LOG_DIR;
where:
$YARN_LOCAL_LOG_DIR is the space-separated list of directories where YARN should store container log data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
On all nodes, execute the following commands:
mkdir -p $HDFS_LOG_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR; chmod -R 755 $HDFS_LOG_DIR;
where:
$HDFS_LOG_DIR is the directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
mkdir -p $YARN_LOG_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOG_DIR; chmod -R 755 $YARN_LOG_DIR;
where:
$YARN_LOG_DIR is the directory for storing the YARN logs. This directory name is a combination of a directory and the $YARN_USER. For example, /var/log/hadoop/yarn, where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
mkdir -p $HDFS_PID_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR; chmod -R 755 $HDFS_PID_DIR
where:
$HDFS_PID_DIR is the directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
mkdir -p $YARN_PID_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_PID_DIR; chmod -R 755 $YARN_PID_DIR;
where:
$YARN_PID_DIR is the directory for storing the YARN process ID. This directory name is a combination of a directory and the $YARN_USER. For example, /var/run/hadoop/yarn, where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
mkdir -p $MAPRED_LOG_DIR; chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR; chmod -R 755 $MAPRED_LOG_DIR;
where:
$MAPRED_LOG_DIR is the directory for storing the JobHistory Server logs. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MAPRED services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
mkdir -p $MAPRED_PID_DIR; chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR; chmod -R 755 $MAPRED_PID_DIR;
where:
$MAPRED_PID_DIR is the directory for storing the JobHistory Server process ID. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/run/hadoop/mapred, where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MAPRED services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
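Every step in this section applies the same mkdir, chown, chmod pattern with different directories, owners, and modes. If you are scripting the setup, that pattern can be wrapped in a small helper; the function below is a sketch, not part of any Hadoop tooling, and the function name make_service_dirs is our own.

```shell
# Apply the mkdir/chown/chmod pattern used throughout this section.
# Arguments: mode, owner, group, then one or more directories.
make_service_dirs() {
  local mode="$1" owner="$2" group="$3"
  shift 3
  local dir
  for dir in "$@"; do
    mkdir -p "$dir"
    chown -R "$owner:$group" "$dir"
    chmod -R "$mode" "$dir"
  done
}

# Example usage (run as root), equivalent to the log-directory commands above:
# make_service_dirs 755 "$HDFS_USER" "$HADOOP_GROUP" $HDFS_LOG_DIR
# make_service_dirs 755 "$YARN_USER" "$HADOOP_GROUP" $YARN_LOG_DIR
# make_service_dirs 755 "$MAPRED_USER" "$HADOOP_GROUP" $MAPRED_LOG_DIR
```

The multi-directory variables such as $DFS_NAME_DIR can be passed unquoted so the shell expands the space-separated list into separate arguments, and the DataNode directories would be created with mode 750 as shown in their step.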