4. Create Directories

Create directories and configure ownership and permissions on the appropriate hosts as described below.

If any of these directories already exist, we recommend deleting and recreating them.

Use the following instructions to create appropriate directories:

We strongly suggest that you edit and source the bash script files included with the companion files (downloaded in Download Companion Files).
Alternatively, you can copy the contents of those files to your ~/.bash_profile to set up these environment variables in your environment.
Create the NameNode directories
Create the Secondary NameNode directories
Create the DataNode and YARN NodeManager local directories
Create the log and PID directories
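
As an illustration of the environment variables mentioned above, the following is a minimal sketch of what a sourceable script could contain. The values are only the examples used in this section, and the actual companion files may name or organize things differently, so treat this layout as an assumption.

```shell
# Sketch of a sourceable environment script (assumed layout, example values only).

# Service accounts and the shared group.
export HDFS_USER=hdfs
export YARN_USER=yarn
export MAPRED_USER=mapred
export HADOOP_GROUP=hadoop

# Space-separated lists of data directories.
export DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
export FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn"
export DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn"
export YARN_LOCAL_DIR="/grid/hadoop/yarn/local /grid1/hadoop/yarn/local"
export YARN_LOCAL_LOG_DIR="/grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs"

# Log and PID directories: a base directory combined with the service user name.
export HDFS_LOG_DIR="/var/log/hadoop/$HDFS_USER"
export YARN_LOG_DIR="/var/log/hadoop/$YARN_USER"
export MAPRED_LOG_DIR="/var/log/hadoop/$MAPRED_USER"
export HDFS_PID_DIR="/var/run/hadoop/$HDFS_USER"
export YARN_PID_DIR="/var/run/hadoop/$YARN_USER"
export MAPRED_PID_DIR="/var/run/hadoop/$MAPRED_USER"
```

Sourcing such a file in the current shell makes the variables available to the commands in the rest of this section.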

On the node that hosts the NameNode service, execute the following commands:

mkdir -p $DFS_NAME_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR;
chmod -R 755 $DFS_NAME_DIR;

where:

$DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
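
Because $DFS_NAME_DIR holds a space-separated list, the commands above depend on the variable being left unquoted, so that a POSIX-style shell such as bash splits it into one argument per directory. The sketch below demonstrates this in a scratch directory. The scratch location is an assumption made for illustration and the paths are placeholders.

```shell
# Demonstration only: work under a throwaway directory instead of /grid.
scratch=$(mktemp -d)
DFS_NAME_DIR="$scratch/grid/hadoop/hdfs/nn $scratch/grid1/hadoop/hdfs/nn"

# Unquoted expansion: the shell splits the value on spaces,
# so mkdir receives two separate paths.
mkdir -p $DFS_NAME_DIR

# Report each directory that now exists.
for dir in $DFS_NAME_DIR; do
  [ -d "$dir" ] && echo "created: $dir"
done

# Remove the scratch directory when finished:
# rm -rf "$scratch"
```

Quoting the variable (for example "$DFS_NAME_DIR") would instead pass the whole list as a single path containing a space, which is not what these steps intend.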

On all the nodes that can potentially run the SecondaryNameNode service, execute the following commands:

mkdir -p $FS_CHECKPOINT_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR;
chmod -R 755 $FS_CHECKPOINT_DIR;

where:

$FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

On all DataNodes, execute the following commands:

mkdir -p $DFS_DATA_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR;
chmod -R 750 $DFS_DATA_DIR;

where:

$DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
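
The DataNode directories use mode 750 rather than 755, so it can be worth confirming the result after running the commands. The following is a minimal sketch of such a check. The check_mode function is a hypothetical helper written for this example, not part of Hadoop or the companion files.

```shell
# Hypothetical helper: succeeds only if every listed directory exists
# and has exactly the given permission mode.
# Usage: check_mode MODE DIR...
check_mode() {
  mode=$1
  shift
  for dir in "$@"; do
    # find prints the path only when it is a directory whose
    # permission bits match the mode exactly; -prune avoids descending.
    if [ -z "$(find "$dir" -prune -type d -perm "$mode" 2>/dev/null)" ]; then
      echo "missing directory or unexpected mode: $dir" >&2
      return 1
    fi
  done
}
```

For example, check_mode 750 $DFS_DATA_DIR (unquoted, so the list is split) returns a non-zero status if any DataNode directory is missing or has a different mode.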

On the ResourceManager and all DataNodes, execute the following commands:

mkdir -p $YARN_LOCAL_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_DIR;
chmod -R 755 $YARN_LOCAL_DIR;

where:

$YARN_LOCAL_DIR is the space-separated list of directories where YARN should store temporary data. For example, /grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

On the ResourceManager and all DataNodes, execute the following commands:

mkdir -p $YARN_LOCAL_LOG_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_LOG_DIR;
chmod -R 755 $YARN_LOCAL_LOG_DIR;

where:

$YARN_LOCAL_LOG_DIR is the space-separated list of directories where YARN should store its local log data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
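
The steps so far assign different directory variables to different host roles. As a compact summary, the sketch below maps a role to the variables that apply to it. The dirs_for_role function and the role names are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: prints the names of the directory variables that the
# preceding steps assign to a given host role.
dirs_for_role() {
  case "$1" in
    namenode)          echo "DFS_NAME_DIR" ;;
    secondarynamenode) echo "FS_CHECKPOINT_DIR" ;;
    datanode)          echo "DFS_DATA_DIR YARN_LOCAL_DIR YARN_LOCAL_LOG_DIR" ;;
    resourcemanager)   echo "YARN_LOCAL_DIR YARN_LOCAL_LOG_DIR" ;;
    *)                 echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

A host that runs several services needs the union of the corresponding lists.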

On all nodes, execute the following commands:

mkdir -p $HDFS_LOG_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR;
chmod -R 755 $HDFS_LOG_DIR;

where:

$HDFS_LOG_DIR is the directory for storing the HDFS logs.
This directory name is a combination of a directory and the $HDFS_USER.
For example, /var/log/hadoop/hdfs where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

mkdir -p $YARN_LOG_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOG_DIR;
chmod -R 755 $YARN_LOG_DIR;

where:

$YARN_LOG_DIR is the directory for storing the YARN logs.
This directory name is a combination of a directory and the $YARN_USER.
For example, /var/log/hadoop/yarn where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

mkdir -p $HDFS_PID_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR;
chmod -R 755 $HDFS_PID_DIR;

where:

$HDFS_PID_DIR is the directory for storing the HDFS process ID.
This directory name is a combination of a directory and the $HDFS_USER.
For example, /var/run/hadoop/hdfs where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

mkdir -p $YARN_PID_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_PID_DIR;
chmod -R 755 $YARN_PID_DIR;

where:

$YARN_PID_DIR is the directory for storing the YARN process ID.
This directory name is a combination of a directory and the $YARN_USER.
For example, /var/run/hadoop/yarn where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

mkdir -p $MAPRED_LOG_DIR;
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR;
chmod -R 755 $MAPRED_LOG_DIR;

where:

$MAPRED_LOG_DIR is the directory for storing the JobHistory Server logs.
This directory name is a combination of a directory and the $MAPRED_USER.
For example, /var/log/hadoop/mapred where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

mkdir -p $MAPRED_PID_DIR;
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR;
chmod -R 755 $MAPRED_PID_DIR;

where:

$MAPRED_PID_DIR is the directory for storing the JobHistory Server process ID.
This directory name is a combination of a directory and the $MAPRED_USER.
For example, /var/run/hadoop/mapred where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
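
Every step in this section repeats the same mkdir, chown, and chmod triplet. One way to reduce repetition is to wrap the triplet in a small shell function, as sketched below. The create_owned_dir function is a hypothetical helper for illustration. It is not provided by Hadoop or the companion files.

```shell
# Hypothetical helper mirroring the mkdir/chown/chmod triplet used in this section.
# Usage: create_owned_dir OWNER GROUP MODE DIR...
create_owned_dir() {
  owner=$1
  group=$2
  mode=$3
  shift 3
  for dir in "$@"; do
    # Stop at the first failure so a partial setup does not go unnoticed.
    mkdir -p "$dir" &&
      chown -R "$owner:$group" "$dir" &&
      chmod -R "$mode" "$dir" || return 1
  done
}

# Example calls, commented out because they need root privileges and
# existing service accounts:
# create_owned_dir "$HDFS_USER"   "$HADOOP_GROUP" 755 "$HDFS_LOG_DIR"   "$HDFS_PID_DIR"
# create_owned_dir "$YARN_USER"   "$HADOOP_GROUP" 755 "$YARN_LOG_DIR"   "$YARN_PID_DIR"
# create_owned_dir "$MAPRED_USER" "$HADOOP_GROUP" 755 "$MAPRED_LOG_DIR" "$MAPRED_PID_DIR"
```

For the space-separated variables, pass the variable unquoted (for example, create_owned_dir "$HDFS_USER" "$HADOOP_GROUP" 750 $DFS_DATA_DIR) so that each directory becomes its own argument.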