Create Directories

Create directories and configure ownership + permissions on the appropriate hosts as described below.

Before you begin:

If any of these directories already exist, we recommend deleting and recreating them.
Hortonworks provides a set of configuration files that represent a working ZooKeeper configuration. (See Download Companion Files. You can use these files as a reference point. However, you will need to modify them to match your own cluster environment.

Use the following instructions to create appropriate directories:

Create the NameNode Directories

On the node that hosts the NameNode service, execute the following commands:

mkdir -p $DFS_NAME_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR;
chmod -R 755 $DFS_NAME_DIR;

Where:

$DFS_NAME_DIR is the space separated list of directories where NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

Create the SecondaryNameNode Directories

On all the nodes that can potentially run the SecondaryNameNode service, execute the following commands:

mkdir -p $FS_CHECKPOINT_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR; 
chmod -R 755 $FS_CHECKPOINT_DIR;

where:

$FS_CHECKPOINT_DIR is the space-separated list of directories where SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

Create DataNode and YARN NodeManager Local Directories

At each DataNode, execute the following commands:

mkdir -p $DFS_DATA_DIR;
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR;
chmod -R 750 $DFS_DATA_DIR;

where:

$DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn / grid2/hadoop/hdfs/dn.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

At each ResourceManager and all DataNodes, execute the following commands:

mkdir -p $YARN_LOCAL_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_DIR;
chmod -R 755 $YARN_LOCAL_DIR;

where:

$YARN_LOCAL_DIR is the space separated list of directories where YARN should store container log data. For example, /grid/hadoop/yarn/local /grid1/hadoop/ yarn/local /grid2/hadoop/yarn/local.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

At each ResourceManager and all DataNodes, execute the following commands:

mkdir -p $YARN_LOCAL_LOG_DIR;
chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_LOG_DIR; 
chmod -R 755 $YARN_LOCAL_LOG_DIR;

where:

$YARN_LOCAL_LOG_DIR is the space-separated list of directories where YARN should store temporary data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.

Create the Log and PID Directories

Each ZooKeeper service requires a log and PID directory. In this section, you will create directories for each service. If you choose to use the companion file scripts, these environment variables will already be defined and you can copy and paste the examples into your terminal window.