Create Directories
Create directories and configure ownership and permissions on the appropriate hosts as described below.
Before you begin:
If any of these directories already exist, we recommend deleting and recreating them.
Hortonworks provides a set of configuration files that represent a working HDFS, YARN, and MapReduce configuration. (See Download Companion Files.) You can use these files as a reference point; however, you must modify them to match your own cluster environment.
Use the following instructions to create appropriate directories:
Create the NameNode Directories
On the node that hosts the NameNode service, execute the following commands:
mkdir -p $DFS_NAME_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR; chmod -R 755 $DFS_NAME_DIR;
Where:
$DFS_NAME_DIR is the space-separated list of directories where NameNode stores the file system image. For example,
/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
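The commands above rely on the shell splitting the unquoted $DFS_NAME_DIR on spaces, so a single mkdir, chown, and chmod call handles every directory in the list. The following is a minimal sketch under assumed values: a scratch base directory from mktemp stands in for /grid and /grid1, and the current user and primary group stand in for hdfs and hadoop so it runs without root. Because the list is split on whitespace, none of the directory paths may contain spaces.

```shell
#!/usr/bin/env bash
# Assumed values for illustration: a scratch base directory and the current
# user/group replace /grid*, hdfs and hadoop so no root access is needed.
BASE=$(mktemp -d)
DFS_NAME_DIR="$BASE/grid/hadoop/hdfs/nn $BASE/grid1/hadoop/hdfs/nn"
HDFS_USER=$(id -un)
HADOOP_GROUP=$(id -gn)

# $DFS_NAME_DIR is left unquoted on purpose: word splitting turns the
# space-separated list into one argument per directory.
mkdir -p $DFS_NAME_DIR
chown -R "$HDFS_USER:$HADOOP_GROUP" $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR

# Print owner:group, octal mode and path for each directory (GNU stat).
for dir in $DFS_NAME_DIR; do
  stat -c '%U:%G %a %n' "$dir"
done
```

Note that mkdir -p also creates missing parent directories (such as /grid/hadoop/hdfs), but chown -R and chmod -R only touch the listed directories and their contents, so those parents keep the ownership of whoever ran the command.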
Create the SecondaryNameNode Directories
On all the nodes that can potentially run the SecondaryNameNode service, execute the following commands:
mkdir -p $FS_CHECKPOINT_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR; chmod -R 755 $FS_CHECKPOINT_DIR;
where:
$FS_CHECKPOINT_DIR is the space-separated list of directories where SecondaryNameNode should store the checkpoint image. For example,
/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
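Every directory step in this guide repeats the same three commands with a different owner, mode, and directory list. A hypothetical helper such as create_owned_dirs (not part of HDP or the companion files) can wrap them and refuse to run when the list is empty, which catches an unset $FS_CHECKPOINT_DIR before mkdir or chown fails with a missing-operand error. The scratch base directory and the current user and group are assumed stand-ins for the real paths, hdfs, and hadoop.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create_owned_dirs OWNER GROUP MODE "DIR1 DIR2 ..."
# Creates each directory in the space-separated list, then sets its
# ownership and permissions recursively.
create_owned_dirs() {
  local owner=$1 group=$2 mode=$3 dirs=$4
  if [ -z "$dirs" ]; then
    echo "create_owned_dirs: empty directory list" >&2
    return 1
  fi
  # $dirs is unquoted so the list is split into separate arguments.
  mkdir -p $dirs
  chown -R "$owner:$group" $dirs
  chmod -R "$mode" $dirs
}

# Assumed stand-ins for /grid*, hdfs and hadoop.
BASE=$(mktemp -d)
FS_CHECKPOINT_DIR="$BASE/grid/hadoop/hdfs/snn $BASE/grid1/hadoop/hdfs/snn $BASE/grid2/hadoop/hdfs/snn"
HDFS_USER=$(id -un)
HADOOP_GROUP=$(id -gn)

create_owned_dirs "$HDFS_USER" "$HADOOP_GROUP" 755 "$FS_CHECKPOINT_DIR"
```

The directory list is passed as one quoted argument so the helper, not the caller, decides where word splitting happens.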
Create DataNode and YARN NodeManager Local Directories
At each DataNode, execute the following commands:
mkdir -p $DFS_DATA_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR; chmod -R 750 $DFS_DATA_DIR;
where:
$DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example,
/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
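Unlike the 755 used for the NameNode and SecondaryNameNode directories, mode 750 gives the owner full access, lets members of $HADOOP_GROUP read and traverse the directories, and denies all access to other users, which keeps block files away from unrelated accounts. This sketch applies the commands to an assumed scratch layout (a mktemp base, with the current user and group in place of hdfs and hadoop) and then uses check_mode, a hypothetical helper not provided by HDP, to confirm that each directory ended up with the expected mode.

```shell
#!/usr/bin/env bash
# Assumed stand-ins for /grid*, hdfs and hadoop so the sketch runs without root.
BASE=$(mktemp -d)
DFS_DATA_DIR="$BASE/grid/hadoop/hdfs/dn $BASE/grid1/hadoop/hdfs/dn $BASE/grid2/hadoop/hdfs/dn"
HDFS_USER=$(id -un)
HADOOP_GROUP=$(id -gn)

mkdir -p $DFS_DATA_DIR
chown -R "$HDFS_USER:$HADOOP_GROUP" $DFS_DATA_DIR
# 750 = owner rwx (7), group r-x (5), others --- (0).
chmod -R 750 $DFS_DATA_DIR

# Hypothetical check: report every directory whose octal mode differs from
# the expected one and return non-zero if any mismatch was found.
check_mode() {
  local expected=$1; shift
  local status=0 dir actual
  for dir in "$@"; do
    actual=$(stat -c %a "$dir")
    if [ "$actual" != "$expected" ]; then
      echo "mode mismatch: $dir is $actual, expected $expected" >&2
      status=1
    fi
  done
  return "$status"
}

check_mode 750 $DFS_DATA_DIR && echo "all DataNode directories are 750"
```

The same check can be run against a node's existing directories to audit them before deciding whether to delete and recreate them.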
At each ResourceManager and all DataNodes, execute the following commands:
mkdir -p $YARN_LOCAL_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_DIR; chmod -R 755 $YARN_LOCAL_DIR;
where:
$YARN_LOCAL_DIR is the space-separated list of directories where YARN should store temporary data, such as localized files and other intermediate container data. For example,
/grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
At each ResourceManager and all DataNodes, execute the following commands:
mkdir -p $YARN_LOCAL_LOG_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOCAL_LOG_DIR; chmod -R 755 $YARN_LOCAL_LOG_DIR;
where:
$YARN_LOCAL_LOG_DIR is the space-separated list of directories where YARN should store container log data. For example,
/grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
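Because $YARN_LOCAL_DIR and $YARN_LOCAL_LOG_DIR are created on the same hosts with the same owner, group, and mode, the two command sets can be folded into one loop. A minimal sketch, assuming a mktemp scratch base in place of /grid* and the current user and group in place of yarn and hadoop:

```shell
#!/usr/bin/env bash
# Assumed stand-ins for /grid*, yarn and hadoop so the sketch runs without root.
BASE=$(mktemp -d)
YARN_LOCAL_DIR="$BASE/grid/hadoop/yarn/local $BASE/grid1/hadoop/yarn/local $BASE/grid2/hadoop/yarn/local"
YARN_LOCAL_LOG_DIR="$BASE/grid/hadoop/yarn/logs $BASE/grid1/hadoop/yarn/logs $BASE/grid2/hadoop/yarn/logs"
YARN_USER=$(id -un)
HADOOP_GROUP=$(id -gn)

# Both lists share owner, group and mode, so one loop handles them.
for dir_list in "$YARN_LOCAL_DIR" "$YARN_LOCAL_LOG_DIR"; do
  # $dir_list is unquoted here so it splits into one argument per directory.
  mkdir -p $dir_list
  chown -R "$YARN_USER:$HADOOP_GROUP" $dir_list
  chmod -R 755 $dir_list
done
```

The outer loop quotes each variable so that it iterates over the two lists, and the inner commands leave $dir_list unquoted so each list expands into its individual directories.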
Create the Log and PID Directories
Each of the HDFS, YARN, and MapReduce JobHistory Server services requires a log directory and a PID directory. In this section, you create these directories for each service. If you choose to use the companion file scripts, these environment variables are already defined and you can copy and paste the examples into your terminal window.
HDFS Logs
At all nodes, execute the following commands:
mkdir -p $HDFS_LOG_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR; chmod -R 755 $HDFS_LOG_DIR;
where:
$HDFS_LOG_DIR is the directory for storing the HDFS logs.
This directory name is a combination of a directory and the $HDFS_USER. For example,
/var/log/hadoop/hdfs
where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
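Because the log directory is a parent directory followed by the service user, it can be derived from $HDFS_USER rather than typed out. In this sketch LOG_BASE is a hypothetical variable holding the parent directory (/var/log/hadoop in the example above); a mktemp scratch directory replaces it, and the current user and group replace hdfs and hadoop so the commands run without root.

```shell
#!/usr/bin/env bash
# Hypothetical LOG_BASE stands in for /var/log/hadoop; the current user and
# group stand in for hdfs and hadoop so no root access is needed.
LOG_BASE=$(mktemp -d)
HDFS_USER=$(id -un)
HADOOP_GROUP=$(id -gn)

# The log directory is the parent directory followed by the service user,
# e.g. /var/log/hadoop/hdfs when HDFS_USER=hdfs.
HDFS_LOG_DIR="$LOG_BASE/$HDFS_USER"

mkdir -p "$HDFS_LOG_DIR"
chown -R "$HDFS_USER:$HADOOP_GROUP" "$HDFS_LOG_DIR"
chmod -R 755 "$HDFS_LOG_DIR"
echo "HDFS logs go to $HDFS_LOG_DIR"
```

Unlike the /grid directory lists, this variable holds a single path, so it can be quoted safely.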
YARN Logs
At all nodes, execute the following commands:
mkdir -p $YARN_LOG_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_LOG_DIR; chmod -R 755 $YARN_LOG_DIR;
where:
$YARN_LOG_DIR is the directory for storing the YARN logs.
This directory name is a combination of a directory and the $YARN_USER. For example,
/var/log/hadoop/yarn
where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
HDFS Process ID
At all nodes, execute the following commands:
mkdir -p $HDFS_PID_DIR; chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR; chmod -R 755 $HDFS_PID_DIR;
where:
$HDFS_PID_DIR is the directory for storing the HDFS process ID.
This directory name is a combination of a directory and the $HDFS_USER. For example,
/var/run/hadoop/hdfs
where hdfs is the $HDFS_USER.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
YARN Process ID
At all nodes, execute the following commands:
mkdir -p $YARN_PID_DIR; chown -R $YARN_USER:$HADOOP_GROUP $YARN_PID_DIR; chmod -R 755 $YARN_PID_DIR;
where:
$YARN_PID_DIR is the directory for storing the YARN process ID.
This directory name is a combination of a directory and the $YARN_USER. For example,
/var/run/hadoop/yarn
where yarn is the $YARN_USER.
$YARN_USER is the user owning the YARN services. For example, yarn.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
JobHistory Server Logs
At all nodes, execute the following commands:
mkdir -p $MAPRED_LOG_DIR; chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR; chmod -R 755 $MAPRED_LOG_DIR;
where:
$MAPRED_LOG_DIR is the directory for storing the JobHistory Server logs.
This directory name is a combination of a directory and the $MAPRED_USER. For example,
/var/log/hadoop/mapred
where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
JobHistory Server Process ID
At all nodes, execute the following commands:
mkdir -p $MAPRED_PID_DIR; chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR; chmod -R 755 $MAPRED_PID_DIR;
where:
$MAPRED_PID_DIR is the directory for storing the JobHistory Server process ID.
This directory name is a combination of a directory and the $MAPRED_USER. For example,
/var/run/hadoop/mapred
where mapred is the $MAPRED_USER.
$MAPRED_USER is the user owning the MapReduce services. For example, mapred.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
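The six log and PID steps above follow one pattern: a parent directory, a service user as the last path component, that user as owner, and mode 755. A minimal sketch that drives them from a loop, assuming mktemp scratch directories in place of /var/log/hadoop and /var/run/hadoop; the directory names hdfs, yarn, and mapred follow the examples above, but the current user and group own everything so the sketch runs without root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins so the sketch runs without root: scratch directories replace
# /var/log/hadoop and /var/run/hadoop, and the current user and group own
# everything instead of hdfs, yarn, mapred and hadoop.
LOG_BASE=$(mktemp -d)
PID_BASE=$(mktemp -d)
OWNER=$(id -un)
HADOOP_GROUP=$(id -gn)

# Service users whose log and PID directories are created on every node.
for svc_user in hdfs yarn mapred; do
  for dir in "$LOG_BASE/$svc_user" "$PID_BASE/$svc_user"; do
    mkdir -p "$dir"
    # On a real cluster the owner would be "$svc_user" rather than "$OWNER".
    chown -R "$OWNER:$HADOOP_GROUP" "$dir"
    chmod -R 755 "$dir"
  done
done

ls -ld "$LOG_BASE"/* "$PID_BASE"/*
```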
Symlink Directories with hdp-select
Important: hdp-select is installed automatically with the installation or upgrade of the first HDP component.
To prevent version-specific directory issues for your scripts and updates, Hortonworks provides hdp-select, a script that symlinks directories to hdp-current and modifies paths for configuration directories.
Determine the version number of the installed hdp-select package:
yum list | grep hdp (on CentOS 6)
rpm -q -a | grep hdp (on CentOS 7)
dpkg -l | grep hdp (on Ubuntu)
Then pass that version number to hdp-select set all. For example:
/usr/bin/hdp-select set all 2.5.6.0-<$version>
Run hdp-select set all on the NameNode and on all DataNodes. If YARN is deployed separately, also run hdp-select on the ResourceManager and all NodeManagers.
hdp-select set all 2.5.6.0-<$version>
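The version argument has the form 2.5.6.0-<$version>: four dot-separated numbers, a hyphen, and a build number. A minimal sketch, assuming the package listing from one of the commands above has been captured as text, extracts the first string with that shape from the hdp-select entry and prints the resulting command instead of running it. The sample listing line and the build number 40 are made up for illustration; real package listings vary by platform.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample line; on a real host this text would come from one of
# the yum, rpm or dpkg listing commands above. The build number is made up.
LISTING='hdp-select.noarch    2.5.6.0-40.el6    @HDP-2.5'

# Keep only the hdp-select entry, then pull out the first token made of four
# dot-separated numbers followed by a hyphen and a build number.
HDP_VERSION=$(printf '%s\n' "$LISTING" \
  | grep 'hdp-select' \
  | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+-[0-9]+' \
  | head -n 1)

# Print the command instead of running it, so it can be reviewed first.
echo "hdp-select set all $HDP_VERSION"
```

Printing rather than executing keeps the sketch safe to try on any host; the printed line can then be run on the NameNode, DataNodes, and any separately deployed YARN hosts.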