10.2. Define Directories

The following tables describe the directories for installation, configuration, data, process IDs, and logs, based on the Hadoop services you plan to install. Use these tables to define the values you will use when setting up your environment.

Note

The scripts.zip file you downloaded in Download Companion Files includes a script, directories.sh, that sets the directory environment parameters. We strongly suggest that you edit this script and source it to set up these variables in your environment. Alternatively, you can copy its contents to your ~/.bash_profile.
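As a rough illustration of what editing the script involves, the sketch below defines a few of the parameters from this section as shell variables. It is a hypothetical excerpt, not the actual contents of directories.sh, and the values are simply the example paths from the tables in this section.

```shell
# Hypothetical excerpt in the style of directories.sh; the real file may differ.
# Values are the example paths from the tables in this section.

# Space-separated lists: quote the whole value so it stays in one variable.
DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn"

# Single directory.
HADOOP_CONF_DIR="/etc/hadoop/conf"

# Export so that child processes started from this shell see the values.
export DFS_NAME_DIR DFS_DATA_DIR HADOOP_CONF_DIR
```

After editing the file, running `. ./directories.sh` (or `source ./directories.sh` in bash) from the directory that contains it loads the variables into the current shell. Running the script as a child process would not affect the current shell.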

 

Table 1.3. Define Directories for Core Hadoop

Hadoop Service | Parameter | Definition

HDFS

DFS_NAME_DIR

Space-separated list of directories where the NameNode should store the file system image.

For example,

/grid/hadoop/hdfs/nn

/grid1/hadoop/hdfs/nn

HDFS

DFS_DATA_DIR

Space-separated list of directories where the DataNodes should store the blocks.

For example,

/grid/hadoop/hdfs/dn

/grid1/hadoop/hdfs/dn

/grid2/hadoop/hdfs/dn

HDFS

FS_CHECKPOINT_DIR

Space-separated list of directories where the SecondaryNameNode should store the checkpoint image.

For example,

/grid/hadoop/hdfs/snn

/grid1/hadoop/hdfs/snn

/grid2/hadoop/hdfs/snn

HDFS

HDFS_LOG_DIR

Directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER.

For example, /var/log/hadoop/hdfs

where hdfs is the $HDFS_USER

HDFS

HDFS_PID_DIR

Directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER.

For example, /var/run/hadoop/hdfs

where hdfs is the $HDFS_USER

HDFS

HADOOP_CONF_DIR

Directory for storing the Hadoop configuration files.

For example, /etc/hadoop/conf

MapReduce

MAPREDUCE_LOCAL_DIR

Space-separated list of directories where MapReduce should store temporary data.

For example,

/grid/hadoop/mapred

/grid1/hadoop/mapred

/grid2/hadoop/mapred

MapReduce

MAPRED_LOG_DIR

Directory for storing the MapReduce logs.

For example, /var/log/hadoop/mapred

This directory name is a combination of a directory and the $MAPRED_USER. In the example, mapred is the $MAPRED_USER.

MapReduce

MAPRED_PID_DIR

Directory for storing the MapReduce process ID.

For example, /var/run/hadoop/mapred

This directory name is a combination of a directory and the $MAPRED_USER. In the example, mapred is the $MAPRED_USER.
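Two conventions in Table 1.3 can be sketched in shell: log and PID directories are built from a base directory plus the service user, and several parameters hold space-separated lists. In the sketch below, the user names hdfs and mapred are the ones used in the table's examples, and count_dirs is a hypothetical helper introduced only for illustration.

```shell
# Service users as used in the table's examples; adjust to your environment.
HDFS_USER="hdfs"
MAPRED_USER="mapred"

# Log and PID directories: a base directory combined with the service user.
HDFS_LOG_DIR="/var/log/hadoop/$HDFS_USER"
HDFS_PID_DIR="/var/run/hadoop/$HDFS_USER"
MAPRED_LOG_DIR="/var/log/hadoop/$MAPRED_USER"
MAPRED_PID_DIR="/var/run/hadoop/$MAPRED_USER"

# A space-separated list held in a single variable.
MAPREDUCE_LOCAL_DIR="/grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred"

# count_dirs (hypothetical helper): prints how many entries a list contains.
count_dirs() {
    # Left unquoted on purpose so the shell splits the list on whitespace.
    set -- $1
    echo "$#"
}

# Walk the list one directory at a time.
for dir in $MAPREDUCE_LOCAL_DIR; do
    echo "MapReduce local dir: $dir"
done
```

Because the list is split on whitespace, this approach assumes that none of the directory paths contain spaces.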


 

Table 1.4. Define Directories for Ecosystem Components

Hadoop Service | Parameter | Definition

Pig

PIG_CONF_DIR

Directory to store the Pig configuration files. For example, /etc/pig/conf

Oozie

OOZIE_CONF_DIR

Directory to store the Oozie configuration files. For example, /etc/oozie/conf

Oozie

OOZIE_DATA

Directory to store the Oozie data. For example, /var/db/oozie

Oozie

OOZIE_LOG_DIR

Directory to store the Oozie logs. For example, /var/log/oozie

Oozie

OOZIE_PID_DIR

Directory to store the Oozie process ID. For example, /var/run/oozie

Oozie

OOZIE_TMP_DIR

Directory to store the Oozie temporary files. For example, /var/tmp/oozie

Hive

HIVE_CONF_DIR

Directory to store the Hive configuration files. For example, /etc/hive/conf

Hive

HIVE_LOG_DIR

Directory to store the Hive logs. For example, /var/log/hive

Hive

HIVE_PID_DIR

Directory to store the Hive process ID. For example, /var/run/hive

WebHCat

WEBHCAT_CONF_DIR

Directory to store the WebHCat configuration files. For example, /etc/hcatalog/conf/webhcat

WebHCat

WEBHCAT_LOG_DIR

Directory to store the WebHCat logs. For example, /grid/0/var/log/webhcat/webhcat

WebHCat

WEBHCAT_PID_DIR

Directory to store the WebHCat process ID. For example, /var/run/webhcat

HBase

HBASE_CONF_DIR

Directory to store the HBase configuration files. For example, /etc/hbase/conf

HBase

HBASE_LOG_DIR

Directory to store the HBase logs. For example, /var/log/hbase

HBase

HBASE_PID_DIR

Directory to store the HBase process ID. For example, /var/run/hbase

ZooKeeper

ZOOKEEPER_DATA_DIR

Directory where ZooKeeper will store data. For example, /grid1/hadoop/zookeeper/data

ZooKeeper

ZOOKEEPER_CONF_DIR

Directory to store the ZooKeeper configuration files. For example, /etc/zookeeper/conf

ZooKeeper

ZOOKEEPER_LOG_DIR

Directory to store the ZooKeeper logs. For example, /var/log/zookeeper

ZooKeeper

ZOOKEEPER_PID_DIR

Directory to store the ZooKeeper process ID. For example, /var/run/zookeeper

Sqoop

SQOOP_CONF_DIR

Directory to store the Sqoop configuration files. For example, /usr/lib/sqoop/conf
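Once the parameters for the components you plan to install are defined, it can help to confirm that none were missed. The sketch below is one possible check, not part of the companion files. missing_vars is a hypothetical helper, and the values are example paths from Table 1.4, with OOZIE_CONF_DIR deliberately left unset to show the result.

```shell
# missing_vars (hypothetical helper): prints the name of every variable in
# the argument list that is unset or empty.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup via eval; the names are fixed identifiers, not user input.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Example values from Table 1.4.
PIG_CONF_DIR="/etc/pig/conf"
HIVE_CONF_DIR="/etc/hive/conf"
ZOOKEEPER_DATA_DIR="/grid1/hadoop/zookeeper/data"
# Deliberately left undefined for the demonstration.
unset OOZIE_CONF_DIR

# Prints OOZIE_CONF_DIR, the only name in the list without a value.
missing_vars PIG_CONF_DIR OOZIE_CONF_DIR HIVE_CONF_DIR ZOOKEEPER_DATA_DIR
```

Pass only the parameters for the services you actually plan to install, since the others are expected to be undefined.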


