9. Define Environment Parameters

You need to set up specific users and directories for your HDP installation using the following instructions:

Define directories.

The following table describes the directories for install, configuration, data, process IDs and logs based on the Hadoop Services you plan to install. Use this table to define what you are going to use in setting up your environment.

Note

The `scripts.zip` file you downloaded in Download Companion Files includes a script, `directories.sh`, for setting directory environment parameters. We strongly suggest that you edit and source this script (alternatively, you can copy its contents to your `~/.bash_profile`) to set up these environment variables in your environment.
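As a minimal sketch of the edit-and-source workflow, the following writes a two-line stand-in for `directories.sh` to a temporary directory, sources it, and confirms that the variables are visible in the current shell. The stand-in file and the `workdir` name are assumptions for illustration only; the real script is the one shipped in `scripts.zip`.

```shell
#!/bin/sh
# Stand-in for the edited directories.sh (hypothetical; the real file ships in scripts.zip).
workdir=$(mktemp -d)
cat > "$workdir/directories.sh" <<'EOF'
HADOOP_CONF_DIR="/etc/hadoop/conf";
HDFS_LOG_DIR="/var/log/hadoop/hdfs";
EOF

# Source the file so its variables are defined in the current shell,
# rather than in a child process that exits immediately.
. "$workdir/directories.sh"

echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "HDFS_LOG_DIR=$HDFS_LOG_DIR"

# Clean up the temporary stand-in.
rm -rf "$workdir"
```

With the real file, the equivalent would be `. ./directories.sh`, assuming the script is in the current directory. Running it as `sh directories.sh` instead would set the variables only in a short-lived child shell.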


Table 1.1. Define Directories for Core Hadoop

Hadoop Service	Parameter	Definition
HDFS	`DFS_NAME_DIR`	Space-separated list of directories where the NameNode should store the file system image. For example, `/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn`.
HDFS	`DFS_DATA_DIR`	Space-separated list of directories where DataNodes should store the blocks. For example, `/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn`.
HDFS	`FS_CHECKPOINT_DIR`	Space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, `/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn`.
HDFS	`HDFS_LOG_DIR`	Directory for storing the HDFS logs. This directory name is a combination of a directory and the `$HDFS_USER`. For example, `/var/log/hadoop/hdfs`, where `hdfs` is the `$HDFS_USER`.
HDFS	`HDFS_PID_DIR`	Directory for storing the HDFS process ID. This directory name is a combination of a directory and the `$HDFS_USER`. For example, `/var/run/hadoop/hdfs`, where `hdfs` is the `$HDFS_USER`.
HDFS	`HADOOP_CONF_DIR`	Directory for storing the Hadoop configuration files. For example, `/etc/hadoop/conf`.
YARN	`YARN_LOCAL_DIR`	Space-separated list of directories where YARN should store temporary data. For example, `/grid/hadoop/yarn /grid1/hadoop/yarn /grid2/hadoop/yarn`.
YARN	`YARN_LOG_DIR`	Directory for storing the YARN logs. This directory name is a combination of a directory and the `$YARN_USER`. For example, `/var/log/hadoop/yarn`, where `yarn` is the `$YARN_USER`.
YARN	`YARN_PID_DIR`	Directory for storing the YARN process ID. This directory name is a combination of a directory and the `$YARN_USER`. For example, `/var/run/hadoop/yarn`, where `yarn` is the `$YARN_USER`.
MapReduce	`MAPRED_LOG_DIR`	Directory for storing the JobHistory Server logs. This directory name is a combination of a directory and the `$MAPRED_USER`. For example, `/var/log/hadoop/mapred`, where `mapred` is the `$MAPRED_USER`.
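Several parameters in Table 1.1 hold a space-separated list of directories in a single variable. The sketch below shows one way a script might walk such a list. It only prints the entries and does not create anything. The value is the `DFS_DATA_DIR` example from the table, and the `count` variable is introduced purely for illustration.

```shell
#!/bin/sh
# Example value from Table 1.1: three DataNode directories in one variable.
DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn"

count=0
# Leave $DFS_DATA_DIR unquoted here so the shell splits it on spaces.
for dir in $DFS_DATA_DIR; do
  count=$((count + 1))
  echo "data dir $count: $dir"
done

echo "total: $count"
```

Because the list is split on whitespace, none of the directory names can themselves contain spaces.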

Table 1.2. Define Directories for Ecosystem Components

Hadoop Service	Parameter	Definition
Pig	`PIG_CONF_DIR`	Directory to store the Pig configuration files. For example, `/etc/pig/conf`.
Pig	`PIG_LOG_DIR`	Directory to store the Pig logs. For example, `/var/log/pig`.
Pig	`PIG_PID_DIR`	Directory to store the Pig process ID. For example, `/var/run/pig`.
Oozie	`OOZIE_CONF_DIR`	Directory to store the Oozie configuration files. For example, `/etc/oozie/conf`.
Oozie	`OOZIE_DATA`	Directory to store the Oozie data. For example, `/var/db/oozie`.
Oozie	`OOZIE_LOG_DIR`	Directory to store the Oozie logs. For example, `/var/log/oozie`.
Oozie	`OOZIE_PID_DIR`	Directory to store the Oozie process ID. For example, `/var/run/oozie`.
Oozie	`OOZIE_TMP_DIR`	Directory to store the Oozie temporary files. For example, `/var/tmp/oozie`.
Hive	`HIVE_CONF_DIR`	Directory to store the Hive configuration files. For example, `/etc/hive/conf`.
Hive	`HIVE_LOG_DIR`	Directory to store the Hive logs. For example, `/var/log/hive`.
Hive	`HIVE_PID_DIR`	Directory to store the Hive process ID. For example, `/var/run/hive`.
WebHCat	`WEBHCAT_CONF_DIR`	Directory to store the WebHCat configuration files. For example, `/etc/hcatalog/conf/webhcat`.
WebHCat	`WEBHCAT_LOG_DIR`	Directory to store the WebHCat logs. For example, `/var/log/webhcat`.
WebHCat	`WEBHCAT_PID_DIR`	Directory to store the WebHCat process ID. For example, `/var/run/webhcat`.
HBase	`HBASE_CONF_DIR`	Directory to store the HBase configuration files. For example, `/etc/hbase/conf`.
HBase	`HBASE_LOG_DIR`	Directory to store the HBase logs. For example, `/var/log/hbase`.
HBase	`HBASE_PID_DIR`	Directory to store the HBase process ID. For example, `/var/run/hbase`.
ZooKeeper	`ZOOKEEPER_DATA_DIR`	Directory where ZooKeeper will store data. For example, `/grid/hadoop/zookeeper/data`.
ZooKeeper	`ZOOKEEPER_CONF_DIR`	Directory to store the ZooKeeper configuration files. For example, `/etc/zookeeper/conf`.
ZooKeeper	`ZOOKEEPER_LOG_DIR`	Directory to store the ZooKeeper logs. For example, `/var/log/zookeeper`.
ZooKeeper	`ZOOKEEPER_PID_DIR`	Directory to store the ZooKeeper process ID. For example, `/var/run/zookeeper`.
Sqoop	`SQOOP_CONF_DIR`	Directory to store the Sqoop configuration files. For example, `/usr/lib/sqoop/conf`.

If you use the companion files, the following snapshot shows how your `directories.sh` file should look after you edit the TODO variables:

#!/bin/sh

#
# Directories Script
#
# 1. To use this script, you must edit the TODO variables below for your environment.
#
# 2. Warning: Leave the other parameters as the default values. Changing these default values will require you to
# change values in other configuration files.
#

#
# Hadoop Service - HDFS
#

# Space separated list of directories where NameNode will store file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
DFS_NAME_DIR="/grid/0/hadoop/hdfs/nn";

# Space separated list of directories where DataNodes will store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn
DFS_DATA_DIR="/grid/0/hadoop/hdfs/dn";

# Space separated list of directories where SecondaryNameNode will store checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn
FS_CHECKPOINT_DIR="/grid/0/hadoop/hdfs/snn";



# Directory to store the HDFS logs.
HDFS_LOG_DIR="/var/log/hadoop/hdfs";

# Directory to store the HDFS process ID.
HDFS_PID_DIR="/var/run/hadoop/hdfs";

# Directory to store the Hadoop configuration files.
HADOOP_CONF_DIR="/etc/hadoop/conf";

#
# Hadoop Service - YARN
#

# Space separated list of directories where YARN will store temporary data. For example, /grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local
YARN_LOCAL_DIR="/grid/0/hadoop/yarn/local";

# Directory to store the YARN logs.
YARN_LOG_DIR="/var/log/hadoop/yarn";

# Space separated list of directories where YARN will store container log data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs
YARN_LOCAL_LOG_DIR="/grid/0/hadoop/yarn/logs";

# Directory to store the YARN process ID.
YARN_PID_DIR="/var/run/hadoop/yarn";

#
# Hadoop Service - MAPREDUCE
#

# Directory to store the MapReduce daemon logs.
MAPRED_LOG_DIR="/var/log/hadoop/mapreduce";

# Directory to store the mapreduce jobhistory process ID.
MAPRED_PID_DIR="/var/run/hadoop/mapreduce";

#
# Hadoop Service - Hive
#

# Directory to store the Hive configuration files.
HIVE_CONF_DIR="/etc/hive/conf";

# Directory to store the Hive logs.
HIVE_LOG_DIR="/var/log/hive";

# Directory to store the Hive process ID.
HIVE_PID_DIR="/var/run/hive";

#
# Hadoop Service - WebHCat (Templeton)
#

# Directory to store the WebHCat (Templeton) configuration files.
WEBHCAT_CONF_DIR="/etc/hcatalog/conf/webhcat";

# Directory to store the WebHCat (Templeton) logs.
WEBHCAT_LOG_DIR="/var/log/webhcat";

# Directory to store the WebHCat (Templeton) process ID.
WEBHCAT_PID_DIR="/var/run/webhcat";

#
# Hadoop Service - HBase
#

# Directory to store the HBase configuration files.
HBASE_CONF_DIR="/etc/hbase/conf";

# Directory to store the HBase logs.
HBASE_LOG_DIR="/var/log/hbase";

# Directory to store the HBase process ID.
HBASE_PID_DIR="/var/run/hbase";

#
# Hadoop Service - ZooKeeper
#

# Directory where ZooKeeper will store data. For example, /grid1/hadoop/zookeeper/data
ZOOKEEPER_DATA_DIR="../hadoop/zookeeper/data";

# Directory to store the ZooKeeper configuration files.
ZOOKEEPER_CONF_DIR="/etc/zookeeper/conf";

# Directory to store the ZooKeeper logs.
ZOOKEEPER_LOG_DIR="/var/log/zookeeper";

# Directory to store the ZooKeeper process ID.
ZOOKEEPER_PID_DIR="/var/run/zookeeper";

#
# Hadoop Service - Pig
#

# Directory to store the Pig configuration files.
PIG_CONF_DIR="/etc/pig/conf";

# Directory to store the Pig logs.
PIG_LOG_DIR="/var/log/pig";

# Directory to store the Pig process ID.
PIG_PID_DIR="/var/run/pig";


#
# Hadoop Service - Oozie
#

# Directory to store the Oozie configuration files.
OOZIE_CONF_DIR="/etc/oozie/conf"

# Directory to store the Oozie data.
OOZIE_DATA="/var/db/oozie"

# Directory to store the Oozie logs.
OOZIE_LOG_DIR="/var/log/oozie"

# Directory to store the Oozie process ID.
OOZIE_PID_DIR="/var/run/oozie"

# Directory to store the Oozie temporary files.
OOZIE_TMP_DIR="/var/tmp/oozie"

#
# Hadoop Service - Sqoop
#
# Directory to store the Sqoop configuration files.
SQOOP_CONF_DIR="/etc/sqoop/conf"

export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
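After editing the TODO variables, it may be worth checking that every directory value is an absolute path, since a missing leading `/` is easy to overlook. The following is a minimal sketch of such a check, not part of the companion files. The three sample values are modeled on the snapshot (one entry is deliberately wrong), and the `bad`, `name`, `value`, and `dir` variables are hypothetical helpers.

```shell
#!/bin/sh
# Sample values modeled on the snapshot; in practice, source directories.sh instead.
HDFS_LOG_DIR="/var/log/hadoop/hdfs"
WEBHCAT_PID_DIR="/var/run/webhcat"
DFS_DATA_DIR="/grid/hadoop/hdfs/dn relative/hdfs/dn"   # second entry is deliberately wrong

bad=""
for name in HDFS_LOG_DIR WEBHCAT_PID_DIR DFS_DATA_DIR; do
  eval "value=\$$name"          # indirect lookup of the variable named in $name
  for dir in $value; do         # unquoted on purpose: split space-separated lists
    case "$dir" in
      /*) ;;                    # absolute path, nothing to report
      *)  bad="$bad $name=$dir" ;;
    esac
  done
done

if [ -n "$bad" ]; then
  echo "Not absolute:$bad"
else
  echo "All paths are absolute"
fi
```

To check a real file, replace the three sample assignments with a `.` (source) of your edited `directories.sh` and extend the list of variable names in the outer loop.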

Define users and groups.

The following table describes the system user accounts and groups. Use this table to define what you are going to use in setting up your environment. These users and groups should reflect the accounts you created in Create System Users and Groups.

Note

The `scripts.zip` file you downloaded in Download Companion Files includes a script, `usersAndGroups.sh`, for setting user and group environment parameters. We strongly suggest that you edit and source this script (alternatively, you can copy its contents to your `~/.bash_profile`) to set up these environment variables in your environment.


Table 1.3. Define Users and Groups for Systems

Parameter	Definition
`HDFS_USER`	User owning the HDFS services. For example, `hdfs`.
`YARN_USER`	User owning the YARN services. For example, `yarn`.
`ZOOKEEPER_USER`	User owning the ZooKeeper services. For example, `zookeeper`.
`HIVE_USER`	User owning the Hive services. For example, `hive`.
`WEBHCAT_USER`	User owning the WebHCat services. For example, `hcat`.
`HBASE_USER`	User owning the HBase services. For example, `hbase`.
`PIG_USER`	User owning the Pig services. For example, `pig`.
`HADOOP_GROUP`	A common group shared by services. For example, `hadoop`.
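As an illustration, the parameters in Table 1.3 could be set as plain shell variables using the example values from the table. This sketch is not the content of `usersAndGroups.sh`, which may differ. It also shows how a per-user directory such as `HDFS_LOG_DIR` combines a base directory with the service user name, as described in Table 1.1.

```shell
#!/bin/sh
# Example values from Table 1.3; adjust to the accounts you actually created.
HDFS_USER=hdfs
YARN_USER=yarn
ZOOKEEPER_USER=zookeeper
HIVE_USER=hive
WEBHCAT_USER=hcat
HBASE_USER=hbase
PIG_USER=pig
HADOOP_GROUP=hadoop

# Per-user directories are a base directory plus the service user name.
HDFS_LOG_DIR="/var/log/hadoop/$HDFS_USER"
YARN_PID_DIR="/var/run/hadoop/$YARN_USER"

echo "HDFS_LOG_DIR=$HDFS_LOG_DIR"
echo "YARN_PID_DIR=$YARN_PID_DIR"
```

If you choose a different user name, derive the matching log and PID directories from it in the same way so that the two stay consistent.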
