Command Line Installation

Define Environment Parameters

Use the following instructions to set up the specific users and directories that your HDP installation requires:

  1. Define directories.

    The following table describes the directories you need for installation, configuration, data storage, process IDs, and log information, based on the Apache Hadoop services you plan to install. Use this table to define the directories you are going to use to set up your environment.

    Note

    The scripts.zip file that you downloaded with the supplied companion files includes a script, directories.sh, for setting directory environment parameters.

    Edit directories.sh, then source it (or copy its contents to your ~/.bash_profile) to set these environment variables in your environment, as shown in the example that follows the directories.sh listing below.

    Table 1.1. Directories Needed to Install Core Hadoop

    Hadoop Service | Parameter          | Definition
    ---------------|--------------------|-----------
    HDFS           | DFS_NAME_DIR       | Space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
    HDFS           | DFS_DATA_DIR       | Space-separated list of directories where DataNodes store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn.
    HDFS           | FS_CHECKPOINT_DIR  | Space-separated list of directories where the SecondaryNameNode stores the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn.
    HDFS           | HDFS_LOG_DIR       | Directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.
    HDFS           | HDFS_PID_DIR       | Directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.
    HDFS           | HADOOP_CONF_DIR    | Directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
    YARN           | YARN_LOCAL_DIR     | Space-separated list of directories where YARN stores temporary data. For example, /grid/hadoop/yarn /grid1/hadoop/yarn /grid2/hadoop/yarn.
    YARN           | YARN_LOG_DIR       | Directory for storing the YARN logs. This directory name is a combination of a directory and the $YARN_USER. For example, /var/log/hadoop/yarn, where yarn is the $YARN_USER.
    YARN           | YARN_LOCAL_LOG_DIR | Space-separated list of directories where YARN stores container log data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs.
    YARN           | YARN_PID_DIR       | Directory for storing the YARN process ID. This directory name is a combination of a directory and the $YARN_USER. For example, /var/run/hadoop/yarn, where yarn is the $YARN_USER.
    MapReduce      | MAPRED_LOG_DIR     | Directory for storing the JobHistory Server logs. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.
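
    Several of these parameters are space-separated lists, so a shell that has sourced directories.sh can loop over them to create the directories. The following is a minimal sketch, not part of the companion files; it assumes directories.sh has already been edited and sourced, and that the hdfs user and hadoop group from Table 1.3 exist:

    # Minimal sketch: create each NameNode directory listed in $DFS_NAME_DIR
    # and assign it to the HDFS service user. This relies on shell word
    # splitting of the unquoted, space-separated $DFS_NAME_DIR value.
    for dir in $DFS_NAME_DIR; do
      mkdir -p "$dir";
      chown -R hdfs:hadoop "$dir";
      chmod -R 755 "$dir";
    done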


    Table 1.2. Directories Needed to Install Ecosystem Components

    Hadoop Service | Parameter          | Definition
    ---------------|--------------------|-----------
    Pig            | PIG_CONF_DIR       | Directory to store the Apache Pig configuration files. For example, /etc/pig/conf.
    Pig            | PIG_LOG_DIR        | Directory to store the Pig logs. For example, /var/log/pig.
    Pig            | PIG_PID_DIR        | Directory to store the Pig process ID. For example, /var/run/pig.
    Oozie          | OOZIE_CONF_DIR     | Directory to store the Oozie configuration files. For example, /etc/oozie/conf.
    Oozie          | OOZIE_DATA         | Directory to store the Oozie data. For example, /var/db/oozie.
    Oozie          | OOZIE_LOG_DIR      | Directory to store the Oozie logs. For example, /var/log/oozie.
    Oozie          | OOZIE_PID_DIR      | Directory to store the Oozie process ID. For example, /var/run/oozie.
    Oozie          | OOZIE_TMP_DIR      | Directory to store the Oozie temporary files. For example, /var/tmp/oozie.
    Hive           | HIVE_CONF_DIR      | Directory to store the Hive configuration files. For example, /etc/hive/conf.
    Hive           | HIVE_LOG_DIR       | Directory to store the Hive logs. For example, /var/log/hive.
    Hive           | HIVE_PID_DIR       | Directory to store the Hive process ID. For example, /var/run/hive.
    WebHCat        | WEBHCAT_CONF_DIR   | Directory to store the WebHCat configuration files. For example, /etc/hcatalog/conf/webhcat.
    WebHCat        | WEBHCAT_LOG_DIR    | Directory to store the WebHCat logs. For example, /var/log/webhcat.
    WebHCat        | WEBHCAT_PID_DIR    | Directory to store the WebHCat process ID. For example, /var/run/webhcat.
    HBase          | HBASE_CONF_DIR     | Directory to store the Apache HBase configuration files. For example, /etc/hbase/conf.
    HBase          | HBASE_LOG_DIR      | Directory to store the HBase logs. For example, /var/log/hbase.
    HBase          | HBASE_PID_DIR      | Directory to store the HBase process ID. For example, /var/run/hbase.
    ZooKeeper      | ZOOKEEPER_DATA_DIR | Directory where Apache ZooKeeper stores data. For example, /grid/hadoop/zookeeper/data.
    ZooKeeper      | ZOOKEEPER_CONF_DIR | Directory to store the ZooKeeper configuration files. For example, /etc/zookeeper/conf.
    ZooKeeper      | ZOOKEEPER_LOG_DIR  | Directory to store the ZooKeeper logs. For example, /var/log/zookeeper.
    ZooKeeper      | ZOOKEEPER_PID_DIR  | Directory to store the ZooKeeper process ID. For example, /var/run/zookeeper.
    Sqoop          | SQOOP_CONF_DIR     | Directory to store the Apache Sqoop configuration files. For example, /etc/sqoop/conf.


    If you use the companion files, the following listing shows how your directories.sh file should look after you edit the TODO variables:

    #!/bin/sh
    
    #
    # Directories Script
    #
    # 1. To use this script, you must edit the TODO variables below for your environment.
    #
    # 2. Warning: Leave the other parameters as the default values. Changing these default values requires you to
    # change values in other configuration files.
    #
    
    #
    # Hadoop Service - HDFS
    #
    
    # Space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn
    DFS_NAME_DIR="TODO-LIST-OF-NAMENODE-DIRS";
    
    # Space-separated list of directories where DataNodes store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn
    DFS_DATA_DIR="TODO-LIST-OF-DATA-DIRS";
    
    # Space-separated list of directories where the SecondaryNameNode stores the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn
    FS_CHECKPOINT_DIR="TODO-LIST-OF-SECONDARY-NAMENODE-DIRS";
    
    
    
    # Directory to store the HDFS logs.
    HDFS_LOG_DIR="/var/log/hadoop/hdfs";
    
    # Directory to store the HDFS process ID.
    HDFS_PID_DIR="/var/run/hadoop/hdfs";
    
    # Directory to store the Hadoop configuration files.
    HADOOP_CONF_DIR="/etc/hadoop/conf";
    
    #
    # Hadoop Service - YARN 
    #
    
    # Space-separated list of directories where YARN stores temporary data. For example, /grid/hadoop/yarn/local /grid1/hadoop/yarn/local /grid2/hadoop/yarn/local
    YARN_LOCAL_DIR="TODO-LIST-OF-YARN-LOCAL-DIRS";
    
    # Directory to store the YARN logs.
    YARN_LOG_DIR="/var/log/hadoop/yarn";
    
    # Space-separated list of directories where YARN stores container log data. For example, /grid/hadoop/yarn/logs /grid1/hadoop/yarn/logs /grid2/hadoop/yarn/logs
    YARN_LOCAL_LOG_DIR="TODO-LIST-OF-YARN-LOCAL-LOG-DIRS";
    
    # Directory to store the YARN process ID.
    YARN_PID_DIR="/var/run/hadoop/yarn";
    
    #
    # Hadoop Service - MAPREDUCE
    #
    
    # Directory to store the MapReduce daemon logs.
    MAPRED_LOG_DIR="/var/log/hadoop/mapred";
    
    # Directory to store the MapReduce JobHistory Server process ID.
    MAPRED_PID_DIR="/var/run/hadoop/mapred";
    
    #
    # Hadoop Service - Hive
    #
    
    # Directory to store the Hive configuration files.
    HIVE_CONF_DIR="/etc/hive/conf";
    
    # Directory to store the Hive logs.
    HIVE_LOG_DIR="/var/log/hive";
    
    # Directory to store the Hive process ID.
    HIVE_PID_DIR="/var/run/hive";
    
    #
    # Hadoop Service - WebHCat (Templeton)
    #
    
    # Directory to store the WebHCat (Templeton) configuration files.
    WEBHCAT_CONF_DIR="/etc/hcatalog/conf/webhcat";
    
    # Directory to store the WebHCat (Templeton) logs.
    WEBHCAT_LOG_DIR="var/log/webhcat";
    
    # Directory to store the WebHCat (Templeton) process ID.
    WEBHCAT_PID_DIR="/var/run/webhcat";
    
    #
    # Hadoop Service - HBase
    #
    
    # Directory to store the HBase configuration files.
    HBASE_CONF_DIR="/etc/hbase/conf";
    
    # Directory to store the HBase logs.
    HBASE_LOG_DIR="/var/log/hbase";
    
    # Directory to store the HBase process ID.
    HBASE_PID_DIR="/var/run/hbase";
    
    #
    # Hadoop Service - ZooKeeper
    #
    
    # Directory where ZooKeeper stores data. For example, /grid1/hadoop/zookeeper/data
    ZOOKEEPER_DATA_DIR="TODO-ZOOKEEPER-DATA-DIR";
    
    # Directory to store the ZooKeeper configuration files.
    ZOOKEEPER_CONF_DIR="/etc/zookeeper/conf";
    
    # Directory to store the ZooKeeper logs.
    ZOOKEEPER_LOG_DIR="/var/log/zookeeper";
    
    # Directory to store the ZooKeeper process ID.
    ZOOKEEPER_PID_DIR="/var/run/zookeeper";
    
    #
    # Hadoop Service - Pig
    #
    
    # Directory to store the Pig configuration files.
    PIG_CONF_DIR="/etc/pig/conf";
    
    # Directory to store the Pig logs.
    PIG_LOG_DIR="/var/log/pig";
    
    # Directory to store the Pig process ID.
    PIG_PID_DIR="/var/run/pig";
    
    
    #
    # Hadoop Service - Oozie
    #
    
    # Directory to store the Oozie configuration files.
    OOZIE_CONF_DIR="/etc/oozie/conf";
    
    # Directory to store the Oozie data.
    OOZIE_DATA="/var/db/oozie";
    
    # Directory to store the Oozie logs.
    OOZIE_LOG_DIR="/var/log/oozie";
    
    # Directory to store the Oozie process ID.
    OOZIE_PID_DIR="/var/run/oozie";
    
    # Directory to store the Oozie temporary files.
    OOZIE_TMP_DIR="/var/tmp/oozie";
    
    #
    # Hadoop Service - Sqoop
    #
    
    # Directory to store the Sqoop configuration files.
    SQOOP_CONF_DIR="/etc/sqoop/conf";
    
    #
    # Hadoop Service - Accumulo
    #
    
    # Directory to store the Accumulo configuration files.
    ACCUMULO_CONF_DIR="/etc/accumulo/conf";
    
    # Directory to store the Accumulo logs.
    ACCUMULO_LOG_DIR="/var/log/accumulo";
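
    After you edit the TODO variables, source the script so the parameters are set in your current shell, and spot-check a value. For example (the path below is a placeholder for wherever you extracted scripts.zip):

    # Load the directory parameters into the current shell, then
    # confirm that one of the edited values is set as expected.
    . /path/to/scripts/directories.sh
    echo "$DFS_NAME_DIR"
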
  2. The following table describes the system user accounts and groups. Use this table to define the accounts and groups you are going to use to set up your environment. These users and groups should reflect the accounts you create in Create System Users and Groups. The scripts.zip file you downloaded includes a script, usersAndGroups.sh, for setting user and group environment parameters; a minimal sketch of such a file appears after the table below.

    Table 1.3. Define Users and Groups for Systems

    Parameter      | Definition
    ---------------|-----------
    HDFS_USER      | User that owns the Hadoop Distributed File System (HDFS) services. For example, hdfs.
    YARN_USER      | User that owns the YARN services. For example, yarn.
    ZOOKEEPER_USER | User that owns the ZooKeeper services. For example, zookeeper.
    HIVE_USER      | User that owns the Hive services. For example, hive.
    WEBHCAT_USER   | User that owns the WebHCat services. For example, hcat.
    HBASE_USER     | User that owns the HBase services. For example, hbase.
    FALCON_USER    | User that owns the Apache Falcon services. For example, falcon.
    SQOOP_USER     | User that owns the Sqoop services. For example, sqoop.
    KAFKA_USER     | User that owns the Apache Kafka services. For example, kafka.
    OOZIE_USER     | User that owns the Oozie services. For example, oozie.
    STORM_USER     | User that owns the Storm services. For example, storm.
    HADOOP_GROUP   | A common group shared by services. For example, hadoop.
    ACCUMULO_USER  | User that owns the Accumulo services. For example, accumulo.
    KNOX_USER      | User that owns the Knox Gateway services. For example, knox.
    NAGIOS_USER    | User that owns the Nagios services. For example, nagios.
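
    If you use the companion files, your usersAndGroups.sh file contains definitions along these lines once you set the values for your environment. This is a minimal sketch based on the example accounts in Table 1.3, not the verbatim companion file:

    #!/bin/sh
    
    #
    # Users and Groups Script (sketch)
    #
    # Minimal sketch based on the example accounts in Table 1.3; the
    # companion usersAndGroups.sh may differ. Edit the values to match
    # the accounts you create in Create System Users and Groups.
    #
    
    HDFS_USER="hdfs";
    YARN_USER="yarn";
    ZOOKEEPER_USER="zookeeper";
    HIVE_USER="hive";
    WEBHCAT_USER="hcat";
    HBASE_USER="hbase";
    FALCON_USER="falcon";
    SQOOP_USER="sqoop";
    KAFKA_USER="kafka";
    OOZIE_USER="oozie";
    STORM_USER="storm";
    ACCUMULO_USER="accumulo";
    KNOX_USER="knox";
    NAGIOS_USER="nagios";
    
    # A common group shared by services.
    HADOOP_GROUP="hadoop";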