This section describes how to set up and edit the deployment configuration files for HDFS, YARN, and MapReduce.
Use the following instructions to set up Hadoop configuration files:
We strongly suggest that you edit and source the bash script files included with the companion files (downloaded in Download Companion Files). Alternatively, you can copy the contents to your ~/.bash_profile to set up these environment variables.
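For reference, a minimal sketch of what such a profile fragment might contain (the variable names match those used later in this section; the directory values are illustrative assumptions, so substitute the values you defined in Define Environment Parameters):

export HADOOP_CONF_DIR=/etc/hadoop/conf                                       # Hadoop configuration directory
export DFS_NAME_DIR=/grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn                # NameNode metadata directories
export DFS_DATA_DIR=file:///grid/hadoop/hdfs/dn,file:///grid1/hadoop/hdfs/dn  # DataNode data directories
export FS_CHECKPOINT_DIR=/grid/hadoop/hdfs/snn                                # Secondary NameNode checkpoint directories
export YARN_LOCAL_DIR=/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local        # NodeManager local directories
export YARN_LOCAL_LOG_DIR=/grid/hadoop/yarn/log                               # NodeManager log directories
export HDFS_USER=hdfs                                                         # User owning the HDFS services
export HADOOP_GROUP=hadoop                                                    # Common group shared by services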
Extract the core Hadoop configuration files to a temporary directory.
The files are located in the configuration_files/core_hadoop directory where you decompressed the companion files.

Modify the configuration files.
In the temporary directory, locate the following files and modify the properties based on your environment. Search for TODO in the files for the properties to replace. See Define Environment Parameters for more information.
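One quick way to list the placeholders that still need values (a sketch; run it from the temporary directory where you extracted the files):

grep -n TODO core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml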
Edit core-site.xml and modify the following property:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://$namenode.full.hostname:8020</value>
  <description>Enter your NameNode hostname</description>
</property>
Edit hdfs-site.xml and modify the following properties:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn</value>
  <description>Comma-separated list of paths. Use the list of directories from $DFS_NAME_DIR. For example, /grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn.</description>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///grid/hadoop/hdfs/dn,file:///grid1/hadoop/hdfs/dn</value>
  <description>Comma-separated list of paths. Use the list of directories from $DFS_DATA_DIR. For example, file:///grid/hadoop/hdfs/dn,file:///grid1/hadoop/hdfs/dn.</description>
</property>

<property>
  <name>dfs.namenode.http-address</name>
  <value>$namenode.full.hostname:50070</value>
  <description>Enter your NameNode hostname for HTTP access.</description>
</property>

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>$secondary.namenode.full.hostname:50090</value>
  <description>Enter your Secondary NameNode hostname.</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/grid/hadoop/hdfs/snn,/grid1/hadoop/hdfs/snn,/grid2/hadoop/hdfs/snn</value>
  <description>Comma-separated list of paths. Use the list of directories from $FS_CHECKPOINT_DIR. For example, /grid/hadoop/hdfs/snn,/grid1/hadoop/hdfs/snn,/grid2/hadoop/hdfs/snn.</description>
</property>
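These directories must exist on the relevant hosts before HDFS starts. If you have not created them yet, a sketch of doing so (the paths mirror the examples above; hdfs:hadoop follows the $HDFS_USER and $HADOOP_GROUP examples later in this section):

mkdir -p /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn                            # NameNode metadata
mkdir -p /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn                            # DataNode blocks
mkdir -p /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn   # checkpoints
chown -R hdfs:hadoop /grid/hadoop/hdfs /grid1/hadoop/hdfs /grid2/hadoop/hdfs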
Note: The NameNode new generation size should be 1/8 of the maximum heap size (-Xmx). Ensure that you check the default setting for your environment.

To change the default value:

Edit the /etc/hadoop/conf/hadoop-env.sh file.

Change the value of the -XX:MaxNewSize parameter to 1/8 the value of the maximum heap size (-Xmx) parameter.
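For example, with a 4 GB NameNode heap, 1/8 is 512 MB. A sketch of the corresponding hadoop-env.sh setting (HADOOP_NAMENODE_OPTS is the standard variable for NameNode JVM options; the 4 GB heap is an assumed value, and your existing options will differ):

export HADOOP_NAMENODE_OPTS="-Xmx4096m -XX:NewSize=512m -XX:MaxNewSize=512m ${HADOOP_NAMENODE_OPTS}"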
Edit yarn-site.xml and modify the following properties:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>$resourcemanager.full.hostname:8025</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>$resourcemanager.full.hostname:8030</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.address</name>
  <value>$resourcemanager.full.hostname:8050</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>$resourcemanager.full.hostname:8141</value>
  <description>Enter your ResourceManager hostname.</description>
</property>

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
  <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
</property>

<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/hadoop/yarn/log</value>
  <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log.</description>
</property>

<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
  <description>URL for the job history server.</description>
</property>

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$resourcemanager.full.hostname:8088</value>
  <description>Address for the ResourceManager web UI. Enter your ResourceManager hostname.</description>
</property>
Edit mapred-site.xml and modify the following properties:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$jobhistoryserver.full.hostname:10020</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$jobhistoryserver.full.hostname:19888</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
Optional: Configure MapReduce to use Snappy Compression
To enable Snappy compression for MapReduce jobs, edit core-site.xml and mapred-site.xml.
Add the following properties to mapred-site.xml:

<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>

<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-server -XX:NewRatio=8 -Djava.library.path=/usr/lib/hadoop/lib/native/ -Djava.net.preferIPv4Stack=true</value>
  <final>true</final>
</property>
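The properties above only place the native library on the task JVMs' library path. For intermediate map output to actually be compressed with Snappy, the standard MapReduce compression properties must also be set in mapred-site.xml; a sketch (these are stock Hadoop property names, but they are not part of the companion-file defaults, so treat this addition as an assumption for your environment):

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>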
Add the SnappyCodec to the codecs list in core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
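To confirm that Hadoop can load the native Snappy library after these changes, you can run the following (assuming a Hadoop release that includes the checknative command); the snappy line of the output should report true and point at the native library:

hadoop checknative -a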
Optional: Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.

Copy the configuration files.
On all hosts in your cluster, create the Hadoop configuration directory:

rm -r $HADOOP_CONF_DIR
mkdir -p $HADOOP_CONF_DIR
where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.

Copy all the configuration files to $HADOOP_CONF_DIR.
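For example, a sketch of pushing the modified files from the temporary directory to every host (the hostnames and the /tmp/core_hadoop path are illustrative assumptions):

# Push the edited configuration files to each cluster host
for host in node1.example.com node2.example.com node3.example.com; do
  scp /tmp/core_hadoop/* "$host:$HADOOP_CONF_DIR/"
done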
Set appropriate permissions:
chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
chmod -R 755 $HADOOP_CONF_DIR/../
where:

$HDFS_USER is the user owning the HDFS services. For example, hdfs.

$HADOOP_GROUP is a common group shared by services. For example, hadoop.