Configure YARN and MapReduce
After you upgrade Hadoop, complete the following steps to update your configuration files.
Important: If you have a secure cluster, you need Kerberos credentials for the hdfs user to perform the HDFS operations below.
Upload the MapReduce tarball to HDFS. Run the following commands as the HDFS user (for example, hdfs):
su - hdfs -c "hdfs dfs -mkdir -p /hdp/apps/2.3.0.0-<$version>/mapreduce/"
su - hdfs -c "hdfs dfs -put /usr/hdp/2.3.0.0-<$version>/hadoop/mapreduce.tar.gz /hdp/apps/2.3.0.0-<$version>/mapreduce/"
su - hdfs -c "hdfs dfs -chown -R hdfs:hadoop /hdp"
su - hdfs -c "hdfs dfs -chmod -R 555 /hdp/apps/2.3.0.0-<$version>/mapreduce"
su - hdfs -c "hdfs dfs -chmod -R 444 /hdp/apps/2.3.0.0-<$version>/mapreduce/mapreduce.tar.gz"
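If you script this upload across builds, the command sequence above can be generated from the build number. A minimal sketch using only the Python standard library; the build number 2.3.0.0-2557 is only an example placeholder, substitute your own:

```python
# Hypothetical helper: build the HDFS upload commands for a given HDP build
# number so the manual sequence above can be scripted. Run the resulting
# commands as the hdfs user.
HDP_VERSION = "2.3.0.0-2557"  # example build number; substitute your own

def mapreduce_upload_commands(version):
    app_dir = "/hdp/apps/{}/mapreduce".format(version)
    tarball = "/usr/hdp/{}/hadoop/mapreduce.tar.gz".format(version)
    return [
        "hdfs dfs -mkdir -p {}/".format(app_dir),
        "hdfs dfs -put {} {}/".format(tarball, app_dir),
        "hdfs dfs -chown -R hdfs:hadoop /hdp",
        "hdfs dfs -chmod -R 555 {}".format(app_dir),
        "hdfs dfs -chmod -R 444 {}/mapreduce.tar.gz".format(app_dir),
    ]

for cmd in mapreduce_upload_commands(HDP_VERSION):
    print(cmd)
```

The permissions mirror the manual steps: 555 (read/execute) on the directory tree and 444 (read-only) on the tarball itself.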
Make the following changes to
/etc/hadoop/conf/mapred-site.xml:
Add:
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.admin-command-opts</name>
  <value>-Dhdp.version=${hdp.version}</value>
</property>
Note: You do not need to modify ${hdp.version}.
Modify the following existing properties to include ${hdp.version}:
<property>
  <name>mapreduce.admin.user.env</name>
  <value>LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64</value>
</property>
<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-server -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*,
    $PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*,
    /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar,
    /etc/hadoop/conf/secure</value>
</property>
Note: You do not need to modify ${hdp.version}.
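To double-check the edits, you can parse mapred-site.xml and report any required properties that are still missing. A minimal sketch using only the Python standard library; the sample string stands in for your real /etc/hadoop/conf/mapred-site.xml:

```python
# Illustrative check (not part of the upgrade steps): list required
# mapred-site.xml properties that are absent. Point `conf` at the parsed
# contents of your real file instead of the embedded sample.
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <property>
    <name>mapreduce.application.framework.path</name>
    <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
  </property>
</configuration>"""

def props(xml_text):
    """Return {property name: value} for a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

conf = props(SAMPLE)
REQUIRED = {"mapreduce.application.framework.path",
            "mapreduce.admin.user.env",
            "mapreduce.application.classpath"}
missing = REQUIRED - conf.keys()
print("missing:", sorted(missing))
```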
Remove the following properties from
/etc/hadoop/conf/mapred-site.xml:
mapreduce.task.tmp.dir
mapreduce.job.speculative.slownodethreshold (deprecated)
mapreduce.job.speculative.speculativecap (deprecated)
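The removal step can also be done programmatically. A hedged sketch with the standard library, shown against an in-memory sample; for the real file, parse /etc/hadoop/conf/mapred-site.xml and write the tree back (keep a backup first):

```python
# Strip the deprecated properties from a mapred-site.xml document.
import xml.etree.ElementTree as ET

REMOVE = {"mapreduce.task.tmp.dir",
          "mapreduce.job.speculative.slownodethreshold",
          "mapreduce.job.speculative.speculativecap"}

SAMPLE = """<configuration>
  <property><name>mapreduce.task.tmp.dir</name><value>/tmp</value></property>
  <property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
</configuration>"""

root = ET.fromstring(SAMPLE)
# Iterate over a copy of the list, since we mutate the tree as we go.
for prop in list(root.findall("property")):
    if prop.findtext("name") in REMOVE:
        root.remove(prop)

remaining = [p.findtext("name") for p in root.findall("property")]
print(remaining)  # → ['mapreduce.map.memory.mb']
```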
Add the following properties to
/etc/hadoop/conf/yarn-site.xml:
<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,
    /usr/hdp/${hdp.version}/hadoop-client/*,
    /usr/hdp/${hdp.version}/hadoop-client/lib/*,
    /usr/hdp/${hdp.version}/hadoop-hdfs-client/*,
    /usr/hdp/${hdp.version}/hadoop-hdfs-client/lib/*,
    /usr/hdp/${hdp.version}/hadoop-yarn-client/*,
    /usr/hdp/${hdp.version}/hadoop-yarn-client/lib/*</value>
</property>
On secure clusters only, add the following properties to
/etc/hadoop/conf/yarn-site.xml:
<property>
  <name>yarn.timeline-service.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.state-store-class</name>
  <value>org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore</value>
</property>
<property>
  <name>yarn.timeline-service.leveldb-state-store.path</name>
  <value><!-- use the same path as the default of yarn.timeline-service.leveldb-timeline-store.path --></value>
</property>
Modify the following property in
/etc/hadoop/conf/yarn-site.xml:
<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*,
    $PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/*,
    $PWD/mr-framework/hadoop/share/hadoop/common/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/*,
    $PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/*,
    $PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*,
    $PWD/mr-framework/hadoop/share/hadoop/tools/lib/*,
    /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar,
    /etc/hadoop/conf/secure</value>
</property>
Make the following changes in
/etc/hadoop/conf/yarn-env.sh:
Change
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
to
export HADOOP_YARN_HOME=/usr/hdp/current/hadoop-yarn-nodemanager/
Change
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
to
export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec/
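Both yarn-env.sh edits can be applied in one pass when automating the upgrade across nodes. A hedged sketch, demonstrated on a sample string; for the real file, read and rewrite /etc/hadoop/conf/yarn-env.sh (back it up first):

```python
# Replace the two export lines named above; any line that does not match
# exactly is left untouched.
REPLACEMENTS = {
    "export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn":
        "export HADOOP_YARN_HOME=/usr/hdp/current/hadoop-yarn-nodemanager/",
    "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec":
        "export HADOOP_LIBEXEC_DIR=/usr/hdp/current/hadoop-client/libexec/",
}

def patch_yarn_env(text):
    lines = [REPLACEMENTS.get(line.strip(), line) for line in text.splitlines()]
    return "\n".join(lines)

sample = ("export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn\n"
          "export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec")
print(patch_yarn_env(sample))
```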
For secure clusters, you must create and configure the container-executor.cfg configuration file:
Create the container-executor.cfg file in /etc/hadoop/conf/.
Insert the following properties:
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred
min.user.id=1000
yarn.nodemanager.linux-container-executor.group - The group allowed to run container-executor. This must match the value of yarn.nodemanager.linux-container-executor.group in yarn-site.xml.
banned.users - Comma-separated list of users who cannot run container-executor.
min.user.id - Minimum user ID. This prevents system users from running container-executor.
allowed.system.users - Comma-separated list of system users allowed to run container-executor.
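The file uses simple key=value lines, so the sanity checks described above are easy to automate. A minimal sketch; the sample text stands in for /etc/hadoop/conf/container-executor.cfg:

```python
# Parse container-executor.cfg-style key=value lines and apply the
# constraints from the surrounding text.
SAMPLE = """yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred
min.user.id=1000"""

def parse_cfg(text):
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            cfg[key] = value
    return cfg

cfg = parse_cfg(SAMPLE)
assert cfg["yarn.nodemanager.linux-container-executor.group"] == "hadoop"
assert int(cfg["min.user.id"]) >= 1000           # keeps system users out
assert "yarn" in cfg["banned.users"].split(",")  # banned users are listed
print("container-executor.cfg looks sane")
```

In a real deployment you would also compare the group value against yarn.nodemanager.linux-container-executor.group in yarn-site.xml, per the matching requirement above.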
Set the permissions on /etc/hadoop/conf/container-executor.cfg so that it is readable only by root:
chown root:hadoop /etc/hadoop/conf/container-executor.cfg
chmod 400 /etc/hadoop/conf/container-executor.cfg
Set the ownership and permissions of the container-executor program so that only root and members of the hadoop group can run it:
chown root:hadoop /usr/hdp/${hdp.version}/hadoop-yarn-server-nodemanager/bin/container-executor
chmod 6050 /usr/hdp/${hdp.version}/hadoop-yarn-server-nodemanager/bin/container-executor
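Why mode 6050 works can be decoded with the standard `stat` module: the setuid and setgid bits are set, and only the group gets read/execute, so the binary runs with root privileges but only hadoop-group members may invoke it. An illustrative sketch:

```python
# Decode octal mode 6050 bit by bit (illustrative only; it does not touch
# the real binary).
import stat

mode = 0o6050
assert mode & stat.S_ISUID                           # setuid: runs with the owner's (root's) uid
assert mode & stat.S_ISGID                           # setgid: runs with the group's gid
assert mode & stat.S_IRGRP and mode & stat.S_IXGRP   # group (hadoop) may read and execute
assert not (mode & stat.S_IRWXO)                     # "other" users get no access
print(stat.filemode(stat.S_IFREG | mode))            # ls-style permission string
```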