These files are used to configure MapReduce jobs.
Note | |
---|---|
The file paths shown are the defaults from an HDP installation. You can change these locations as needed. |
/etc/hadoop/conf/yarn-site.xml
This file contains configuration settings for YARN. It is used by the Client, the Node Manager, and the Resource Manager. The following table lists some important yarn-site.xml properties.

Property | Value | Description
---|---|---
yarn.resourcemanager.webapp.address | <RM_HOST>:8088 | Resource Manager host and port address.
yarn.log.server.url | <H_S>:19888/jobhistory/logs | History server address.
yarn.resourcemanager.hostname | <RM_HOST> | Resource Manager host name.
yarn.nodemanager.linux-container-executor.group | hadoop | Equivalent of the v1 Task Tracker controller group; this group can run Application Masters (AMs).
yarn.nodemanager.log.retain-seconds | 604800 | Log retention time, in seconds (604800 seconds is 7 days).
yarn.log-aggregation-enable | true | Aggregate all of the logs in one location.
yarn.nodemanager.container-monitor.interval-ms | 3000 | Interval, in milliseconds, at which the Node Manager monitors Containers (every 3 seconds here).
yarn.nodemanager.log-aggregation.compression-type | gz | Specifies that log files are compressed in gzip format.

/etc/hadoop/conf/core-site.xml
This file contains configuration settings for Hadoop Core, such as I/O settings that are common to HDFS2 and MRv2. It is used by all Hadoop daemons and clients, because they all need to know the location of the Name Node. A copy of this file should therefore be present on every node that runs a Hadoop daemon or client.
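
The *-site.xml files share one layout: a configuration root element holding property elements, each with a name and a value. As a minimal sketch of reading such a file, the snippet below parses an inline sample into a dictionary. The sample string and the helper name parse_site_xml are illustrative assumptions; the three values are taken from the yarn-site.xml table, and a real file contains many more properties.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; values mirror the yarn-site.xml table.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-monitor.interval-ms</name>
    <value>3000</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-aggregation.compression-type</name>
    <value>gz</value>
  </property>
</configuration>"""


def parse_site_xml(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML.

    Hypothetical helper: values are kept as strings, exactly as written.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None or not name.strip():
            continue  # skip malformed entries that have no name
        props[name.strip()] = prop.findtext("value", default="").strip()
    return props


props = parse_site_xml(SAMPLE_YARN_SITE)
interval_ms = int(props["yarn.nodemanager.container-monitor.interval-ms"])
```

To inspect a file on disk, the same helper could be given the text read from a path such as /etc/hadoop/conf/yarn-site.xml, assuming the default HDP location is in use.
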
/etc/hadoop/conf/mapred-site.xml
This file contains configuration settings for MRv2, such as the io.sort properties and the memory settings for the Containers. The following table lists some important mapred-site.xml properties.

Property | Value | Description
---|---|---
mapreduce.map.memory.mb | 1024 | Overall memory, in MB, for each Mapper task (referred to below as #1).
mapreduce.reduce.memory.mb | 1024 | Overall memory, in MB, for each Reducer task (referred to below as #2).
mapreduce.map.java.opts | -Xmx756m | JVM heap size (-Xmx) for the Mapper task. As a guideline, about 0.8 of #1.
mapreduce.reduce.java.opts | -Xmx756m | JVM heap size (-Xmx) for the Reducer task. As a guideline, about 0.8 of #2.
mapreduce.reduce.log.level | INFO | Any log4j log level is supported.
mapreduce.jobhistory.done-dir | /mr-history/done | The location is in HDFS.
mapreduce.shuffle.port | 13562 | Ensure that the firewall allows traffic on this port.
yarn.app.mapreduce.am.staging-dir | /user | The location is in HDFS.
mapreduce.reduce.shuffle.parallelcopies | 30 | Scale this value up for a very large cluster.
mapreduce.framework.name | yarn | Basic configuration.

/etc/hadoop/conf/capacity-scheduler.xml
This is the configuration file for the Capacity Scheduler component in the Hadoop Resource Manager. You can use this file to configure various scheduling parameters related to queues.
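
Returning to the mapred-site.xml memory settings: the table describes each java.opts heap as .8 of the corresponding memory.mb value. The sketch below applies that guideline directly. The function name heap_opts and its ratio parameter are assumptions made for illustration. Note that the table's -Xmx756m is somewhat below 0.8 of 1024 MB, so the guideline is approximate and not an exact rule.

```python
def heap_opts(container_mb, ratio=0.8):
    """Return a -Xmx option sized to `ratio` of the task memory (in MB).

    Hypothetical helper illustrating the ".8 of #1" guideline.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < ratio < 1:
        raise ValueError("ratio must be between 0 and 1")
    heap_mb = int(container_mb * ratio)  # truncate so the heap stays within the target
    return f"-Xmx{heap_mb}m"


# Candidate values for mapreduce.map.java.opts and mapreduce.reduce.java.opts,
# derived from the 1024 MB memory.mb settings in the table.
map_opts = heap_opts(1024)
reduce_opts = heap_opts(1024)
```

Leaving headroom between the heap and the overall task memory gives the JVM room for non-heap memory inside the same Container.
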
/etc/hadoop/conf/hadoop-env.sh
Java is required by Hadoop, so the HDFS daemons use this file to locate JAVA_HOME. This file also specifies memory settings for all of the HDFS daemons. It is the file to edit when you need to tune those memory settings, and the first place to look when investigating memory errors or garbage collector issues with the HDFS daemons.
/etc/hadoop/conf/yarn-env.sh
Java is required by Hadoop, so the YARN daemons use this file to locate JAVA_HOME. This file also specifies memory settings for all of the YARN daemons. It is the file to edit when you need to tune those memory settings, and the first place to look when investigating memory errors or garbage collector issues with the YARN daemons.
/etc/hadoop/conf/log4j.properties
This file is used to modify the log purging intervals of the MapReduce log files. It defines the logging for all of the Hadoop daemons, and includes information related to appenders used for logging and layout.
Configuration File Permissions
Listed below are the proper HDFS-related permissions, owners, and groups for the folders and files in a working HDP cluster.

    drwxr-xr-x 3 root          root   4096 /etc/hadoop
    lrwxrwxrwx 1 hadoop_deploy hadoop   29 conf -> /etc/alternatives/hadoop-conf
    -rw-r--r-- 1 hdfs          hadoop 2316 core-site.xml
    -rw-r--r-- 1 mapred        hadoop 7632 mapred-site.xml
    -rw-r--r-- 1 mapred        hadoop 7632 yarn-site.xml
    -rw-r--r-- 1 mapred        hadoop 2033 mapred-queue-acls.xml
    -rw-r--r-- 1 hdfs          hadoop  928 taskcontroller.cfg
    -rw-r--r-- 1 root          root   9406 capacity-scheduler.xml
    -rw-r--r-- 1 root          root    327 fair-scheduler.xml
    -rw-r--r-- 1 hdfs          hadoop 4867 hadoop-env.sh
    -rw-r--r-- 1 hdfs          hadoop 4867 yarn-env.sh
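
As a rough sketch of checking these permissions programmatically, the snippet below compares a file's ls-style mode string with the -rw-r--r-- mode shown for the configuration files. The helper names mode_string and has_expected_mode are assumptions for illustration, and the demonstration runs against a temporary file so that it does not depend on /etc/hadoop/conf being present.

```python
import os
import stat
import tempfile

# Mode shown in the listing for the regular configuration files.
EXPECTED_MODE = "-rw-r--r--"


def mode_string(path):
    """Return the ls-style mode string (for example '-rw-r--r--') for path."""
    # lstat does not follow symlinks, so a link such as conf reports as 'l...'.
    return stat.filemode(os.lstat(path).st_mode)


def has_expected_mode(path, expected=EXPECTED_MODE):
    """Hypothetical check: True if path has exactly the expected mode."""
    return mode_string(path) == expected


# Demonstration on a temporary file rather than a real Hadoop config file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
os.chmod(demo_path, 0o644)
demo_ok = has_expected_mode(demo_path)
os.remove(demo_path)
```

Owner and group could be verified in a similar way from the st_uid and st_gid fields that os.lstat returns.
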