Chapter 19. Configuring Include File Management for HDFS and YARN
Both HDFS and YARN can control which hosts are included in, or excluded from, the cluster. HDFS uses the dfs.hosts and dfs.hosts.exclude properties to control which DataNodes are allowed to connect to the NameNode. YARN uses user-definable files configured through the yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path properties to control which nodes running the NodeManager component are allowed to communicate with the ResourceManager. When the contents of these files are modified, both the HDFS NameNode and the YARN ResourceManager need to be notified of the changes by invoking the -refreshNodes command through dfsadmin for HDFS and rmadmin for YARN.
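As a rough illustration of the manual notification step, the sketch below wraps the two refresh commands in a small Python helper. The helper name refresh_nodes and its dry_run flag are hypothetical and not part of Ambari or Hadoop. It assumes the standard hdfs and yarn command-line tools are on the PATH of an account with administrative rights on the cluster.

```python
import subprocess

# Standard Hadoop admin CLIs. Assumption: `hdfs` and `yarn` are on PATH and
# the calling account is allowed to run admin commands.
REFRESH_COMMANDS = {
    "HDFS": ["hdfs", "dfsadmin", "-refreshNodes"],
    "YARN": ["yarn", "rmadmin", "-refreshNodes"],
}


def refresh_nodes(service, dry_run=True):
    """Return the refresh command for a service; run it unless dry_run is set."""
    command = list(REFRESH_COMMANDS[service.upper()])
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for name in ("HDFS", "YARN"):
        print(" ".join(refresh_nodes(name)))
```

With dry_run left at its default, the helper only reports the command line, which makes it safe to try on a host without a running cluster.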
You can configure Ambari to manage these include files for both YARN and HDFS. This feature can be enabled just for HDFS or YARN, or enabled for both services. When enabled, Ambari will manage the associated include/exclude files and update their contents based on the state of hosts within Ambari. When the files are changed, Ambari will also call the necessary refreshNodes commands to update the state of the NameNode and/or ResourceManager.
The table below describes the actions Ambari will take for the following operations:
Add Component: Adding a NodeManager or DataNode
Delete Component: Removing a NodeManager or DataNode
Decommission Component: Decommissioning a NodeManager or DataNode
Recommission Component: Recommissioning a NodeManager or DataNode
Operation | Include File Actions | Exclude File Actions | Refresh Nodes Call | Triggers Master Restart Indicator |
Add Component | Add hostname | Remove hostname | Yes | No |
Delete Component | Remove hostname | Remove hostname | No | No |
Decommission Component | Remove hostname (YARN only) | Add hostname | Yes | No |
Recommission Component | Add hostname (YARN only) | Remove hostname | Yes | No |
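The rows of the table can be read as updates to two sets of hostnames. The following is a minimal sketch of that reading, not Ambari's implementation; the function apply_operation, its argument names, and the example hostname are all hypothetical.

```python
def apply_operation(operation, host, include, exclude, service):
    """Apply one table row to the include/exclude host sets in place.

    Returns True when the row calls for a refreshNodes invocation.
    """
    if operation == "add":
        include.add(host)
        exclude.discard(host)
        return True
    if operation == "delete":
        include.discard(host)
        exclude.discard(host)
        return False
    if operation == "decommission":
        if service == "YARN":
            # Per the table, the include file changes for YARN only.
            include.discard(host)
        exclude.add(host)
        return True
    if operation == "recommission":
        if service == "YARN":
            include.add(host)
        exclude.discard(host)
        return True
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    include, exclude = {"dn1.example.com"}, set()
    needs_refresh = apply_operation(
        "decommission", "dn1.example.com", include, exclude, "HDFS"
    )
    print(sorted(include), sorted(exclude), needs_refresh)
```

In this model, decommissioning an HDFS DataNode leaves the host in the include set and adds it to the exclude set, while the same operation for a YARN NodeManager also removes it from the include set.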
Enable Include File Management for HDFS
To enable Include File Management for HDFS:
In Ambari, add the manage.include.files=true property to the Advanced hdfs-site configuration section.
Ensure that the dfs.hosts property is configured in Custom hdfs-site and that it is set to a valid location on the filesystem of the HDFS NameNode. Ensure that the file exists and is owned by the user that runs the NameNode.
Restart services as prompted by Ambari.
Example Configuration
In Advanced hdfs-site, set
manage.include.files=true
In Custom hdfs-site, set
dfs.hosts=/etc/hadoop/conf/dfs.include
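The requirement that the dfs.hosts file exists and is owned by the NameNode user can be checked before restarting services. The sketch below is one possible pre-flight check to run on the NameNode host; the function names are hypothetical, the "hdfs" account name is an assumption about the local setup, and the pwd module limits it to Unix-like systems.

```python
import os
import pwd


def owner_name(uid):
    """Map a numeric uid to a user name, falling back to the number as text."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_include_file(path, expected_owner):
    """Return a list of problems found with the include file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    owner = owner_name(os.stat(path).st_uid)
    if owner != expected_owner:
        return [f"{path} is owned by {owner}, expected {expected_owner}"]
    return []


if __name__ == "__main__":
    # "hdfs" is an assumed NameNode service account; substitute your own.
    for problem in check_include_file("/etc/hadoop/conf/dfs.include", "hdfs"):
        print(problem)
```

An empty result means both conditions hold for the given path and account.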
Enable Include File Management for YARN
To enable Include File Management for YARN:
In Ambari, add the manage.include.files=true property to the Advanced yarn-site configuration section.
Ensure that the yarn.resourcemanager.nodes.include-path property is set to a valid location on the filesystem of the YARN ResourceManager. If yarn.resourcemanager.nodes.include-path is not set, add it to the Custom yarn-site configuration.
Restart services as prompted by Ambari.
Example Configuration
In Advanced yarn-site, set
manage.include.files=true
In Custom yarn-site, set
yarn.resourcemanager.nodes.include-path=/etc/hadoop/conf/yarn.include
Disable Include File Management for HDFS
To disable Include File Management for HDFS:
In Ambari, set manage.include.files=false in the Custom hdfs-site configuration section.
In the same configuration section, remove the dfs.hosts property if it is configured and you no longer want HDFS to use include files for host management.
Restart services as prompted by Ambari.
Disable Include File Management for YARN
To disable Include File Management for YARN:
In Ambari, set manage.include.files=false in the Custom yarn-site configuration section.
In the same configuration section, remove the yarn.resourcemanager.nodes.include-path property if it is configured and you no longer want YARN to use include files for host management.
Restart services as prompted by Ambari.