The YARN Service
CDH supports two versions of the MapReduce computation framework: MRv1 and MRv2, which are implemented by the MapReduce (MRv1) and YARN (MRv2) services.
Cloudera Manager provides a wizard to easily migrate MapReduce configurations to YARN. For further information on migrating from MapReduce to YARN, see Importing MapReduce Configurations to YARN and Migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN).
- For production uses, Cloudera recommends that only one MapReduce framework should be running at any given time.
- For development clusters that have both MapReduce and YARN installed, set the alternatives priorities (described below) appropriately and deploy client configurations when switching between MapReduce and YARN, so that clients pick up the proper configuration.
Configuring Alternatives Priority
The alternatives priority property determines which service—MapReduce or YARN—is used by clients to run MapReduce jobs; the service with a higher value of the property is used. In CDH 4, the MapReduce service alternatives priority is set to 92 and the YARN service is set to 91. In CDH 5, the values are reversed; the MapReduce service alternatives priority is set to 91 and the YARN service is set to 92.
- Go to the MapReduce or YARN service.
- Click the Configuration tab.
- Expand the Gateway Default Group node.
- In the Alternatives Priority property, set the priority value.
- Click Save Changes.
- Redeploy the client configuration.
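The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the "higher value wins" rule using the default priorities quoted in this section; the function name and the dictionary layout are assumptions made for the example and are not part of Cloudera Manager.

```python
# Default alternatives priorities as stated in this section.
DEFAULT_PRIORITIES = {
    "CDH 4": {"MapReduce": 92, "YARN": 91},
    "CDH 5": {"MapReduce": 91, "YARN": 92},
}


def service_used_by_clients(priorities):
    """Return the service with the highest alternatives priority.

    `priorities` maps a service name to its Alternatives Priority value.
    The behavior for equal priorities is not described in the text, so
    this sketch does not try to model it.
    """
    return max(priorities, key=priorities.get)


if __name__ == "__main__":
    for release, priorities in DEFAULT_PRIORITIES.items():
        print(release, "->", service_used_by_clients(priorities))
```

Raising the MapReduce value above the YARN value on a CDH 5 development cluster, for example to 93, would make `service_used_by_clients` return `"MapReduce"`, which mirrors what the steps above accomplish through the Gateway configuration.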
Adding the YARN Service
- On the Home page, open the menu to the right of the cluster name and select Add a Service. A list of service types displays. You can add one type of service at a time.
- Click the YARN (MR2 Included) radio button and click Continue.
- Select the radio button next to the services on which the new service should depend and click Continue.
- Customize the assignment of role instances to hosts.
The wizard evaluates the hardware configurations of the hosts to determine
the best hosts for each role. The wizard assigns all worker roles to the
same set of hosts to which the HDFS DataNode role is assigned. These
assignments are typically acceptable, but you can reassign role instances to
hosts of your choosing, if desired.
Click a field below a role to display a dialog containing a pageable list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion)

| Range Definition | Matching Hosts |
| --- | --- |
| 10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
| host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
| host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |

- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
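To make the range shortcut concrete, here is a minimal Python sketch that expands a pattern the way the examples above suggest. It is not Cloudera Manager's implementation: the function name `expand_host_range` is hypothetical, and the rule that zero padding is kept only when the start of the range has a leading zero is an assumption inferred from the `host[07-10]` example.

```python
import re

# Matches a single bracketed numeric range such as "[07-10]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern):
    """Expand one "[start-end]" range in a hostname or IP address pattern."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range: the pattern names a single host
    start_text, end_text = match.group(1), match.group(2)
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"descending range in {pattern!r}")
    # Assumption: keep zero padding only if the start has a leading zero.
    width = len(start_text) if start_text.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(n).zfill(width)}{suffix}"
        for n in range(start, end + 1)
    ]


if __name__ == "__main__":
    for example in ("10.1.1.[1-4]", "host[07-10].company.com"):
        print(example, "->", ", ".join(expand_host_range(example)))
```

The sketch handles only one range per pattern, which is all the examples in this section use.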
Configuring Directories
Creating the Job History Directory
- Go to the YARN service.
- Select Create Job History Dir.
- Click Create Job History Dir again to confirm.
Creating the NodeManager Remote Application Log Directory
- Go to the YARN service.
- Select Create NodeManager Remote Application Log Directory.
- Click Create NodeManager Remote Application Log Directory again to confirm.
Importing MapReduce Configurations to YARN
- Go to the YARN service page.
- Stop the YARN service if it is running.
- Select the MapReduce configuration import action. The import wizard presents a warning letting you know that it will import your configuration, restart the YARN service and its dependent services, and update the client configuration.
- Click Continue to proceed.
- The next page indicates some additional configuration required by YARN. Verify or modify these and click Continue.
- The Switch Cluster to MR2 step proceeds. When all steps have been completed, click Continue.
The import process:
- Configures services to use YARN as the MapReduce computation framework instead of MapReduce.
- Overwrites existing YARN configuration and role assignments.
Dynamic Resource Management
In addition to the static resource management available to all services, the YARN service also supports dynamic management of its static allocation. See Dynamic Resource Pools.
Configuring YARN High Availability
You can use Cloudera Manager to configure CDH 5 or later for ResourceManager High Availability (HA). A ResourceManager HA cluster is configured with an active and a standby ResourceManager. Only one ResourceManager can be active at any point in time.
Cloudera Manager supports automatic failover of the ResourceManager. It does not provide a mechanism to manually force a failover through the Cloudera Manager user interface.
ResourceManager HA requires ZooKeeper and HDFS services to be running.
For more information, see Configuring High Availability for ResourceManager in the CDH High Availability Guide.
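The active/standby behavior described above can be pictured with a small toy model in Python. It only illustrates the rule that exactly one ResourceManager is active at a time and that the standby takes over automatically when the active one fails. The class, its methods, and the host names are all hypothetical, and the model does not represent ZooKeeper or any actual YARN or Cloudera Manager code.

```python
class ResourceManagerPair:
    """Toy model of an active/standby ResourceManager pair."""

    def __init__(self, active_host, standby_host):
        if active_host == standby_host:
            raise ValueError("active and standby must be different hosts")
        self.states = {active_host: "active", standby_host: "standby"}

    def active(self):
        """Return the single host currently in the active state."""
        hosts = [h for h, s in self.states.items() if s == "active"]
        assert len(hosts) == 1, "exactly one ResourceManager must be active"
        return hosts[0]

    def report_failure(self, host):
        """Automatic failover: promote the standby if the active host fails."""
        if self.states[host] != "active":
            return  # a failed standby does not change which host is active
        for other in self.states:
            if other != host:
                self.states[other] = "active"
        # Simplification: the failed host rejoins as the standby.
        self.states[host] = "standby"


if __name__ == "__main__":
    # Hypothetical host names, used only for this illustration.
    pair = ResourceManagerPair("rm1.example.com", "rm2.example.com")
    pair.report_failure("rm1.example.com")
    print("active ResourceManager:", pair.active())
```

Note that the model has no method for forcing a failover by hand, which matches the statement that the Cloudera Manager user interface offers only automatic failover.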
Enabling High Availability
- Go to the YARN service.
- Select the action to enable High Availability. A screen showing the hosts that are eligible to run a standby ResourceManager displays. The host where the current ResourceManager is running is not available as a choice.
- Select the host where you want the standby ResourceManager to be installed, and click Continue. Cloudera Manager proceeds to execute the set of commands that stop the YARN service, add a standby ResourceManager, initialize the ResourceManager High Availability state in ZooKeeper, restart YARN, and redeploy the relevant client configurations.
- Go to the YARN service.
- Click the Configuration tab.
- Expand the JobHistory Server Default Group.
- Select the Advanced subcategory.
- Check the Automatically Restart Process checkbox.
- Restart the JobHistory Server role.
Disabling High Availability
- Go to the YARN service.
- Select the action to disable High Availability. A screen showing the hosts running the ResourceManagers displays.
- Select which ResourceManager (host) you want to remain as the single ResourceManager, and click Continue. Cloudera Manager executes a set of commands that stop the YARN service, remove the standby ResourceManager and the Failover Controller, restart the YARN service, and redeploy client configurations.