The MapReduce Service
CDH supports two versions of the MapReduce computation framework: MRv1 and MRv2, which are implemented by the MapReduce (MRv1) and YARN (MRv2) services.
Cloudera Manager provides a wizard to easily migrate MapReduce configurations to YARN. For further information on migrating from MapReduce to YARN, see Importing MapReduce Configurations to YARN and Migrating from MapReduce v1 (MRv1) to MapReduce v2 (MRv2, YARN).
- For production uses, Cloudera recommends that only one MapReduce framework should be running at any given time.
- For development clusters that have both MapReduce and YARN installed, set the alternatives priorities (described below) appropriately and redeploy the client configurations whenever you switch between MapReduce and YARN, so that clients pick up the proper configuration.
The MapReduce service supports the following tasks:
Configuring Alternatives Priority
The alternatives priority property determines which service—MapReduce or YARN—is used by clients to run MapReduce jobs; the service with a higher value of the property is used. In CDH 4, the MapReduce service alternatives priority is set to 92 and the YARN service is set to 91. In CDH 5, the values are reversed; the MapReduce service alternatives priority is set to 91 and the YARN service is set to 92.
- Go to the MapReduce or YARN service.
- Click the Configuration tab.
- Expand the Gateway Default Group node.
- In the Alternatives Priority property, set the priority value.
- Click Save Changes.
- Redeploy the client configuration.
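The selection rule described in this section can be sketched in a few lines of Python. This is only an illustration of "the service with the higher priority value is used", using the default values quoted above; the names `DEFAULT_PRIORITIES` and `preferred_service` are hypothetical and are not part of Cloudera Manager. The sketch does not model what happens when both services have the same priority.

```python
# Default alternatives priorities as stated in this section, keyed by CDH major version.
DEFAULT_PRIORITIES = {
    4: {"MapReduce": 92, "YARN": 91},
    5: {"MapReduce": 91, "YARN": 92},
}


def preferred_service(priorities):
    """Return the name of the service with the highest alternatives priority."""
    return max(priorities, key=priorities.get)


# Print which service clients would use under the default priorities.
for cdh_version in sorted(DEFAULT_PRIORITIES):
    print(cdh_version, preferred_service(DEFAULT_PRIORITIES[cdh_version]))
```

Raising the MapReduce priority above the YARN priority on a CDH 5 development cluster would, under this rule, make clients use MapReduce after the client configuration is redeployed.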
Configuring the MapReduce Scheduler
The MapReduce service is configured by default to use the Fair Scheduler. You can change the scheduler type to FIFO or Capacity Scheduler. You can also modify the Fair Scheduler and Capacity Scheduler configuration. For more information about these schedulers, see Fair Scheduler or Capacity Scheduler.
Setting the Task Scheduler Type
- Go to the MapReduce service.
- Click the Configuration tab.
- Expand the JobTracker Default Group category and click the Classes category.
- Click the Value field of the Task Scheduler row. Select the scheduler you want to use and then click Save Changes.
- Restart the JobTracker to have the new configuration take effect:
- Click the Instances tab.
- Click the JobTracker role.
- Select .
Modifying the Configuration of the Fair and Capacity Schedulers
- Go to the MapReduce service.
- Click the Configuration tab.
- Click the Jobs subcategory of the JobTracker Default Group category.
- Make your changes as appropriate and click Save Changes.
- Click the refresh icon to have the new configuration take effect.
Configuring the MapReduce Service to Save Job History
Normally, job history is saved on the host on which the JobTracker is running. You can configure the JobTracker to write information about every job that completes to a specified HDFS location. By default, the information is retained for 7 days.
Enabling MapReduce Job History To Be Saved To HDFS
- Create a folder in HDFS to contain the history information. When creating the folder in HDFS, set the owner and group to mapred:hadoop with permission setting 775.
- Go to the MapReduce service.
- Click the Configuration tab.
- Expand the JobTracker Default Group category and click the Paths subcategory.
- Set the Completed Job History Location property to the location that you created in step 1.
- Click Save Changes.
- Restart the MapReduce service.
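Step 1 of this procedure (creating the folder with owner and group mapred:hadoop and permission 775) could be scripted roughly as follows. This is a sketch under assumptions: it assumes the `hadoop fs` shell is on the PATH, that the commands are run by a user allowed to change ownership in HDFS, and that the parent directory already exists. The helper names and the example path `/jobtracker/history` are hypothetical.

```python
import subprocess


def history_dir_commands(path, owner="mapred", group="hadoop", mode="775"):
    """Build the `hadoop fs` commands that create the folder and set its ownership and mode."""
    return [
        ["hadoop", "fs", "-mkdir", path],
        ["hadoop", "fs", "-chown", f"{owner}:{group}", path],
        ["hadoop", "fs", "-chmod", mode, path],
    ]


def create_history_dir(path, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on the first failure."""
    for command in history_dir_commands(path):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Hypothetical HDFS location; with dry_run=True the commands are only printed.
create_history_dir("/jobtracker/history", dry_run=True)
```

The same path is then entered in the Completed Job History Location property.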
Setting the Job History Retention Duration
- Select the JobTracker Default Group category.
- Set the Job History Files Maximum Age property (mapreduce.jobhistory.max-age-ms) to the length of time (in milliseconds, seconds, minutes, or hours) that you want job history files to be kept.
- Restart the MapReduce service.
- Select the JobTracker Default Group category.
- Set the Job History Files Cleaner Interval property (mapreduce.jobhistory.cleaner.interval) to the desired frequency (in milliseconds, seconds, minutes, or hours).
- Restart the MapReduce service.
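When comparing a retention period or cleaner interval with a raw property value, it can help to convert the duration to milliseconds. The following sketch assumes the underlying value is expressed in milliseconds, as the `-ms` suffix of mapreduce.jobhistory.max-age-ms suggests; the helper name `to_milliseconds` is hypothetical.

```python
from datetime import timedelta


def to_milliseconds(**duration):
    """Convert a duration given as timedelta keyword arguments to whole milliseconds."""
    # Integer division of two timedeltas yields an exact integer count.
    return timedelta(**duration) // timedelta(milliseconds=1)


# The default retention of 7 days, expressed in milliseconds.
print(to_milliseconds(days=7))
```

For example, `to_milliseconds(hours=12)` gives the value for a twelve-hour period.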
JobTracker High Availability
You can use Cloudera Manager to configure CDH 4.3 or later for JobTracker High Availability (HA). Although it is possible to configure JobTracker HA with CDH 4.2, it is not recommended. Rolling restart, decommissioning of TaskTrackers, and rolling upgrade of MapReduce from CDH 4.2 to CDH 4.3 are not supported when JobTracker HA is enabled.
A JobTracker HA cluster is configured with an active and a standby JobTracker. Only one JobTracker can be active at any point in time.
Cloudera Manager supports automatic failover of the JobTracker. It does not provide a mechanism to manually force a failover through the Cloudera Manager user interface.
For more information, see Configuring High Availability for the JobTracker (CDH 4) or Configuring High Availability for the JobTracker (CDH 5) in the CDH High Availability Guide.
Enabling JobTracker High Availability
- Go to the MapReduce service.
- Select . A screen showing the hosts that are eligible to run a standby JobTracker displays. The host where the current JobTracker is running is not available as a choice.
- Select the host where you want the Standby JobTracker to be installed, and click Continue.
- Enter a directory location on the local filesystem for each JobTracker host. These directories will be used to store job configuration data.
- You may enter more than one directory, though it is not required. The paths do not need to be the same on both JobTracker hosts.
- If the directories you specify do not exist, they will be created with the appropriate permissions. If they already exist, they must be empty and have the appropriate permissions.
- If the directories are not empty, Cloudera Manager will not delete the contents.
- Optionally use the checkbox under Advanced Options to force initialize the ZooKeeper znode for auto-failover.
- Click Continue. Cloudera Manager executes a set of commands that stop the MapReduce service, add a standby JobTracker and Failover controller, initialize the JobTracker High Availability state in ZooKeeper, create the job status directory, restart MapReduce, and redeploy the relevant client configurations.
Disabling JobTracker High Availability
- Go to the MapReduce service.
- Select . A screen showing the hosts running the JobTrackers displays.
- Select which JobTracker (host) you want to remain as the single JobTracker, and click Continue. Cloudera Manager executes a set of commands that stop the MapReduce service, remove the standby JobTracker and the Failover Controller, restart the MapReduce service, and redeploy client configurations.
Configuring Client Overrides
A configuration property qualified with (Client Override) is a server-side setting that ignores whatever value a client might try to set for that property. It performs the same role as its unqualified counterpart, and applies the configuration to the service with the setting <final>true</final>.
For example, if you set the Map task heap property to 1 GB in the job configuration code and the service's heap property qualified with (Client Override) is set to 500 MB, then 500 MB is applied regardless of the value the job requested.
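The effect of a client override can be modeled as a merge in which job-level settings replace service-level settings unless the service marks the property as final. The sketch below only illustrates that rule and is not the Hadoop implementation; the function `effective_config` and the key `map_task_heap_mb` are hypothetical names chosen for the example.

```python
def effective_config(service_config, final_keys, job_config):
    """Merge job settings over service settings, skipping keys the service marks final."""
    merged = dict(service_config)
    for key, value in job_config.items():
        if key in final_keys:
            continue  # a final service-side value is kept; the job's value is ignored
        merged[key] = value
    return merged


# Service sets the heap to 500 MB with a client override; the job asks for 1 GB.
service = {"map_task_heap_mb": 500}
job = {"map_task_heap_mb": 1024}

print(effective_config(service, {"map_task_heap_mb"}, job))  # override in effect
print(effective_config(service, set(), job))                 # no override
```

With the override in effect the job runs with the 500 MB service value; without it, the job's 1 GB request is used.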