Managing YARN

Adding the YARN Service

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

  1. On the Home page, click the drop-down menu to the right of the cluster name and select Add a Service. A list of service types displays. You can add one type of service at a time.
  2. Click the YARN (MR2 Included) radio button and click Continue.
  3. Select the radio button next to the services on which the new service should depend and click Continue.
  4. Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. These assignments are typically acceptable, but you can reassign them if necessary.

    Click a field below a role to display a dialog containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts or Custom to display the pageable hosts dialog.

    The following shortcuts for specifying hostname patterns are supported:
    • Range of hostnames (without the domain portion)
      Range Definition        | Matching Hosts
      10.1.1.[1-4]            | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4
      host[1-3].company.com   | host1.company.com, host2.company.com, host3.company.com
      host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com
    • IP addresses
    • Rack name

    Click the View By Host button for an overview of the role assignment by hostname ranges.
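
The range shortcuts above follow a simple rule: a bracketed numeric range expands to one host per number, and a leading zero in the range start keeps the numbers zero-padded. The Python sketch below reproduces the three examples from the range table so you can preview what a pattern matches before typing it into the wizard. It is an illustration, not Cloudera Manager's own parser: `expand_host_pattern` is a hypothetical helper, it expands only the first bracketed range in a pattern, and the padding rule is inferred from the `host[07-10]` example.

```python
import re
from typing import List

# One bracketed numeric range, for example "[1-4]" or "[07-10]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand the first bracketed numeric range in a hostname or IP pattern.

    Assumption: a leading zero in the range start (as in "07") means every
    number is zero-padded to that width. Patterns without a range are
    returned unchanged.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start_text, end_text = match.group(1), match.group(2)
    width = len(start_text) if start_text.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    # str.zfill(0) leaves the number unpadded.
    return [prefix + str(n).zfill(width) + suffix
            for n in range(int(start_text), int(end_text) + 1)]


for example in ("10.1.1.[1-4]", "host[1-3].company.com", "host[07-10].company.com"):
    print(example, "->", ", ".join(expand_host_pattern(example)))
```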

Configuring Memory Settings for YARN and MRv2

Memory configuration for YARN and MRv2 is important for getting the best performance from your cluster, and several different settings are involved. The table below shows the default settings, as well as the settings that Cloudera recommends, for each configuration option. See Managing MapReduce and YARN for more configuration specifics and, for detailed tuning advice with sample configurations, see Tuning YARN.
YARN and MRv2 Memory Configuration
Cloudera Manager Property Name | CDH Property Name | Default Configuration | Cloudera Tuning Guidelines
Container Memory Minimum | yarn.scheduler.minimum-allocation-mb | 1 GB | 0
Container Memory Maximum | yarn.scheduler.maximum-allocation-mb | 64 GB | Amount of memory on the largest node
Container Memory Increment | yarn.scheduler.increment-allocation-mb | 512 MB | Use a fairly large value, such as 128 MB
Container Memory | yarn.nodemanager.resource.memory-mb | 8 GB | 8 GB
Map Task Memory | mapreduce.map.memory.mb | 1 GB | 1 GB
Reduce Task Memory | mapreduce.reduce.memory.mb | 1 GB | 1 GB
Map Task Java Opts Base | mapreduce.map.java.opts | -Djava.net.preferIPv4Stack=true | -Djava.net.preferIPv4Stack=true -Xmx768m
Reduce Task Java Opts Base | mapreduce.reduce.java.opts | -Djava.net.preferIPv4Stack=true | -Djava.net.preferIPv4Stack=true -Xmx768m
ApplicationMaster Memory | yarn.app.mapreduce.am.resource.mb | 1 GB | 1 GB
ApplicationMaster Java Opts Base | yarn.app.mapreduce.am.command-opts | -Djava.net.preferIPv4Stack=true | -Djava.net.preferIPv4Stack=true -Xmx768m
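
The settings in the memory table interact. As a rough sketch, assuming the scheduler normalizes a container request by raising it to the minimum, rounding it up to a multiple of the increment, and capping it at the maximum (the behavior of Hadoop's default resource calculator; confirm it for your release), the Python below shows how a request is sized, how many such containers fit on one NodeManager, and how much of a task container the recommended `-Xmx768m` heap uses. The helper names are hypothetical, and the constants are the table's Default Configuration values.

```python
import math
import re
from typing import Optional

MB_PER_GB = 1024

# Values from the "Default Configuration" column of the table, in MB.
MIN_ALLOCATION_MB = 1 * MB_PER_GB   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOCATION_MB = 64 * MB_PER_GB  # yarn.scheduler.maximum-allocation-mb
INCREMENT_MB = 512                  # yarn.scheduler.increment-allocation-mb
NODE_MEMORY_MB = 8 * MB_PER_GB      # yarn.nodemanager.resource.memory-mb


def normalize_request(request_mb: int,
                      minimum: int = MIN_ALLOCATION_MB,
                      maximum: int = MAX_ALLOCATION_MB,
                      increment: int = INCREMENT_MB) -> int:
    """Assumed normalization: raise to the minimum, round up to a multiple
    of the increment, then cap at the maximum."""
    at_least_minimum = max(request_mb, minimum)
    rounded = math.ceil(at_least_minimum / increment) * increment
    return min(rounded, maximum)


def containers_per_node(container_mb: int, node_mb: int = NODE_MEMORY_MB) -> int:
    """Number of equally sized containers that fit in one NodeManager's memory."""
    return node_mb // container_mb


def heap_mb(java_opts: str) -> Optional[int]:
    """Return the -Xmx heap size in MB from a Java opts string, or None."""
    match = re.search(r"-Xmx(\d+)([mMgG])\b", java_opts)
    if match is None:
        return None
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * MB_PER_GB if unit == "g" else size


container = normalize_request(1100)
heap = heap_mb("-Djava.net.preferIPv4Stack=true -Xmx768m")
print(container, containers_per_node(1024), heap / 1024)
```

Under these assumptions, a 1100 MB request becomes a 1536 MB container, an 8 GB NodeManager holds eight 1 GB containers, and a 768 MB heap uses 75% of a 1 GB container, leaving the remainder for JVM memory outside the heap.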

Configuring Directories

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

Creating the Job History Directory

When you add the YARN service, the Add Service wizard automatically creates a job history directory. If you quit the Add Service wizard or it does not finish, you can create the directory outside the wizard:
  1. Go to the YARN service.
  2. Select Actions > Create Job History Dir.
  3. Click Create Job History Dir again to confirm.
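
The Create Job History Dir action can also be scripted against the Cloudera Manager REST API. The Python sketch below only builds the HTTP request, so it runs without a server. The endpoint layout, the `yarnCreateJobHistoryDirCommand` command name, API version `v6`, and port 7180 are assumptions to check against the API reference for your Cloudera Manager version; `build_service_command` is a hypothetical helper, and the host, cluster, service, and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request

# Assumed command name; verify it in your Cloudera Manager API reference.
JOB_HISTORY_DIR_COMMAND = "yarnCreateJobHistoryDirCommand"


def build_service_command(cm_host: str, cluster: str, service: str,
                          command: str, user: str, password: str,
                          api_version: str = "v6",
                          port: int = 7180) -> urllib.request.Request:
    """Build, but do not send, a POST request for a service-level command.

    Assumed endpoint layout:
    /api/<version>/clusters/<cluster>/services/<service>/commands/<command>
    """
    path = "/api/{}/clusters/{}/services/{}/commands/{}".format(
        api_version,
        urllib.parse.quote(cluster),  # cluster names may contain spaces
        urllib.parse.quote(service),
        command,
    )
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{cm_host}:{port}{path}",
        data=b"",  # assumed: the command needs no request body
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


# Usage with placeholder values.
req = build_service_command("cm.example.com", "Cluster 1", "yarn",
                            JOB_HISTORY_DIR_COMMAND, "admin", "admin")
print(req.get_method(), req.full_url)
```

To submit the command, pass the request to `urllib.request.urlopen(req)`; use `https` and the matching port if Cloudera Manager is configured for TLS.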

Creating the NodeManager Remote Application Log Directory

When you add the YARN service, the Add Service wizard automatically creates a remote application log directory. If you quit the Add Service wizard or it does not finish, you can create the directory outside the wizard:
  1. Go to the YARN service.
  2. Select Actions > Create NodeManager Remote Application Log Directory.
  3. Click Create NodeManager Remote Application Log Directory again to confirm.
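
When log aggregation is enabled, NodeManagers upload an application's container logs under the remote application log directory. The Python sketch below computes where one application's logs land, assuming the Hadoop 2.x layout `<remote-app-log-dir>/<user>/<suffix>/<application ID>` and the upstream defaults from yarn-default.xml (`/tmp/logs` and `logs`). Cloudera Manager may configure different values, and newer Hadoop releases can add further subdirectories; `aggregated_log_dir` is a hypothetical helper and the user and application ID are placeholders.

```python
import posixpath

# Upstream defaults from yarn-default.xml; Cloudera Manager may set other values.
DEFAULT_REMOTE_APP_LOG_DIR = "/tmp/logs"  # yarn.nodemanager.remote-app-log-dir
DEFAULT_REMOTE_APP_LOG_SUFFIX = "logs"    # yarn.nodemanager.remote-app-log-dir-suffix


def aggregated_log_dir(user: str, application_id: str,
                       remote_dir: str = DEFAULT_REMOTE_APP_LOG_DIR,
                       suffix: str = DEFAULT_REMOTE_APP_LOG_SUFFIX) -> str:
    """HDFS directory holding one application's aggregated container logs
    (assumed Hadoop 2.x layout: <remote dir>/<user>/<suffix>/<application ID>)."""
    return posixpath.join(remote_dir, user, suffix, application_id)


print(aggregated_log_dir("alice", "application_1400000000000_0001"))
```

Aggregated logs in this directory are what `yarn logs -applicationId <application ID>` reads.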

Importing MapReduce Configurations to YARN

Minimum Required Role: Cluster Administrator (also provided by Full Administrator)

When you upgrade from CDH 4 to CDH 5, you can import MapReduce configurations to YARN as part of the upgrade wizard. If you do not import configurations during upgrade, you can manually import the configurations at a later time:
  1. Go to the YARN service page.
  2. Stop the YARN service.
  3. Select Actions > Import MapReduce Configuration. The import wizard warns that it will import your configuration, restart the YARN service and its dependent services, and update the client configuration.
  4. Click Continue to proceed. The next page indicates some additional configuration required by YARN.
  5. Verify or modify the configurations and click Continue. The Switch Cluster to MR2 step proceeds.
  6. When all steps have been completed, click Finish.
  7. (Optional) Remove the MapReduce service.
    1. Click the Home tab.
    2. In the MapReduce row, right-click and select Delete. Click Delete to confirm.
  8. Recompile JARs used in MapReduce applications. For further information, see For MapReduce Programmers: Writing and Running Jobs.
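
Jobs and configuration files carried over from MRv1 may still set older property names. Hadoop 2 keeps a table of deprecated keys that maps each one to its MRv2 replacement; the Python sketch below applies a handful of those mappings to a job configuration so you can audit what an older job sets. It is not how Cloudera Manager's import works, the mapping is a small subset of Hadoop's full deprecation table, and `rename_mr1_properties` is a hypothetical helper.

```python
from typing import Dict

# A few MRv1 keys and their MRv2 replacements from Hadoop 2's deprecation
# table. This is a small illustrative subset, not the complete list.
MR1_TO_MR2 = {
    "mapred.job.map.memory.mb": "mapreduce.map.memory.mb",
    "mapred.job.reduce.memory.mb": "mapreduce.reduce.memory.mb",
    "mapred.map.child.java.opts": "mapreduce.map.java.opts",
    "mapred.reduce.child.java.opts": "mapreduce.reduce.java.opts",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
    "mapred.job.queue.name": "mapreduce.job.queuename",
    "io.sort.mb": "mapreduce.task.io.sort.mb",
}


def rename_mr1_properties(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with known MRv1 keys renamed to MRv2 keys.

    If both the old and the new key are present, the MRv2 value wins.
    Keys not in the mapping pass through unchanged.
    """
    renamed: Dict[str, str] = {}
    for key, value in conf.items():
        new_key = MR1_TO_MR2.get(key, key)
        if new_key != key and new_key in conf:
            continue  # an explicit MRv2 setting takes precedence
        renamed[new_key] = value
    return renamed


job_conf = {"mapred.reduce.tasks": "4", "mapred.job.queue.name": "etl",
            "fs.defaultFS": "hdfs://namenode:8020"}  # placeholder values
print(rename_mr1_properties(job_conf))
```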

Configuring the YARN Scheduler

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

The YARN service is configured by default to use the Fair Scheduler. You can change the scheduler type to the FIFO Scheduler or the Capacity Scheduler. You can also modify the Fair Scheduler and Capacity Scheduler configuration. For further information on schedulers, see Schedulers.

Configuring the Scheduler Type

  1. Go to the YARN service.
  2. Click the Configuration tab.
  3. Expand the ResourceManager Default Group and click the Scheduler Class property.
  4. Select a scheduler class.
  5. Click Save Changes to commit the changes.
  6. Restart the YARN service.
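
The Scheduler Class property corresponds to `yarn.resourcemanager.scheduler.class` in the yarn-site.xml that Cloudera Manager generates, and the three scheduler types map to the fully qualified Hadoop class names below. The Python sketch renders the resulting `<property>` element so you can recognize it when inspecting a deployed configuration; change the value through Cloudera Manager rather than by editing generated files. `scheduler_property_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SCHEDULER_CLASS_PROPERTY = "yarn.resourcemanager.scheduler.class"

# Scheduler implementations shipped with Hadoop 2 (and therefore CDH 5).
SCHEDULER_CLASSES = {
    "fair": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler",
    "capacity": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
    "fifo": "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler",
}


def scheduler_property_xml(kind: str) -> str:
    """Render the yarn-site.xml <property> element that selects a scheduler."""
    if kind not in SCHEDULER_CLASSES:
        raise ValueError(f"unknown scheduler kind: {kind!r}")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = SCHEDULER_CLASS_PROPERTY
    ET.SubElement(prop, "value").text = SCHEDULER_CLASSES[kind]
    return ET.tostring(prop, encoding="unicode")


print(scheduler_property_xml("fair"))
```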

Modifying the Scheduler Configuration

  1. Go to the YARN service.
  2. Click the Configuration tab.
  3. Click the ResourceManager Default Group category.
  4. Click a property and modify the configuration.
  5. Click Save Changes to commit the changes.
  6. Restart the YARN service.

Dynamic Resource Management

In addition to the static resource management available to all services, the YARN service supports dynamic management of the resources within its static allocation, which it divides among resource pools. See Dynamic Resource Pools.
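
With the Fair Scheduler, resource pools receive capacity in proportion to their weights, and a pool that needs less than its proportional share leaves the remainder to the other pools. The Python sketch below models that weighted max-min split for a single resource. It is a simplified model, not the Fair Scheduler's actual algorithm: it ignores minimum and maximum pool resources, scheduling policies within pools, and preemption. `weighted_fair_shares` is a hypothetical helper, and all weights are assumed to be positive.

```python
from typing import Dict


def weighted_fair_shares(total: float,
                         weights: Dict[str, float],
                         demands: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min fair split of `total` among pools.

    Each pool is offered capacity in proportion to its weight, never gets
    more than it demands, and unused capacity is re-offered to the others.
    """
    shares = {pool: 0.0 for pool in weights}
    # Only pools with outstanding demand compete for capacity.
    active = {pool for pool in weights if demands.get(pool, 0.0) > 0}
    remaining = float(total)
    while active and remaining > 0:
        weight_sum = sum(weights[p] for p in active)
        # Pools whose whole demand fits inside their proportional offer.
        satisfied = {p for p in active
                     if demands[p] <= remaining * weights[p] / weight_sum}
        if not satisfied:
            # Every pool wants more than its offer: split what is left by weight.
            for p in active:
                shares[p] += remaining * weights[p] / weight_sum
            break
        # Satisfied pools take exactly their demand; the rest is re-offered.
        for p in satisfied:
            shares[p] = demands[p]
            remaining -= demands[p]
        active -= satisfied
    return shares


print(weighted_fair_shares(100, {"etl": 1, "adhoc": 1}, {"etl": 20, "adhoc": 200}))
```

For example, with 100 GB to share between two equally weighted pools demanding 20 GB and 200 GB, the first pool receives 20 GB and the second receives the remaining 80 GB.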

Configuring YARN for Long-running Applications

On a secure cluster, long-running applications such as Spark Streaming jobs need additional configuration, because the default settings limit HDFS delegation tokens to a maximum lifetime of 7 days, which is not always sufficient. For instructions on how to work around this limit, see Configuring YARN for Long-running Applications.
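
The 7-day limit corresponds to the NameNode setting `dfs.namenode.delegation.token.max-lifetime`, whose upstream default is 604800000 ms. The Python sketch below converts that value and checks whether an application expected to run for a given time would outlive a token obtained when it was submitted; `outlives_token` is a hypothetical helper, and your cluster may set a different maximum.

```python
from datetime import timedelta

# Upstream default of dfs.namenode.delegation.token.max-lifetime, in ms.
DEFAULT_MAX_LIFETIME_MS = 604_800_000


def outlives_token(expected_runtime: timedelta,
                   max_lifetime_ms: int = DEFAULT_MAX_LIFETIME_MS) -> bool:
    """True if an application running this long would outlive a delegation
    token obtained when the application was submitted."""
    return expected_runtime > timedelta(milliseconds=max_lifetime_ms)


print(timedelta(milliseconds=DEFAULT_MAX_LIFETIME_MS))  # 7 days, 0:00:00
print(outlives_token(timedelta(days=30)))  # a month-long streaming job
```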