YARN Resource Management

Setting Application Limits

To avoid system thrash due to an unmanageable load -- caused either by malicious users or by accident -- the Capacity Scheduler enables you to place a static, configurable limit on the total number of concurrently active (running and pending) applications at any one time. The maximum-applications configuration property sets this limit, with a default value of 10,000:

Property: yarn.scheduler.capacity.maximum-applications

Value: 10000
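
As a sketch, the cluster-wide limit can be set in capacity-scheduler.xml as follows (the value shown is the default from the table above):

<property>
  <name>yarn.scheduler.capacity.maximum-applications</name>
  <value>10000</value>
  <!-- Maximum number of concurrently active (pending and running) applications across the cluster. -->
</property>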

The limit for running applications in any specific queue is a fraction of this total limit, proportional to the queue's capacity. This is a hard limit: once it is reached for a queue, any new applications submitted to that queue are rejected, and clients must wait and retry later. The limit can be explicitly overridden on a per-queue basis with the following configuration property:

Property: yarn.scheduler.capacity.<queue-path>.maximum-applications

Value: absolute-capacity * yarn.scheduler.capacity.maximum-applications
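
For example, a queue whose absolute capacity is 20% of the cluster would, by default, be limited to 0.20 * 10,000 = 2,000 concurrently active applications. To set an explicit limit instead, the per-queue property can be added to capacity-scheduler.xml; the queue name (root.finance) and value (5000) below are purely illustrative:

<property>
  <name>yarn.scheduler.capacity.root.finance.maximum-applications</name>
  <value>5000</value>
  <!-- Hypothetical queue "root.finance": explicit limit that overrides the capacity-derived default. -->
</property>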

There is another resource limit that can be used to set a maximum percentage of cluster resources allocated specifically to ApplicationMasters. The maximum-am-resource-percent property has a default value of 10%, and exists to avoid cross-application deadlocks where significant resources in the cluster are occupied entirely by the Containers running ApplicationMasters. This property also indirectly controls the number of concurrent running applications in the cluster, with each queue limited to a number of running applications proportional to its capacity.

Property: yarn.scheduler.capacity.maximum-am-resource-percent

Value: 0.1
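
In practical terms, the default of 0.1 means that at most 10% of the cluster's resources can be held by ApplicationMaster Containers at any one time; additional applications wait until a running ApplicationMaster completes. A capacity-scheduler.xml sketch of the cluster-wide setting:

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
  <!-- Fraction of cluster resources that may be used to run ApplicationMasters (0.1 = 10%). -->
</property>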

As with maximum-applications, this limit can also be overridden on a per-queue basis:

Property: yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent

Value: 0.1
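
For example, to override the cluster-wide AM resource limit for a single queue, the per-queue property could be set as in the sketch below; the queue name (root.adhoc) and value (0.2) are illustrative only:

<property>
  <name>yarn.scheduler.capacity.root.adhoc.maximum-am-resource-percent</name>
  <value>0.2</value>
  <!-- Hypothetical per-queue override of the cluster-wide ApplicationMaster resource limit. -->
</property>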

Together, these limits ensure that no single application, user, or queue can cause catastrophic failure or monopolize the cluster and excessively degrade its performance.