The Capacity Scheduler is designed to allow organizations to share compute clusters using the very familiar notion of FIFO (first-in, first-out) queues. YARN does not assign entire nodes to queues. Queues own a fraction of the capacity of the cluster, and this specified queue capacity can be fulfilled from any number of nodes in a dynamic fashion.
Scheduling is the process of matching resource requirements -- of multiple applications from various users, and submitted to different queues at multiple levels in the queue hierarchy -- with the free capacity available on the nodes in the cluster. Because total cluster capacity can vary, capacity configuration values are expressed as percentages.
The capacity property can be used by administrators to allocate a percentage of cluster capacity to a queue. The following properties would divide the cluster resources between the Engineering, Support, and Marketing organizations in a 6:1:3 ratio (60%, 10%, and 30%).
Property: yarn.scheduler.capacity.root.engineering.capacity
Value: 60
Property: yarn.scheduler.capacity.root.support.capacity
Value: 10
Property: yarn.scheduler.capacity.root.marketing.capacity
Value: 30
Now suppose that the Engineering group decides to split its capacity between the Development and QA sub-teams in a 1:4 ratio. That would be implemented by setting the following properties:
Property: yarn.scheduler.capacity.root.engineering.development.capacity
Value: 20
Property: yarn.scheduler.capacity.root.engineering.qa.capacity
Value: 80
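Because each capacity value is a percentage of the parent queue rather than of the whole cluster, it can help to work out what share of the cluster each sub-queue ends up with. The following Python sketch is illustrative arithmetic only and is not part of YARN. The `CAPACITIES` mapping and the `absolute_capacity` helper are hypothetical names that mirror the property values shown above.

```python
# Configured capacities from the examples above, keyed by queue path.
# Each value is a percentage of the PARENT queue's capacity.
CAPACITIES = {
    "root.engineering": 60,
    "root.support": 10,
    "root.marketing": 30,
    "root.engineering.development": 20,
    "root.engineering.qa": 80,
}


def absolute_capacity(queue_path, capacities):
    """Return the queue's share of the whole cluster, as a percentage."""
    parts = queue_path.split(".")
    share = 100.0  # the root queue owns the whole cluster
    # Walk down from the first level below root, scaling by each ancestor.
    for depth in range(2, len(parts) + 1):
        ancestor = ".".join(parts[:depth])
        share = share * capacities[ancestor] / 100
    return share
```

With these values, `absolute_capacity("root.engineering.development", CAPACITIES)` works out to 12 (20% of Engineering's 60%), and the QA queue works out to 48, so Engineering's two sub-queues together still account for 60% of the cluster.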
Note: The sum of capacities at any level in the hierarchy must equal 100%. Also, the capacity of an individual queue at any level in the hierarchy must be 1% or more (you cannot set a capacity to a value of 0).
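The two rules in this note can be checked against a set of planned values before they are applied. The following Python sketch is a minimal illustration, assuming the capacities are held in a plain dictionary keyed by queue path. The `validate_capacities` function is a hypothetical helper and does not represent how the scheduler itself validates its configuration.

```python
from collections import defaultdict

# Example values taken from the properties above (percent of parent queue).
EXAMPLE = {
    "root.engineering": 60,
    "root.support": 10,
    "root.marketing": 30,
    "root.engineering.development": 20,
    "root.engineering.qa": 80,
}


def validate_capacities(capacities):
    """Return a list of rule violations for a {queue_path: percent} mapping.

    Keys are expected to be paths below the root queue, such as
    "root.engineering" or "root.engineering.qa".
    """
    problems = []
    siblings = defaultdict(list)
    for path, percent in capacities.items():
        parent = path.rsplit(".", 1)[0]
        siblings[parent].append(percent)
        # Rule: an individual queue's capacity must be 1% or more.
        if percent < 1:
            problems.append(f"{path}: capacity {percent} is below 1%")
    # Rule: the children of each parent queue must sum to 100%.
    for parent, percents in sorted(siblings.items()):
        total = sum(percents)
        if total != 100:
            problems.append(f"{parent}: child capacities sum to {total}, not 100")
    return problems
```

Calling `validate_capacities(EXAMPLE)` returns an empty list, because the three top-level queues sum to 100 and so do the two Engineering sub-queues. Changing the Support value to 0, for example, would produce two messages: one for the sub-1% capacity and one for the top-level sum of 90.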
The following image illustrates this cluster capacity configuration: