During scheduling, queues at any level in the hierarchy are sorted by their current used capacity, and available resources are distributed among them starting with the queues that are currently the most under-served. With respect to capacities alone, resource scheduling follows this workflow:
The more under-served a queue is, the higher the priority it receives during resource allocation. The most under-served queue is the one with the lowest ratio of used capacity to total cluster capacity.
The used capacity of a parent queue is the sum of the used capacities of all of its descendant queues, computed recursively.
The used capacity of a leaf queue is the amount of resources used by the allocated Containers of all of the applications running in that queue.
Once the scheduler decides to give the currently available free resources to a parent queue, it recurses into that queue to determine which child queue gets to use them, again based on the used capacities described above.
Further scheduling happens inside each leaf queue to allocate resources to applications in a FIFO order.
This is also dependent on locality, user level limits, and application limits.
Once an application within a leaf queue is chosen, scheduling also happens within the application, because its resource requests may have different priorities.
To ensure elasticity, capacity that is configured but not utilized by any queue due to lack of demand is automatically assigned to the queues that are in need of resources.
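The queue-selection part of this workflow can be illustrated with a minimal Python sketch. It is not the scheduler's actual implementation: the `Queue` class and the `pick_leaf` and `next_app` functions are hypothetical names introduced here, resources are reduced to a single number, and locality, user limits, application limits, and request priorities are all ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Queue:
    """Hypothetical queue model; not a real YARN class."""
    name: str
    children: List["Queue"] = field(default_factory=list)
    leaf_used: float = 0.0                          # resources held by containers in a leaf queue
    apps: List[str] = field(default_factory=list)   # applications in submission order

    def used(self) -> float:
        # A leaf reports its own usage; a parent sums its descendants recursively.
        if not self.children:
            return self.leaf_used
        return sum(child.used() for child in self.children)


def pick_leaf(queue: Queue, cluster_total: float) -> Queue:
    """Descend from `queue`, always choosing the most under-served child."""
    while queue.children:
        queue = min(queue.children, key=lambda q: q.used() / cluster_total)
    return queue


def next_app(leaf: Queue) -> Optional[str]:
    """FIFO choice inside a leaf queue (assumption: no limits or locality applied)."""
    return leaf.apps[0] if leaf.apps else None


# Example hierarchy: root -> {a -> {a1, a2}, b}
a1 = Queue("a1", leaf_used=10.0, apps=["app-1", "app-2"])
a2 = Queue("a2", leaf_used=30.0, apps=["app-3"])
a = Queue("a", children=[a1, a2])
b = Queue("b", leaf_used=50.0, apps=["app-4"])
root = Queue("root", children=[a, b])

chosen = pick_leaf(root, cluster_total=100.0)
```

In this example, queue `a` has a used capacity of 40 against 50 for `b`, so the descent enters `a` and then selects `a1`, whose first submitted application is served next.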