Why one scheduler?

Cloudera Data Platform (CDP) supports only the Capacity Scheduler for YARN clusters.

Prior to the launch of CDP, Cloudera customers used one of two schedulers, depending on the product they were running: Fair Scheduler in CDH or Capacity Scheduler in HDP.

The choice to converge on one scheduler in CDP was a hard one, but it was ultimately rooted in our intention to reduce complexity for our customers and, at the same time, to focus our future investments. Over the years, both schedulers have evolved greatly, to the point that Fair Scheduler borrowed almost all of the features from Capacity Scheduler and vice versa. Given this, we decided to put our weight behind Capacity Scheduler for all YARN clusters.

Clusters that currently use the Fair Scheduler must migrate to the Capacity Scheduler when moving to CDP. Cloudera provides tools, documentation, and related help for such migrations.

Benefits of Using Capacity Scheduler

The following are some of the benefits when using Capacity Scheduler:

  • Integration with Ranger
  • Node partitioning/labeling
  • Improvements for scheduling in cloud-native environments, such as better bin-packing and autoscaling support
  • Scheduling throughput improvements
    • Global Scheduling Framework
    • Lookup of multiple nodes at one time

For more details about scheduling throughput improvements, see Scheduler Performance Improvements.

  • Affinity/anti-affinity: with affinity, run application X only on nodes that run application Y, and vice versa. With anti-affinity, do not run application X and application Y on the same node.
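
The affinity and anti-affinity rules in the last bullet can be illustrated with a small toy model. This is only a sketch of the placement semantics, not the YARN placement constraint API: the function names, node names, and application tags are all hypothetical, and the cluster is represented as a plain dictionary.

```python
from typing import Dict, List, Set

# Hypothetical cluster state: node name -> tags of applications running on it.
Cluster = Dict[str, Set[str]]


def affinity_nodes(cluster: Cluster, target_tag: str) -> List[str]:
    """Affinity: only nodes that already run target_tag are eligible."""
    return sorted(node for node, tags in cluster.items() if target_tag in tags)


def anti_affinity_nodes(cluster: Cluster, target_tag: str) -> List[str]:
    """Anti-affinity: only nodes that do not run target_tag are eligible."""
    return sorted(node for node, tags in cluster.items() if target_tag not in tags)


# Illustrative data only: three nodes, two of which run "app-y".
cluster: Cluster = {
    "node1": {"app-y"},
    "node2": set(),
    "node3": {"app-y", "app-z"},
}

# Candidate nodes for application X under each rule relative to "app-y".
with_affinity = affinity_nodes(cluster, "app-y")
with_anti_affinity = anti_affinity_nodes(cluster, "app-y")
```

In this model, a request for application X with affinity to "app-y" is limited to the nodes in `with_affinity`, while a request with anti-affinity is limited to the nodes in `with_anti_affinity`. A real scheduler would also weigh available resources, queues, and node labels before choosing a node.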

For information about the currently supported features, see Supported Features.