Components of stochastic load balancer

Learn about the important cost functions used by the stochastic load balancer, certain important configuration parameters affecting these cost functions, and the generic configuration parameters that determine the way the balancer calculates the target cost of the cluster.

The following are the key cost functions that affect the stochastic load balancer. The configuration parameters that influence each cost function are described together with that function.

Region count skewness cost

This cost function calculates the cost of a potential cluster state based on the skew in the number of regions hosted by each server in the cluster. The function returns a value between 0 and 1. A lower value indicates that the regions are evenly balanced across the cluster, while a higher value indicates that they are unevenly distributed.

The hbase.master.balancer.stochastic.regionCountCost configuration parameter sets the multiplier that determines the impact this function has on the overall target cost of the balancer. The default value of this configuration parameter is 500.
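To make the 0-to-1 range concrete, the following is a minimal sketch of a skew score, assuming the cost is the total deviation from a perfectly even distribution divided by the worst-case deviation. The function name and the normalization are illustrative assumptions, not the HBase implementation.

```python
def region_count_skew(regions_per_server):
    """Return a skew score in [0, 1] for a list of per-server region counts.

    Illustrative only: the normalization is an assumption, not HBase's code.
    """
    n = len(regions_per_server)
    total = sum(regions_per_server)
    if n <= 1 or total == 0:
        return 0.0
    mean = total / n
    # Sum of absolute deviations from the perfectly even distribution.
    deviation = sum(abs(count - mean) for count in regions_per_server)
    # Worst case: every region sits on a single server.
    worst = (total - mean) + (n - 1) * mean
    return deviation / worst


REGION_COUNT_COST = 500  # default multiplier quoted in the text

balanced = region_count_skew([10, 10, 10, 10])   # evenly spread regions
skewed = region_count_skew([40, 0, 0, 0])        # all regions on one server
weighted = REGION_COUNT_COST * skewed            # contribution to the total cost
```

With the default multiplier of 500, even a small skew in this score outweighs most of the other cost functions, which have much smaller multipliers.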

Primary region skewness cost

This cost function calculates the cost of a potential cluster state based on the skew in the number of primary regions on a cluster. A lower cost returned by this function indicates that the primary regions are evenly distributed across the cluster, while a higher cost indicates that the primary regions are unevenly distributed.

The hbase.master.balancer.stochastic.primaryRegionCountCost configuration parameter defines the value of the multiplier that determines the impact this function makes on the overall target cost of the balancer. The default value of this configuration is 500.

Table skewness cost

This function calculates the cost of a potential cluster configuration based on how evenly the regions of each table are distributed across the cluster.

The hbase.master.balancer.stochastic.tableSkewCost configuration parameter sets the multiplier that determines the impact the table skewness cost has on the overall target cost of the balancer. The default value of this configuration is 35.

Move cost

Given the starting state of the regions and a potential ending state, this function computes the cost based on the number of regions that have moved.

The following configuration parameters control the way this cost function calculates the cost.

  • hbase.master.balancer.stochastic.moveCost

    This is the multiplier for the MoveCostFunction in the stochastic load balancer. It defines the weight given to this function when the balancer calculates the total cost. The default value of this parameter is 7.

  • hbase.master.balancer.stochastic.moveCost.offpeak

    This is the value of the multiplier for this cost function during off-peak hours. The default value of this parameter is 3.

  • hbase.master.balancer.stochastic.maxMovePercent

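    This parameter limits the share of regions that can be moved in a single balancer run, expressed as a fraction of the total number of regions. The default value of this configuration is 1.0, which places no additional limit on the number of moves.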
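The interaction of the three move cost parameters can be sketched as follows. This is a simplified illustration, assuming the raw cost is the fraction of regions that changed servers and that a plan exceeding the move limit is rejected with an infinite cost. The function names and the rejection behavior are assumptions for illustration, not the HBase implementation.

```python
MOVE_COST = 7.0          # hbase.master.balancer.stochastic.moveCost default
MOVE_COST_OFFPEAK = 3.0  # hbase.master.balancer.stochastic.moveCost.offpeak default
MAX_MOVE_PERCENT = 1.0   # hbase.master.balancer.stochastic.maxMovePercent default


def moved_regions(initial, candidate):
    """Count regions whose hosting server differs between the two states."""
    return sum(1 for region, server in initial.items()
               if candidate[region] != server)


def weighted_move_cost(initial, candidate, off_peak=False,
                       max_move_percent=MAX_MOVE_PERCENT):
    """Return the multiplier times the fraction of regions moved."""
    num_regions = len(initial)
    if num_regions == 0:
        return 0.0
    moves = moved_regions(initial, candidate)
    # Assumption: a plan that moves too many regions is treated as unusable.
    if moves > max_move_percent * num_regions:
        return float("inf")
    multiplier = MOVE_COST_OFFPEAK if off_peak else MOVE_COST
    return multiplier * moves / num_regions


# Hypothetical region-to-server assignments before and after balancing.
initial = {"r1": "s1", "r2": "s1", "r3": "s2", "r4": "s2"}
candidate = {"r1": "s1", "r2": "s2", "r3": "s2", "r4": "s2"}

peak = weighted_move_cost(initial, candidate)                     # 1.75
off_peak = weighted_move_cost(initial, candidate, off_peak=True)  # 0.75
```

Because the off-peak multiplier is lower, the same plan is cheaper during off-peak hours, so the balancer is more willing to move regions then.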

Locality-based cost

The locality-based cost functions compute the cost of a potential cluster configuration based on where the HBase store files are located. The stochastic load balancer uses two types of locality functions.

  • Server Locality Cost

    This function computes the cost of a potential cluster state based on the location of store files relative to the servers. The more store files that are local to the server hosting a region, the lower the cost, and vice versa.

    The hbase.master.balancer.stochastic.localityCost configuration parameter defines the multiplier value for the server locality cost function with the default value of 25.

  • Rack Locality Cost

    This function is an extension of the server locality cost and computes the cost of a potential cluster configuration based on where the store files are located relative to the configured racks.

    The hbase.master.balancer.stochastic.rackLocalityCost configuration parameter defines the multiplier for the rack locality cost function and has the default value of 15.
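As a rough illustration of the server locality idea, the sketch below assumes that each region has a known locality fraction (0.0 to 1.0) for every server, and that the cost is one minus the average locality of the regions on their assigned servers. The data layout and function name are hypothetical and are not taken from HBase.

```python
LOCALITY_COST = 25.0  # hbase.master.balancer.stochastic.localityCost default


def server_locality_cost(assignment, locality):
    """Return 1 minus the average locality of regions on their assigned servers.

    assignment: region name -> hosting server
    locality:   region name -> {server: fraction of store file data that is local}
    """
    if not assignment:
        return 0.0
    local = sum(locality[region].get(server, 0.0)
                for region, server in assignment.items())
    return 1.0 - local / len(assignment)


# Hypothetical locality fractions for two regions on two servers.
locality = {
    "r1": {"s1": 1.0, "s2": 0.0},
    "r2": {"s1": 0.5, "s2": 1.0},
}

best = server_locality_cost({"r1": "s1", "r2": "s2"}, locality)   # 0.0
worse = server_locality_cost({"r1": "s2", "r2": "s1"}, locality)  # 0.75
weighted = LOCALITY_COST * worse
```

A rack locality variant could use the same shape with racks in place of servers, weighted by the rack locality multiplier.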

Read request cost

This function computes the cost of the total number of read requests. A higher computed cost indicates an unbalanced cluster. The function uses a rolling average of the region load.

The hbase.master.balancer.stochastic.readRequestCost configuration parameter defines the multiplier for the read request cost function with the default value of 5.
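The rolling average mentioned above smooths out short spikes in load so that a single burst of requests does not trigger rebalancing. A minimal sketch, assuming a fixed-size window of the most recent samples (the class name and the window size are hypothetical):

```python
from collections import deque


class RollingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, value):
        self.samples.append(value)

    def value(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


# Hypothetical read request counts reported for one region.
reads = RollingAverage(window=3)
for sample in (100, 200, 300, 1000):
    reads.add(sample)

smoothed = reads.value()  # averages the last three samples: 200, 300, 1000
```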

Write request cost

This function computes the cost of the total number of write requests. A higher computed cost indicates an unbalanced cluster.

The hbase.master.balancer.stochastic.writeRequestCost configuration parameter defines the multiplier for the write request cost function and has the default value of 5.

Memstore size cost

This function computes the cost of the total memstore size across the cluster. A higher cost indicates an unbalanced cluster. The function uses a rolling average of the statistics received from the region servers.

The hbase.master.balancer.stochastic.memstoreSizeCost configuration parameter defines the multiplier for the memstore size-based cost function. The default value of this parameter is 5.

Storefile cost

This cost function computes the cost of the total size of the open store files. A higher computed cost indicates an unbalanced cluster.

The hbase.master.balancer.stochastic.storefileSizeCost configuration parameter defines the multiplier for the store file size-based cost function. The default value of this parameter is 5.
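To show how the multipliers relate to one another, the following sketch combines raw per-function costs into a single weighted total using the default values listed in this topic. It assumes the total is a plain weighted sum. The dictionary keys reuse the last segment of each configuration name, and the function and the raw cost values are hypothetical.

```python
# Default multipliers as listed in this topic, keyed by the last
# segment of each hbase.master.balancer.stochastic.* parameter name.
DEFAULT_MULTIPLIERS = {
    "regionCountCost": 500,
    "primaryRegionCountCost": 500,
    "tableSkewCost": 35,
    "localityCost": 25,
    "rackLocalityCost": 15,
    "moveCost": 7,
    "readRequestCost": 5,
    "writeRequestCost": 5,
    "memstoreSizeCost": 5,
    "storefileSizeCost": 5,
}


def total_cost(raw_costs, multipliers=DEFAULT_MULTIPLIERS):
    """Weighted sum of raw costs (assumption: a simple linear combination)."""
    return sum(multipliers[name] * cost for name, cost in raw_costs.items())


# Hypothetical raw costs for one candidate cluster state.
raw_costs = {"regionCountCost": 0.5, "localityCost": 0.25, "readRequestCost": 1.0}
combined = total_cost(raw_costs)  # 500*0.5 + 25*0.25 + 5*1.0 = 261.25
```

In this example the region count term contributes 250 of the 261.25 total, which illustrates why region count skew dominates the balancer's decisions at the default settings.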