Run a tablet rebalancing tool on a rack-aware cluster

You can use the kudu cluster rebalance tool to establish the placement policy on a cluster. This may be necessary when the rack awareness feature is first configured, or when re-replication has violated the placement policy.

The rebalancing tool breaks its work into three phases:
  1. The rack-aware rebalancer tries to establish the placement policy. Use the --disable_policy_fixer flag to skip this phase.
  2. The rebalancer tries to balance load across locations, moving tablet replicas between locations to spread them evenly. The load of a location is the total number of replicas in the location divided by the number of tablet servers in the location. Use the --disable_cross_location_rebalancing flag to skip this phase.
  3. The rebalancer tries to balance the tablet replica distribution within each location, as if the location were a cluster of its own. Use the --disable_intra_location_rebalancing flag to skip this phase.
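The load metric used in phase 2 can be illustrated with a short calculation. This is a minimal sketch, not code from Kudu: the Location class and the location_load function are hypothetical names, and the replica counts are taken from the cross-location example later in this section.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical summary of one location: its tablet servers and replicas."""
    name: str
    num_tablet_servers: int
    num_replicas: int


def location_load(loc: Location) -> float:
    """Load of a location: total replicas in it divided by its tablet servers."""
    if loc.num_tablet_servers <= 0:
        raise ValueError(f"location {loc.name} has no tablet servers")
    return loc.num_replicas / loc.num_tablet_servers


locations = [Location("A", 5, 15), Location("B", 5, 18), Location("C", 5, 21)]
loads = {loc.name: location_load(loc) for loc in locations}
print(loads)  # {'A': 3.0, 'B': 3.6, 'C': 4.2}
```

Dividing by the number of tablet servers means a location with more servers is expected to hold proportionally more replicas; comparing raw replica counts would instead treat larger locations as overloaded.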

With the --report_only flag, the tool checks whether all tablets in the cluster conform to the placement policy without moving any replicas.
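The kind of check a report-only run performs can be approximated as follows. This is a minimal sketch, assuming the placement policy is the rule that no single location may hold a majority of a tablet's replicas; it ignores layouts where the policy cannot be satisfied (for example, three replicas but only two locations), and the function names and the tablet-to-locations input format are hypothetical.

```python
from collections import Counter


def violates_placement_policy(replica_locations: list[str]) -> bool:
    """Return True if one location holds a strict majority of a tablet's replicas."""
    if not replica_locations:
        return False
    most_in_one_location = max(Counter(replica_locations).values())
    return most_in_one_location > len(replica_locations) // 2


def report_violations(tablets: dict[str, list[str]]) -> list[str]:
    """List tablets whose replica placement violates the policy; moves nothing."""
    return sorted(
        tablet for tablet, locs in tablets.items() if violates_placement_policy(locs)
    )


# Hypothetical layout: tablet ID -> location of each of its replicas.
tablets = {
    "X": ["A", "A", "A"],
    "Y": ["A", "B", "C"],
    "Z": ["B", "B", "C"],
}
print(report_violations(tablets))  # ['X', 'Z']
```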

Examples of the rebalancing tool with rack awareness

The following examples show the effect of each phase by running the tool with the other two phases disabled.

Consider three locations and three tablets, X, Y, and Z, each with three replicas (9 replicas in total). Run the tool with the flags below so that only the placement policy fixer runs:
--disable_cross_location_rebalancing --disable_intra_location_rebalancing

Before running the tool:

Location A    Location B    Location C
Replica X     Replica Y     Replica Z
Replica X     Replica Y     Replica Z
Replica X     Replica Y     Replica Z

After running the tool with the flags:

Location A    Location B    Location C
Replica X     Replica X     Replica X
Replica Y     Replica Y     Replica Y
Replica Z     Replica Z     Replica Z

The replicas of every tablet are now distributed across all three locations, one replica per location, so no location holds a majority of any tablet's replicas.
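The effect of the policy fixer in this example can be reproduced with a simple greedy loop. This is a hypothetical illustration, not Kudu's actual policy-fixer algorithm: it assumes the same majority rule as the report sketch, only computes a plan of moves, and does not model how the real tool carries out each move on a live cluster. The fix_placement function and its input format are illustrative names.

```python
from collections import Counter


def fix_placement(
    tablets: dict[str, list[str]], locations: list[str]
) -> tuple[dict[str, list[str]], list[tuple[str, str, str]]]:
    """Greedily move replicas until no location holds a majority of any tablet.

    Returns the new placement and the (tablet, from_location, to_location) moves.
    """
    placement = {tablet: list(locs) for tablet, locs in tablets.items()}
    moves = []
    for tablet, locs in placement.items():
        while True:
            counts = Counter({loc: 0 for loc in locations})
            counts.update(locs)
            src, src_count = counts.most_common(1)[0]
            if src_count <= len(locs) // 2:
                break  # no location holds a majority: policy satisfied
            dst = min(locations, key=lambda loc: counts[loc])
            if counts[dst] + 1 >= src_count:
                break  # a move would not help: policy cannot be satisfied
            locs[locs.index(src)] = dst  # move one replica from src to dst
            moves.append((tablet, src, dst))
    return placement, moves


# The "before" layout above: every replica of a tablet sits in one location.
before = {"X": ["A", "A", "A"], "Y": ["B", "B", "B"], "Z": ["C", "C", "C"]}
after, moves = fix_placement(before, ["A", "B", "C"])
print(len(moves))  # 6
```

Each tablet needs two moves, and afterwards every tablet has exactly one replica in each of the three locations, matching the "after" table.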

Next, consider the following distribution of replicas across locations, and run the tool with the flags below so that only cross-location rebalancing runs:
--disable_policy_fixer --disable_intra_location_rebalancing

Before running the tool:

Location               Number of replicas across all tables in the location
A (5 tablet servers)   15
B (5 tablet servers)   18
C (5 tablet servers)   21

After running the tool with the flags:

Location               Number of replicas across all tables in the location
A (5 tablet servers)   18
B (5 tablet servers)   18
C (5 tablet servers)   18

Each location now holds the same number of replicas and, because every location has five tablet servers, the same load.
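Cross-location balancing can be pictured as repeatedly moving one replica from the most-loaded location to the least-loaded one. The following is a minimal greedy sketch under that assumption, not Kudu's implementation; rebalance_locations is a hypothetical name, and every location is assumed to have at least one tablet server. Because replicas are moved rather than created, the total stays constant: for the "before" counts above, the 54 replicas settle at 18 per location.

```python
def rebalance_locations(
    replicas: dict[str, int], servers: dict[str, int]
) -> tuple[dict[str, int], list[tuple[str, str]]]:
    """Move replicas from the most- to the least-loaded location, one at a time.

    Load is replicas / tablet servers. Stops when another move would leave the
    destination more loaded than the source.
    """
    counts = dict(replicas)
    moves = []

    def load(loc: str) -> float:
        return counts[loc] / servers[loc]

    while True:
        src = max(counts, key=load)
        dst = min(counts, key=load)
        # Compare loads after a hypothetical move, cross-multiplied to stay in integers.
        if (counts[src] - 1) * servers[dst] < (counts[dst] + 1) * servers[src]:
            break
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))
    return counts, moves


before = {"A": 15, "B": 18, "C": 21}
servers = {"A": 5, "B": 5, "C": 5}
after, moves = rebalance_locations(before, servers)
print(after)  # {'A': 18, 'B': 18, 'C': 18}
print(len(moves))  # 3
```

The integer cross-multiplication avoids floating-point surprises when two loads are equal, and the loop cannot cycle because every accepted move strictly reduces the sum over locations of replicas squared divided by tablet servers.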

As a separate example, consider how replicas are spread across the five tablet servers in Location A, and run the tool with the flags below so that only intra-location rebalancing runs:

--disable_policy_fixer --disable_cross_location_rebalancing

Before running the tool:

Tablet server (TS)   Number of replicas across all tables on the server
TS_1                 3
TS_2                 5
TS_3                 8
TS_4                 4
TS_5                 1

After running the tool with the flags:

Tablet server (TS)   Number of replicas across all tables on the server
TS_1                 4
TS_2                 5
TS_3                 4
TS_4                 4
TS_5                 4

The replica counts on the tablet servers in Location A are now balanced: they differ by at most one.
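A minimal greedy sketch of intra-location balancing, treating the location as a small cluster of its own, is shown below. It is a hypothetical illustration (rebalance_servers is not a Kudu function): it looks only at the total number of replicas per tablet server and ignores per-table placement and any other factors the real tool may weigh.

```python
def rebalance_servers(
    replicas: dict[str, int]
) -> tuple[dict[str, int], list[tuple[str, str]]]:
    """Within one location, move replicas from the fullest to the emptiest
    tablet server until replica counts differ by at most one."""
    counts = dict(replicas)
    moves = []
    while True:
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        if counts[src] - counts[dst] <= 1:
            break
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))
    return counts, moves


location_a = {"TS_1": 3, "TS_2": 5, "TS_3": 8, "TS_4": 4, "TS_5": 1}
after, moves = rebalance_servers(location_a)
print(after)  # {'TS_1': 4, 'TS_2': 4, 'TS_3': 5, 'TS_4': 4, 'TS_5': 4}
print(len(moves))  # 4
```

The sketch leaves the extra replica on TS_3 rather than TS_2 as in the table above; both outcomes are balanced, and which server ends up with one extra replica depends on the order in which moves are chosen.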