Run the tablet rebalancing tool
kudu CLI contains a rebalancing
tool that can be used to rebalance tablet replicas among tablet servers. For each table, the
tool attempts to balance the number of replicas per tablet server. It also, without unbalancing
any table, attempts to even out the number of replicas per tablet server across the cluster as a
The rebalancing tool should be run as the Kudu admin user, specifying all master addresses:
sudo -u kudu kudu cluster rebalance master-01.example.com,master-02.example.com,master-03.example.com
When run, the rebalancer reports on the initial tablet replica distribution in the cluster, logs the replicas it moves, and prints a final summary of the distribution when it terminates:
Per-server replica distribution summary: Statistic | Value -----------------------+----------- Minimum Replica Count | 0 Maximum Replica Count | 24 Average Replica Count | 14.400000 Per-table replica distribution summary: Replica Skew | Value --------------+---------- Minimum | 8 Maximum | 8 Average | 8.000000 I0613 14:18:49.905897 3002065792 rebalancer.cc:779] tablet e7ee9ade95b342a7a94649b7862b345d: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled I0613 14:18:49.917578 3002065792 rebalancer.cc:779] tablet 5f03944529f44626a0d6ec8b1edc566e: 6e64c4165b864cbab0e67ccd82091d60 -> ba8c22ab030346b4baa289d6d11d0809 move scheduled I0613 14:18:49.928683 3002065792 rebalancer.cc:779] tablet 9373fee3bfe74cec9054737371a3b15d: fab382adf72c480984c6cc868fdd5f0e -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move scheduled ... (full output elided) I0613 14:19:01.162802 3002065792 rebalancer.cc:842] tablet f4c046f18b174cc2974c65ac0bf52767: 206a51de1486402bbb214b5ce97a633c -> 3b4d9266ac8c45ff9a5d4d7c3e1cb326 move completed: OK rebalancing is complete: cluster is balanced (moved 28 replicas) Per-server replica distribution summary: Statistic | Value -----------------------+----------- Minimum Replica Count | 14 Maximum Replica Count | 15 Average Replica Count | 14.400000 Per-table replica distribution summary: Replica Skew | Value --------------+---------- Minimum | 1 Maximum | 1 Average | 1.000000
If more details are needed in addition to the replica distribution
summary, use the
--output_replica_distribution_details flag. If added, the flag makes the tool
print per-table and per-tablet server replica distribution statistics as well.
flag to get a report on table-wide and cluster-wide replica distribution statistics without
starting any rebalancing activity.
The rebalancer can also be restricted to run on a subset of the tables by
--tables flag. Note that, when
running on a subset of tables, the tool does not attempt to balance the cluster as a whole.
The length of time rebalancing is run for can be controlled with the flag
--max_run_time_sec. By default, the
rebalancer runs until the cluster is balanced. To control the amount of resources devoted to
rebalancing, modify the flag
rebalance --help for more.
It is safe to stop the rebalancer tool at any time. When restarted, the rebalancer continues rebalancing the cluster.
The rebalancer tool requires all registered tablet servers to be up and
running to proceed with the rebalancing process in order to avoid possible conflicts and races
with the automatic re-replication and to keep replica placement optimal for current
configuration of the cluster. If a tablet server becomes unavailable during the rebalancing
session, the rebalancer exits. As noted above, it is safe to restart the rebalancer after
resolving the issue with unavailable tablet servers. However, if it is necessary to rebalance
the cluster when a few tablet servers in a Kudu cluster are not available, it is possible to
specify their UUIDs as a comma-separated list with the
flag and rebalance the rest of the cluster. With the
the specified tablet servers are effectively ignored by the rebalancer tool, which means they
are not considered as a part of the cluster along with tablet replicas they host.
The rebalancing tool can rebalance Kudu clusters running older versions as well, with some restrictions. Consult the following table for more information. In the table, "RF" stands for "replication factor".
|Version Range||Rebalances RF = 1 Tables?||Rebalances RF > 1 Tables?|
|v < 1.4.0||No||No|
|1.4.0 <= v < 1.7.1||No||Yes|
|v >= 1.7.1||Yes||Yes|
If the rebalancer is running against a cluster where rebalancing replication factor one tables are not supported, it rebalances all the other tables and the cluster as if those singly-replicated tables did not exist.
To perform cluster-wide rebalancing, it is recommended to run the
rebalance tool. In addition to the cluster-wide rebalancing, you can run the
range-aware rebalancing for larger tables in the cluster that are multilevel-partitioned,
and can suffer from the hot-spotting issue.