Configurations for throttling of tasks

Learn about different parameters that you can use to control the Replication Manager's (RM) throttling of tasks.

Total number of replication commands that can be queued on a datanode. The limit is made up of (number_of_replication_commands + reconstruction_weight * number_of_reconstruction_commands). The default value is 20.
The weight to apply to multiple reconstruction commands before adding to the datanode.replication.limit. The default value is 3.
The total number of delete container commands to be queued on a given datanode. The default value is 40.
The overall replication task limit on a cluster is the number of healthy nodes multiplied by the datanode.replication.limit. The hdds.scm.replication.inflight.limit.factor configuration, which should be between 0 and 1, scales that limit down to reduce the overall number of replicas pending creation on the cluster. A setting of 0 disables global limit checking. A setting of 1 effectively disables it by making the limit equal to the above equation. However, if there are many decommissioning nodes on the cluster, the decommissioning nodes will have a higher than normal limit, so the setting of 1 might still provide some limit in extreme circumstances. The default value is 0.75.
When a datanode is decommissioning its replication thread pool, the hdds.datanode.replication.streams.limit configuration, whose default value is 10, is multiplied by this factor to allocate more threads for replication. On SCM, the limit for any datanode which is not in-service (for example, decommissioning or entering maintenance) is also increased by the same factor. This allows the node to dedicate more resources to replication as it will not be used for writes and will be reduced in priority for reads. The default value is 2.0.
The maximum number of simultaneous replication related commands that can run on a single datanode at a time, either by pushing data to a new target, or by coordinating an EC container reconstruction. The default value is 10.
The amount of time SCM allows for a task scheduled on a datanode to complete. After this duration, the datanode discards the command and SCM assumes that it is lost and schedules another if still relevant. The throttling applied by SCM when scheduling commands should prevent too many commands from being scheduled that can be completed in this interval. The default value is 300 seconds.

Global replication command limit

You can configure a global replication limit by limiting the number of inflight containers pending creation.

The global replication limit is defined by the hdds.scm.replication.inflight.limit.factor configuration. The default value is 0.75.

Global delete limit

Container replica deletes tend to be targeted to a single node, and the datanode already has a thread pool to handle them, which limits the number of deletes running concurrently. There is also no network impact when deleting a container. Therefore, there is no global command limit for delete commands.

Current replication state

To get a view of the overall state of the cluster, the ozone admin container report command can be run by an administrator. It returns details from the last RM run, indicating the number of containers with various problem states. This report is cached by each run of the RM check thread. Hence, it can be up to 5 minutes in stale mode.

In addition, the report values are exposed through metrics on the leader SCM process, and various other metrics detailing the current under and over replication queue sizes, number of inflight commands and many other details that can give insight to the throttling and completion of commands on the cluster.