Increasing the number of preferred nodes for caching files

Query processing in Trino is distributed into different stages where tasks are processed by different nodes in the cluster. You can limit the number of hosts that are required to process these tasks with the fs.cache.preferred-hosts-count property.

If you are using the Hive or Iceberg connector and if the workload from these tables are high, then a high throughput is required, which can be achieved by increasing the value of the fs.cache.preferred-hosts-count connector property in the connector configuration details.

The value must be increased as concurrency increases. The default value of 2 may not be sufficient if you are running a high workload. For example, if you are running 40 nodes with a concurrency of 5, consider increasing the value of fs.cache.preferred-hosts-count to 20. You can increase this value up to a maximum limit of 40 (maximum worker nodes allowed for caching files).

Note that the value depends on the workload that you are trying to query through the connector and on the number of parallel queries that you expect to run simultaneously. Therefore, tune the parameter according to your workload needs.