Increasing the number of preferred nodes for caching files
Query processing in Trino is distributed into different stages where tasks are processed
by different nodes in the cluster. You can limit the number of hosts that are required to process
these tasks with the fs.cache.preferred-hosts-count property.
If you are using the Hive or Iceberg connector and if the workload from these tables are high,
then a high throughput is required, which can be achieved by increasing the value of the
fs.cache.preferred-hosts-count connector property in the connector
configuration details.
The value must be increased as concurrency increases. The default value of 2 may not be
sufficient if you are running a high workload. For example, if you are running 40 nodes with a
concurrency of 5, consider increasing the value of
fs.cache.preferred-hosts-count to 20. You can increase this value up to a
maximum limit of 40 (maximum worker nodes allowed for caching files).
Note that the value depends on the workload that you are trying to query through the connector and on the number of parallel queries that you expect to run simultaneously. Therefore, tune the parameter according to your workload needs.
