Tuning Apache SparkPDF version

Resource Tuning Example

Consider a cluster with six hosts running NodeManagers, each equipped with 16 cores and 64 GB of memory.

The NodeManager capacities, yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, should be set to 63 * 1024 = 64512 (megabytes) and 15, respectively. Avoid allocating 100% of the resources to YARN containers because the host needs some resources to run the OS and Hadoop daemons. In this case, leave one GB and one core for these system processes. Cloudera Manager accounts for these and configures these YARN properties automatically.

You might consider using --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this approach does not work:

  • 63 GB plus the executor memory overhead does not fit within the 63 GB capacity of the NodeManagers.
  • The ApplicationMaster uses a core on one of the hosts, so there is no room for a 15-core executor on that host.
  • 15 cores per executor can lead to bad HDFS I/O throughput.

Instead, use --num-executors 17 --executor-cores 5 --executor-memory 19G:

  • This results in three executors on all hosts except for the one with the ApplicationMaster, which has two executors.
  • --executor-memory is computed as (63/3 executors per host) = 21. 21 * 0.07 = 1.47. 21 - 1.47 ~ 19.