Cloudera Data Engineering performance tuning

Using default instances without local SSDs can reduce performance. Instances with local SSDs are generally faster, but make sure you choose instance types whose local storage is large enough to hold intermediate results.

Leave reasonable headroom (at least 20%) between what each executor uses in memory and cores and what the instance profile provides.
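The headroom guideline can be checked with simple arithmetic. The following Python sketch is illustrative only: it assumes "20% headroom" means that the requested resources should not exceed 80% of what the instance provides, and the function names and the example instance size (64 GiB of memory, 16 cores) are hypothetical, not taken from any Cloudera API or instance profile.

```python
def max_request(capacity: float, headroom: float = 0.20) -> float:
    """Return the largest request that still leaves `headroom` of capacity free.

    Assumption: headroom is expressed as a fraction of the instance capacity.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    return capacity * (1 - headroom)


def has_headroom(requested: float, capacity: float, headroom: float = 0.20) -> bool:
    """Return True if `requested` leaves at least `headroom` of `capacity` unused."""
    return requested <= max_request(capacity, headroom)


# Hypothetical instance profile: 64 GiB of memory and 16 cores.
instance_memory_gib = 64
instance_cores = 16

# Hypothetical executor sizing to validate against that profile.
executor_memory_gib = 48
executor_cores = 12

memory_ok = has_headroom(executor_memory_gib, instance_memory_gib)
cores_ok = has_headroom(executor_cores, instance_cores)
print(f"memory headroom ok: {memory_ok}, core headroom ok: {cores_ok}")
```

If either check fails, reduce the executor request or choose a larger instance type.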

By default, Cloudera Data Engineering collects profiling metrics for CPU, memory, and I/O. This collection can degrade performance, especially for long-running jobs that take several hours to complete. Profiling metrics are useful in development and testing, but they add unnecessary overhead in production.

You can turn off metrics collection using the Spark Analysis toggle on the job Configuration tab.

Spark 3 introduced performance improvements over Spark 2, so Cloudera recommends using Spark 3 whenever possible.