Spark resource recommendations
This section describes Spark resource recommendations: prescriptive configuration values for optimizing resource allocation in Spark applications. These recommendations improve resource efficiency by suggesting optimal values for Spark configuration settings.
This feature provides prescriptive suggestions for configuring spark.executor.memory based on an analysis of allocated memory and peak memory usage from a single application run. By offering optimized memory recommendations, it helps improve resource allocation and application performance. However, recommendations are not provided for Spark applications that use complex memory configurations, specifically those with explicitly defined values for spark.memory.offHeap.size, spark.executor.pyspark.memory, or spark.executor.memoryOverhead.
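For reference, spark.executor.memory is an application-level setting that you supply when the application is configured or submitted. The following PySpark snippet is only a minimal sketch of where this setting lives; the application name and the 8g value are placeholders, not recommended values.

```python
from pyspark.sql import SparkSession

# Minimal sketch: the recommendations in this section apply to spark.executor.memory.
# "example-app" and "8g" are placeholders, not recommended values.
spark = (
    SparkSession.builder
    .appName("example-app")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```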
Memory optimization recommendations for Spark executors
Memory usage optimization helps maintain optimal performance in your Spark workloads on Cloudera clusters. These recommendations ensure that your Spark executors utilize memory efficiently, balancing performance with resource allocation. This feature addresses two main scenarios: underutilization (memory usage below 60%) and overutilization (peak memory usage above 90%).
Underutilization scenario
If the memory utilization of a Spark executor is less than 60%, the system recommends decreasing the executor memory. The recommended value is the peak memory usage plus a 25% buffer.
Recommended values are rounded up to the next full GB. For example, if the currently configured value is 10 GB and the peak usage is 5 GB, adding a 25% buffer gives 6.25 GB, which is rounded up to 7 GB.
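The arithmetic of the underutilization recommendation can be sketched as follows. This is only an illustrative calculation, not the product's implementation; the function name and GB-based units are assumptions.

```python
import math

def underutilization_recommendation(peak_usage_gb: float) -> int:
    """Peak memory usage plus a 25% buffer, rounded up to a full GB."""
    return math.ceil(peak_usage_gb * 1.25)

# Example from the text: a peak usage of 5 GB yields 6.25 GB, rounded up to 7 GB.
print(underutilization_recommendation(5))  # 7
```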
Recommendations are not provided under the following conditions:
- The total garbage collection (GC) duration across all tasks exceeds 10 minutes.
- The total shuffle data spill across all tasks exceeds 500 MB.
- Advanced memory configurations have been explicitly set for any of the following properties:
  - spark.memory.offHeap.size
  - spark.executor.pyspark.memory
  - spark.executor.memoryOverhead
- The calculated recommended memory is less than 1 GB.
- The calculated recommended value is the same as the currently configured value. For example, if 2 GB is allocated and 1 GB is used, adding a 25% buffer to the used amount results in 1.25 GB, which rounds up to 2 GB. Because the calculated value equals the current allocation, no recommendation is displayed.
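The exclusion rules above can be summarized as decision logic. The following sketch is illustrative only; the function, parameter names, and units are hypothetical and not part of the product.

```python
import math

# Properties whose explicit configuration disqualifies an application from recommendations.
ADVANCED_MEMORY_CONFIGS = (
    "spark.memory.offHeap.size",
    "spark.executor.pyspark.memory",
    "spark.executor.memoryOverhead",
)

def underutilization_check(configured_gb, peak_gb, total_gc_minutes,
                           total_spill_mb, explicit_configs):
    """Return the recommended executor memory in GB, or None if no recommendation applies."""
    if total_gc_minutes > 10:                    # total GC duration across all tasks too high
        return None
    if total_spill_mb > 500:                     # total shuffle data spill too large
        return None
    if any(key in explicit_configs for key in ADVANCED_MEMORY_CONFIGS):
        return None                              # advanced memory configuration in use
    recommended = math.ceil(peak_gb * 1.25)      # peak usage plus 25% buffer, rounded up
    if recommended < 1 or recommended == configured_gb:
        return None                              # below 1 GB, or no change from current value
    return recommended
```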
Overutilization scenario
In the case of overutilization (peak memory usage greater than 90%), the executor is experiencing memory pressure, which can lead to performance degradation or out-of-memory errors. The recommendation is to increase the executor memory by 1 GB to alleviate the memory pressure. To prevent runaway resource allocation, the recommended executor memory value is capped at 32 GB.
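A comparable sketch for the overutilization case, again with hypothetical names and GB-based units: the executor memory is increased by 1 GB, but the recommended value never exceeds the 32 GB cap.

```python
def overutilization_recommendation(configured_gb: int) -> int:
    """Increase executor memory by 1 GB, never exceeding the 32 GB cap."""
    return min(configured_gb + 1, 32)

print(overutilization_recommendation(10))  # 11
print(overutilization_recommendation(32))  # 32 (already at the cap)
```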
Limitations of the recommendation system
When you apply configuration changes based on the recommendations, monitor the subsequent runs of your applications and keep the following limitations in mind.
- Seasonal trends may affect resource allocation
- The volume of data processed by Spark jobs may vary seasonally. If the system recommends a resource value based solely on utilization during a low-traffic period, the subsequent peak season can bring failures (out-of-memory errors) or performance degradation caused by resources that are under-allocated for high-traffic periods.
- Cluster-level resource impact
- When you increase resource allocation for a specific Spark application, this action can adversely impact other workloads currently running on the cluster. You must consider the overall cluster health and resource availability before implementing recommendations that involve increased allocation.
- Anomaly in a single application run
- The recommendation is based on the resource consumption of the application's current run, not its past runs. The current run may be a one-off anomaly, consuming either higher or lower resources compared to a regular execution of the same application. Therefore, monitoring subsequent runs is critical to confirming the stability and efficacy of the recommended configuration.
- Possibility of performance degradation
- For certain applications, reducing the allocated memory might lead to a slowdown in execution. For example, a reduction in memory could result in increased shuffle spill, which degrades performance. You must verify that a reduction in resources does not introduce an unacceptable latency or slowness.
- Jobs older than 90 days
- If the jobs are older than 90 days, resource recommendations are not provided.
