Behavioral changes in Cloudera Data Services on premises 1.5.5 SP1

Behavioral changes denote a marked change in behavior from the previously released version to this release of Cloudera Data Services on premises.

Summary:

Implement Longhorn Recurring Job for Prometheus Snapshot Cleanup

Previous behavior:

Longhorn generates snapshots automatically, even when no recurring jobs are configured. For highly volatile data such as Prometheus metrics, this can result in high volume utilization. Prometheus frequently writes new metrics and cleans up old ones, a write pattern that is not ideal for Longhorn's snapshot management. As a result, volume utilization can exceed configured limits and trigger alerts in the Cloudera Manager UI that report the volume as filling up or above 100%, even though the actual cause is accumulated snapshots.
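
The effect described above can be illustrated with a short arithmetic sketch. All sizes below are hypothetical and chosen only for illustration; the function is not part of any Longhorn or Cloudera API. It assumes that the space a volume consumes is the live data plus the data still held by retained snapshots.

```python
# Hypothetical sizes in GiB; none of these figures come from a real cluster.
NOMINAL_SIZE_GIB = 60.0


def utilization_percent(live_data_gib, snapshot_sizes_gib, nominal_gib=NOMINAL_SIZE_GIB):
    """Return consumed space (live data plus retained snapshots)
    as a percentage of the volume's nominal size."""
    consumed = live_data_gib + sum(snapshot_sizes_gib)
    return 100.0 * consumed / nominal_gib


live = 30.0                      # data Prometheus currently keeps
snapshots = [12.0, 11.0, 10.0]   # blocks still held by old snapshots

with_snaps = utilization_percent(live, snapshots)
without = utilization_percent(live, [])

print(f"with accumulated snapshots: {with_snaps:.1f}%")
print(f"after snapshot cleanup:     {without:.1f}%")
```

In this hypothetical case the live data fills only half of the volume, yet the retained snapshots push the reported utilization past 100%.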

New behavior:

In Cloudera Data Services on premises 1.5.5 SP1, a snapshot cleanup job is automatically configured for all Prometheus volumes. This new Recurring Job runs daily at 1:00 A.M. in the Cloudera Manager Server's timezone.

You can reconfigure the schedule or completely disable this job through the Longhorn UI.
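
For orientation, the following minimal sketch builds a manifest shaped like a Longhorn RecurringJob custom resource with a daily 1:00 A.M. schedule. It is an illustration under stated assumptions, not the definition that Cloudera installs: the job name and the group name are hypothetical, and the field names and the `longhorn.io/v1beta2` API version should be checked against the Longhorn version in use. The supported way to change or disable the job remains the Longhorn UI.

```python
import json


def snapshot_cleanup_job(name, cron, groups):
    """Build a dict shaped like a Longhorn RecurringJob resource.

    Assumption: the layout follows the longhorn.io/v1beta2 RecurringJob
    custom resource; verify it against the installed Longhorn version.
    """
    return {
        "apiVersion": "longhorn.io/v1beta2",
        "kind": "RecurringJob",
        "metadata": {"name": name, "namespace": "longhorn-system"},
        "spec": {
            "name": name,
            "task": "snapshot-cleanup",  # clean up snapshots, do not create new ones
            "cron": cron,                # five-field cron expression
            "groups": list(groups),      # volumes in these groups run the job
            "retain": 0,                 # assumed to be unused by cleanup tasks
            "concurrency": 1,            # process one volume at a time
        },
    }


# "0 1 * * *" means minute 0, hour 1, every day: daily at 1:00 A.M.
# The job name and the group name below are hypothetical.
manifest = snapshot_cleanup_job(
    name="prometheus-snapshot-cleanup",
    cron="0 1 * * *",
    groups=["prometheus"],
)

print(json.dumps(manifest, indent=2))
```

Changing the schedule amounts to changing the cron expression; for example, `0 3 * * 0` would run the job at 3:00 A.M. every Sunday.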

Reference:

  1. Snapshot documentation

  2. Recurring Snapshot Cleanup