Limitation for Spark History Server with high availability

Be aware of the following limitation in how Spark History Server (SHS) operates with high availability in Cloudera Manager.

  • The second SHS in the cluster does not clean up Spark event logs. Event log cleaning is automatically disabled for the second server through the Custom Service Descriptor, so the second server can only read logs. This limitation ensures that two SHSs never try to delete the same files. If the first SHS goes down, the second one does not take over the cleaner task. This is not a critical issue, because when the first SHS starts again it deletes the old Spark event logs. The default event log cleaner interval (spark.history.fs.cleaner.interval) is 1 day in Cloudera Manager, which means that the first SHS deletes old logs only once per day by default; the relevant cleaner settings are sketched below.
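
For reference, the cleaner behavior on the first SHS is governed by the standard Spark History Server cleaner properties. The following is a minimal sketch of how these settings might look when applied to the first SHS, for example in spark-defaults.conf or through a Cloudera Manager safety valve; the specific values shown are illustrative, not prescribed:

    # Enable periodic cleanup of old event logs (only on the first SHS;
    # the Custom Service Descriptor disables this on the second SHS).
    spark.history.fs.cleaner.enabled=true

    # How often the cleaner runs; 1 day is the Cloudera Manager default.
    spark.history.fs.cleaner.interval=1d

    # Event logs older than this age are deleted when the cleaner runs
    # (7 days is the Apache Spark default).
    spark.history.fs.cleaner.maxAge=7d

Because only the first SHS runs the cleaner, tightening the interval or maximum age on the second SHS has no effect on log cleanup.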