Limitation for Spark History Server with high availability

Be aware of the following limitation when you run Spark History Server with high availability in Cloudera Manager.

  • The second Spark History Server in the cluster does not clean the Spark event logs. The Custom Service Descriptor automatically disables event log cleaning on the second server, so it can only read logs. This ensures that the two Spark History Servers never try to delete the same files. If the first Spark History Server is down, the second one does not take over the cleaner task, so old event logs are not deleted during that time. This is not a critical issue: when the first Spark History Server starts again, it deletes those old Spark event logs. The default event log cleaner interval (spark.history.fs.cleaner.interval) is 1 day in Cloudera Manager, so the first Spark History Server deletes old logs only once per day by default.
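
As a rough illustration of the behavior described above, the effective cleaner settings on the two servers might look like the following sketch. It assumes that the disabled cleaner shows up as spark.history.fs.cleaner.enabled=false on the second server; how the Custom Service Descriptor actually applies the setting is not covered here. The spark.history.fs.cleaner.maxAge value shown is the upstream Apache Spark default and may differ in your deployment.

```
# First Spark History Server: runs the event log cleaner
spark.history.fs.cleaner.enabled=true
# Default interval in Cloudera Manager: the cleaner runs once per day
spark.history.fs.cleaner.interval=1d
# Assumed value (upstream Apache Spark default): logs older than this are deleted
spark.history.fs.cleaner.maxAge=7d

# Second Spark History Server: read-only, cleaner disabled (assumed representation)
spark.history.fs.cleaner.enabled=false
```

With settings like these, event logs that pass the maximum age while the first server is down stay on the file system until that server restarts and its cleaner runs again.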