How often should a monitoring system send alerts?

This morning, I missed a rather important alert from a monitoring system. Why? I ignored it!

It seems to be common practice to send alerts out hourly for the slightest problem with a service (e.g. disk space > 80% utilisation results in an email landing in your inbox every hour).

Over the weekend, a couple of systems had fallen into that (oh my god it's the end of the world and I only have 20% free disk space) state, and my inbox was full of warnings. I quickly filed all the alerts away, as these systems aren't really anything to do with me. See the problem?

I had assumed that all the alerts were from these two systems crying wolf. Hidden amongst these non-critical messages were some rather important ones warning that a critical service was having problems.

Surely there is a better way to alert than “spam your inbox until you deal with it”?

Why not just send one alert when the service is down (and maybe one when it is back up again)? You could even put these alerts into a ticketing system so that they can be tracked!
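The one-alert-per-state-change idea can be sketched in a few lines of Python. This is a minimal sketch, assuming each periodic check reports a simple up/down result for a named service; the `StateChangeAlerter` class and its `notify` callback are hypothetical names for illustration, not part of any real monitoring tool. In practice the callback might send an email or open a ticket.

```python
from typing import Callable, Dict, List


class StateChangeAlerter:
    """Hypothetical helper: alert once when a service goes down, once when it recovers."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        # notify is an assumed callback (e.g. send an email or open a ticket).
        self.notify = notify
        # Last known state per service; unseen services are assumed healthy.
        self.last_ok: Dict[str, bool] = {}

    def record(self, service: str, ok: bool) -> None:
        previous = self.last_ok.get(service, True)
        self.last_ok[service] = ok
        if previous and not ok:
            self.notify(f"PROBLEM: {service} is down")
        elif not previous and ok:
            self.notify(f"RECOVERY: {service} is back up")
        # Otherwise the state has not changed, so no message is sent.


# Usage: six hourly checks, three of which fail in a row.
sent: List[str] = []
alerter = StateChangeAlerter(sent.append)
for ok in [True, False, False, False, True, True]:
    alerter.record("web", ok)
print(sent)  # ['PROBLEM: web is down', 'RECOVERY: web is back up']
```

Because only transitions produce messages, a service that sits in a problem state all weekend generates one alert rather than dozens, so a new alert from a different service is much harder to miss.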

I know that I'm not the only person this has happened to. Can anyone else suggest a better way of dealing with system alerts without spamming your inbox and desensitising you to the alerts?

(Only sending alerts to the person(s) responsible for a specific system is clearly a completely separate issue.)

One Response to “How often should a monitoring system send alerts?”

  1. Hamlesh Says:

    Er, use Nagios, set up event-based alerting, oh and turn off alerts for things you're going to ignore anyway!
