Looking for a performance/health monitoring and alerting solution

Ori Berger linux-il at orib.net
Mon Jun 16 02:12:47 IDT 2014


I'm looking for a single system that can track all of a remote server's
health and performance status, store a detailed every-few-seconds
history, and trigger alarms in "bad" situations (such as running out of
disk space). So far, I haven't found one comprehensive system that does
it all. Things I'm interested in (in parentheses: how I track them at
the moment; note that shinken is a nagios-compatible monitoring system):

Free disk space (shinken)
Server load (shinken)
Debian package and security updates  (shinken)
NTP drift (shinken)
Service ping/reply time (shinken)
Upload/download rates per interface (mrtg)
Temperatures (sensord, hddtemp)
Security logs, warnings and alerts, e.g. fail2ban, auth.log (rsync of log files)
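To make the "every few seconds, with history and alarms" requirement concrete, here is a minimal, local-only Python sketch. It is not any existing tool's method. It samples two of the metrics above (free disk space and the 1-minute load average) at a fixed interval, keeps a bounded in-memory history, and flags one "bad" situation. The threshold, the function names and the history length are hypothetical choices for illustration. `os.getloadavg()` is Unix-only, and shipping the samples to a central console over TCP/SSH is not shown.

```python
import os
import shutil
import time
from collections import deque

# Hypothetical alarm threshold: flag a "bad" situation below 5% free disk space.
MIN_FREE_FRACTION = 0.05


def sample(path="/"):
    """Take one snapshot of free disk space and the 1-minute load average."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    load1, _, _ = os.getloadavg()    # Unix-only
    return {
        "time": time.time(),
        "free_fraction": usage.free / usage.total,
        "load1": load1,
    }


def check_alarms(snapshot):
    """Return a list of alarm messages for one snapshot (empty if healthy)."""
    alarms = []
    if snapshot["free_fraction"] < MIN_FREE_FRACTION:
        alarms.append(f"low disk space: {snapshot['free_fraction']:.1%} free")
    return alarms


def collect(n_samples, interval_s=5.0, history_len=17280):
    """Sample every interval_s seconds, keeping at most history_len samples.

    The default history_len (17280 = 86400 / 5) holds one day of 5-second samples;
    older samples are dropped automatically by the bounded deque.
    """
    history = deque(maxlen=history_len)
    for i in range(n_samples):
        snap = sample()
        history.append(snap)
        for msg in check_alarms(snap):
            print(msg)
        if i < n_samples - 1:
            time.sleep(interval_s)
    return history


if __name__ == "__main__":
    h = collect(n_samples=3, interval_s=1.0)
    print(len(h), "samples collected")
```

The bounded `deque` is what keeps a detailed, fixed-size history cheap. A real deployment would also need persistent storage and a central collector, which is the part this post is asking about.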

I have a few dozen servers to monitor, and I would like to do so with a
single piece of software and a single console. Those servers are not all
physically on the same network, nor are they connected by a VPN (so, no
UDP), but TCP and SSH connections to them are mostly reliable, even
though they are low-bandwidth.

Please note that shinken (much like nagios) doesn't give a good visible
history of the things it measures, only alerts. Also, it can't really
sample things every few seconds: given shinken's architecture, the
lowest reasonable update interval for the checks listed above is ~5
minutes.

Any recommendations?

Thanks in advance,
Ori



More information about the Linux-il mailing list