Looking for a performance/health monitoring and alerting solution
Ori Berger
linux-il at orib.net
Mon Jun 16 02:12:47 IDT 2014
I'm looking for a single system that can track all of a remote server's
health and performance status, store a detailed every-few-seconds
history, and trigger alarms in "bad" situations (such as running out of
disk space). So far, I haven't found one comprehensive system that does
it all. Things I'm interested in (in parentheses: how I track them at
the moment; note that shinken is a nagios-compatible monitoring system):
Free disk space (shinken)
Server load (shinken)
Debian package and security updates (shinken)
NTP drift (shinken)
Service ping/reply time (shinken)
Upload/download rates per interface (mrtg)
Temperatures (sensord, hddtemp)
Security logs, warnings and alerts, e.g. fail2ban, auth.log (rsync of log
files)
I have a few tens of servers to monitor, which I would like to do with
one piece of software and one console. The servers are not all physically
on the same network, nor do they share a VPN (so, no UDP), but TCP and ssh
are mostly reliable even though bandwidth is low.
Please note that shinken (much like nagios) doesn't really give a good
visible history of the things it measures, only alerts. Also, it can't
really sample things every few seconds: the lowest reasonable update
interval (given shinken's architecture) is ~5 minutes for the
shinken-tracked items listed above.
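To make "every-few-seconds history plus alarms" concrete, here is a minimal
sketch in Python of the kind of sampler I have in mind. It is only an
illustration, not part of any existing tool: the function names (sample,
alerts, run), the JSON-lines output and the threshold values are all
assumptions made up for this example. It covers just two of the items above
(free disk space and server load) and relies on os.getloadavg, so it is
Unix-only.

```python
import json
import os
import shutil
import time


def sample(path="/"):
    """Collect one snapshot of free disk space and 1-minute load average."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    load1, _, _ = os.getloadavg()     # 1, 5 and 15 minute averages (Unix only)
    return {
        "ts": time.time(),
        "disk_free_frac": usage.free / usage.total,
        "load1": load1,
    }


def alerts(snapshot, min_free_frac=0.10, max_load1=8.0):
    """Return alert strings for a snapshot; the thresholds are illustrative."""
    found = []
    if snapshot["disk_free_frac"] < min_free_frac:
        found.append("low disk space")
    if snapshot["load1"] > max_load1:
        found.append("high load")
    return found


def run(interval=5.0, count=None):
    """Print one JSON line per sample; count=None means loop forever."""
    n = 0
    while count is None or n < count:
        snap = sample()
        snap["alerts"] = alerts(snap)
        print(json.dumps(snap), flush=True)
        n += 1
        if count is None or n < count:
            time.sleep(interval)
```

With a call to run() added at the bottom, a script like this could be started
on each server over ssh, with its output appended to a per-host file on the
monitoring machine. That needs only TCP, and one short line every few seconds
should be cheap even on a low-bandwidth link.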
Any recommendations?
Thanks in advance,
Ori
More information about the Linux-il
mailing list