Looking for a performance/health monitoring and alerting solution

Rabin Yasharzadehe rabin at rabin.io
Mon Jun 16 10:47:50 IDT 2014


I can recommend Zabbix. I have never used it on a large network (~30 servers
at most), but I was happy with it.

- You can set the monitoring interval for each item (from 1 s up to days).
- Samples are stored in the DB, and graphs are plotted only when you need
them.
- It has built-in support for SMS and Jabber message alerts.
- It works with an agent, but also with SNMP and scripts you write yourself.
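
To give an idea of the "scripts" option, here is a minimal sketch of a custom
check that prints a single number (percentage of free disk space) on stdout.
The function name `free_percent` and the choice of metric are my own
illustration, not something Zabbix ships; how you hook it up (for example
through the agent's UserParameter setting) depends on your setup.

```python
#!/usr/bin/env python3
"""Print the percentage of free disk space for one mount point."""
import shutil
import sys


def free_percent(path="/"):
    """Return free space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.free / usage.total


if __name__ == "__main__":
    # Hypothetical usage: pass the mount point as the first argument.
    mount = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(f"{free_percent(mount):.2f}")
```

The script prints only the value, which keeps it easy to consume from any
monitoring system that runs a command and reads its output.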

Note that you'll need to provide enough storage for it.
(I think they have a formula or a calculator on their website, which you
can use to calculate the storage you'll need.)
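
As a rough illustration of why storage matters at short intervals, here is a
back-of-the-envelope sketch. It assumes one stored row per sample and about
90 bytes per row; that byte figure, the item counts and the retention
periods are my assumptions for the example, so check the Zabbix
documentation for numbers that match your DB engine.

```python
SECONDS_PER_DAY = 24 * 3600


def history_bytes(items, interval_s, days, bytes_per_value=90):
    """Estimate raw history size: one stored value per item per interval.

    bytes_per_value is an assumed average row size, not a measured one.
    """
    values_per_day = items * SECONDS_PER_DAY / interval_s
    return days * values_per_day * bytes_per_value


def trend_bytes(items, days, bytes_per_record=90):
    """Estimate size of hourly aggregates: one record per item per hour."""
    return days * items * 24 * bytes_per_record


if __name__ == "__main__":
    # Hypothetical setup: 30 servers x 100 items each, sampled every 5 s,
    # keeping 30 days of raw history and 365 days of hourly aggregates.
    items = 30 * 100
    total = history_bytes(items, 5, 30) + trend_bytes(items, 365)
    print(f"estimated storage: {total / 1e9:.1f} GB")
```

With these made-up numbers the raw history dominates by far, so the sampling
interval and the retention period are the two knobs to watch.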


*--Rabin*


On Mon, Jun 16, 2014 at 2:12 AM, Ori Berger <linux-il at orib.net> wrote:

> I'm looking for a single system that can track all of a remote server's
> health and performance status, and which stores a detailed
> every-few-seconds history. So far, I haven't found one comprehensive system
> that does it all; also, triggering alarms in "bad" situations (such as no
> disk space, etc). Things I'm interested in (in parentheses - how I track
> them at the moment. Note shinken is a nagios-compatible thing).
>
> Free disk space (shinken)
> Server load (shinken)
> Debian package and security updates (shinken)
> NTP drift (shinken)
> Service ping/reply time (shinken)
> Upload/download rates per interface (mrtg)
> Temperatures (sensord, hddtemp)
> Security logs, warning and alerts e.g. fail2ban, auth.log (rsync of log
> files)
>
> I have a few tens of servers to monitor, which I would like to do with one
> software and one console. Those servers are not all physically on the same
> network, nor do they have a VPN (so, no UDP) but tcp and ssh are mostly
> reliable even though they are low bandwidth.
>
> Please note that shinken (much like nagios) doesn't really give a good
> visible history of things it measures - only alerts; Also, it can't really
> sample things every few seconds - the lowest reasonable update interval
> (given shinken's architecture) is ~5 minutes for the things it measures
> above.
>
> Any recommendations?
>
> Thanks in advance,
> Ori