Looking for a performance/health monitoring and alerting solution

Evgeniy Ginzburg nad.oby at gmail.com
Mon Jun 16 11:06:56 IDT 2014


I can second Zabbix.
We use it in our current setup (100+ servers), and it works OK.
You could also use Nagios or one of its clones.
One of my previous monitoring solutions handled 10,000+ specialized
requests/hour with the help of custom scripts in Perl and C.

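One thing to consider:
most monitoring solutions use a Round-Robin Database (RRD) as backend
storage for time-series data. An RRD keeps a fixed number of samples per
archive and typically consolidates older data into coarser averages, so if
you need fine granularity for "old" data (anything older than minutes or
hours), avoid those setups.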
https://en.wikipedia.org/wiki/Round-Robin_Database
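To illustrate why, here is a minimal Python sketch of the round-robin idea under simplifying assumptions. `ToyRRD` and its parameters are hypothetical names, and unlike real RRDtool archives (which are time-based, with a step and consolidation functions such as AVERAGE, MIN and MAX) this toy counts samples instead of seconds. A fixed-size ring holds the newest raw samples, and a second ring holds one average per group of raw samples, so once a raw value falls out of the first ring only its average survives.

```python
from collections import deque


class ToyRRD:
    """Hypothetical toy round-robin store: fixed-size archives, old data consolidated."""

    def __init__(self, raw_slots: int, steps_per_avg: int, avg_slots: int) -> None:
        # Fine-grained archive: keeps only the newest `raw_slots` samples.
        self.raw = deque(maxlen=raw_slots)
        # Coarse archive: one averaged value per `steps_per_avg` raw samples.
        self.avg = deque(maxlen=avg_slots)
        self.steps_per_avg = steps_per_avg
        self._pending = []  # raw values waiting to be consolidated

    def update(self, value: float) -> None:
        # When the ring is full, deque drops the oldest raw sample automatically.
        self.raw.append(value)
        self._pending.append(value)
        if len(self._pending) == self.steps_per_avg:
            # Consolidate with an average; the individual values are not kept here.
            self.avg.append(sum(self._pending) / self.steps_per_avg)
            self._pending.clear()


if __name__ == "__main__":
    db = ToyRRD(raw_slots=6, steps_per_avg=3, avg_slots=4)
    for v in [1, 2, 3, 10, 20, 30, 5, 5, 5]:
        db.update(v)
    print(list(db.raw))  # → [10, 20, 30, 5, 5, 5]  (1, 2, 3 are gone)
    print(list(db.avg))  # → [2.0, 20.0, 5.0]  (1, 2, 3 survive only as 2.0)
```

The fixed `maxlen` is what keeps the storage size constant; the price is that the detail of older samples is lost for good.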


Regards, Evgeniy.


On Mon, Jun 16, 2014 at 10:47 AM, Rabin Yasharzadehe <rabin at rabin.io> wrote:

> I can recommend Zabbix. I have never used it on a large network (~30 servers
> at most), but I was happy with it.
>
> - you can set the monitoring interval for each item (from 1 s to days)
> - samples are stored in the DB, and graphs are plotted only when you need
> them
> - it has built-in support for SMS and Jabber alert messages
> - it works with its agent, but also with SNMP and with scripts you write
> yourself
>
> Note that you'll need to provide enough storage for it.
> (I think they have a formula or a calculator on their website that you
> can use to calculate the storage you'll need.)
>
>
> *--Rabin*
>
>
> On Mon, Jun 16, 2014 at 2:12 AM, Ori Berger <linux-il at orib.net> wrote:
>
>> I'm looking for a single system that can track all of a remote server's
>> health and performance status, store a detailed every-few-seconds
>> history, and also trigger alarms in "bad" situations (such as running out
>> of disk space). So far, I haven't found one comprehensive system that does
>> it all. Things I'm interested in (in parentheses: how I track them at the
>> moment; note that Shinken is a Nagios-compatible system):
>>
>> Free disk space (shinken)
>> Server load (shinken)
>> Debian package and security updates  (shinken)
>> NTP drift (shinken)
>> Service ping/reply time (shinken)
>> Upload/download rates per interface (mrtg)
>> Temperatures (sensord, hddtemp)
>> Security logs, warnings and alerts, e.g. fail2ban, auth.log (rsync of log
>> files)
>>
>> I have a few tens of servers to monitor, which I would like to do with
>> one piece of software and one console. Those servers are not all physically
>> on the same network, nor do they have a VPN (so, no UDP), but TCP and SSH
>> are mostly reliable even though the links are low-bandwidth.
>>
>> Please note that Shinken (much like Nagios) doesn't really give a good
>> visible history of the things it measures, only alerts. Also, it can't
>> really sample things every few seconds: the lowest reasonable update
>> interval (given Shinken's architecture) is ~5 minutes for the items
>> listed above.
>>
>> Any recommendations?
>>
>> Thanks in advance,
>> Ori
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il at cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
>>
>

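For a rough idea of how much storage an every-few-seconds history needs, here is a back-of-the-envelope sketch in Python. All inputs are assumptions for illustration, not Zabbix's published formula: 30 servers, 50 items per server, one sample every 5 seconds, about 50 bytes per stored sample (the real cost depends on the database backend and its indexes) and 90 days of history. `history_bytes` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 3600


def history_bytes(servers: int, items_per_server: int, interval_s: float,
                  bytes_per_sample: int, days: int) -> float:
    """Estimate raw history size: every item stores one sample per interval."""
    samples_per_day = servers * items_per_server * SECONDS_PER_DAY / interval_s
    return samples_per_day * bytes_per_sample * days


if __name__ == "__main__":
    # Hypothetical inputs, not measured values.
    size = history_bytes(servers=30, items_per_server=50, interval_s=5,
                         bytes_per_sample=50, days=90)
    print(f"{size / 1e9:.1f} GB")  # → 116.6 GB
```

With these assumptions that comes to about 117 GB (roughly 1.3 GB per day). Sampling every 5 minutes instead, the Shinken interval mentioned above, divides it by 60, to about 1.9 GB.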

-- 
So long, and thanks for all the fish.