hosted machine and load average

Amos Shapira amos.shapira at gmail.com
Sun Sep 20 05:47:04 IDT 2009


+1 for collectd (my lack of familiarity with munin notwithstanding). It
helped me identify a home-grown C++ program going into a busy spin the
first time it happened, after weeks of working flawlessly.
Monit is also excellent for trying to keep things cool (prevention).
For this specific scenario, I'd check whether sendmail is configured
to use a single queue runner in a loop, or whether it uses the default
(on some systems) of spawning a new queue runner every few minutes.
Also: switch to postfix.

-Amos

On 9/18/09, Ohad Levy <ohadlevy at gmail.com> wrote:
> I would use collectd instead; it has much better resolution and scales up
> (which munin doesn't).
>
> My 2 cents,
> Ohad
>
> On 9/18/09, Shachar Shemesh <shachar at shemesh.biz> wrote:
>>
>>  Hetz Ben Hamo wrote:
>>
>> So my question: What do you do in case you have the same scenario?
>> What steps do you take to prevent things like that from happening?
>>
>>    I would focus less on prevention, and more on diagnostics. I usually
>> use munin (you can see a live example at http://www.hamakor.org.il/munin).
>> It's great in that it gives you a complete history of almost all relevant
>> parameters, and you can (fairly easily) add your own.
>>
>> As for the specific problem you are describing, assuming it repeats
>> itself, it really depends. For example, if you look at the munin history
>> and see the load average slowly ascending, then I would run ps and check
>> for zombies or runaway processes. If the load average jumps suddenly, I
>> would set up a cron job that logs the top ten active processes.
>>
>> Shachar
>>
>> --
>> Shachar Shemesh
>> Lingnu Open Source Consulting Ltd.
>> http://www.lingnu.com
>>
>
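
The cron job suggested in the quoted message (log the top ten active
processes so a sudden load jump can be diagnosed afterwards) could be
sketched roughly as below. This is only an illustration, not anything
posted in the thread: the function names, the log format and the log path
are all made up, and it assumes a ps that accepts the POSIX "-o name="
syntax for suppressing headers.

```python
#!/usr/bin/env python3
"""Append the load average and the busiest processes to a log file.

Meant to be run from cron. All names and paths are illustrative.
"""
import os
import subprocess
import sys
import time


def top_processes(ps_output, count=10):
    """Return the `count` rows of ps output with the highest %CPU.

    Expects lines of the form "PID %CPU COMMAND..." with no header row.
    """
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid, cpu, command = fields
        try:
            rows.append((float(cpu), int(pid), command.strip()))
        except ValueError:
            continue  # skip rows whose PID or %CPU is not numeric
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:count]


def snapshot(count=10):
    """Collect the current load average and the top CPU consumers."""
    # An empty header ("pid=") on every column suppresses the header line.
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "pcpu=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    return os.getloadavg(), top_processes(ps_output, count)


def format_snapshot(loadavg, processes, now=None):
    """Render one log entry: a timestamped header plus one line per process."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    lines = ["%s load average: %.2f %.2f %.2f" % ((stamp,) + tuple(loadavg))]
    for cpu, pid, command in processes:
        lines.append("  %5.1f%% pid %d %s" % (cpu, pid, command))
    return "\n".join(lines) + "\n"


def main(log_path):
    """Take one snapshot and append it to the log file."""
    loadavg, processes = snapshot()
    with open(log_path, "a") as log:
        log.write(format_snapshot(loadavg, processes))


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

A crontab entry along these lines (script and log paths are hypothetical)
would take a snapshot every minute:

    * * * * * /usr/bin/python3 /usr/local/bin/loadlog.py /var/log/loadlog.txt

Note that on Linux the %CPU column of ps is averaged over the lifetime of
each process, so a process that only recently started spinning may rank
lower here than it would in top.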


