ptrace in production systems

Noam Meltzer tsnoam at gmail.com
Sun Feb 1 17:02:43 IST 2009


Shachar,

What you need is the functionality of a watchdog.
HA clusters provide this functionality. Anyhow, I guess you don't have/need
an HA cluster, so what you might want to look at is monit:
http://mmonit.com/monit/
I understand that you want to monitor it using your own daemon, but I don't
believe you can reinvent the wheel with reasonable time and effort, so I'd
point you to Monit.

- Noam

On Sun, Feb 1, 2009 at 3:42 PM, Shachar Shemesh <shachar at shemesh.biz> wrote:

> Hi all,
>
> I've been bad. I know I have. This goes against any instinct that I have,
> but I am failing to see a good reason WHY.
>
> The setup - I have an embedded system that is composed of several daemons.
> The situation is that one of the daemons has to restart another daemon. The
> restarted daemon (an SNMP agent) is outward facing, and therefore the time
> it takes to restart should be minimized, if possible.
>
> Then again, it is a daemon. It has no parent (well, init is its parent). I
> can find it easily enough using its pid file, but I cannot get a
> notification when it has actually quit. There is an option to add code to
> the SNMP agent that sends a notification to the other daemon when it exits,
> but this has several disadvantages I will not go into right now. I can also
> poll for the exit (i.e., call kill(pid, 0) every second until it reports
> that there is no such process), but that adds latency before I begin the
> restart process.
>
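
For reference, the polling approach described above could look roughly like
the sketch below. The function names, the interval and the poll limit are
made up for illustration and are not taken from Shachar's code. The sketch
relies only on the documented behaviour of kill() with signal 0, which
performs the existence and permission checks without sending anything.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: 1 if the pid exists, 0 if it does not, -1 on error. */
static int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists and we may signal it */
    if (errno == EPERM)
        return 1;               /* exists, but belongs to someone else */
    if (errno == ESRCH)
        return 0;               /* no such process */
    return -1;
}

/* Hypothetical helper: poll every interval_ms milliseconds, at most
 * max_polls times. Returns 0 once the pid is gone, -1 on timeout or error. */
static int wait_for_exit_by_polling(pid_t pid, long interval_ms, int max_polls)
{
    struct timespec delay;
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < max_polls; i++) {
        int state = process_exists(pid);
        if (state == 0)
            return 0;
        if (state < 0)
            return -1;
        nanosleep(&delay, NULL);    /* this sleep is the added latency */
    }
    return -1;
}
```

A shorter interval reduces the latency at the cost of more wakeups, and the
pid could in principle be reused by an unrelated process between two polls.
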
> So what I did was to use ptrace. The controlling daemon connects to the
> SNMP agent as a debugger, and this way gets notified with the usual "wait"
> interface when the agent exits. I am not doing any fancy register
> manipulation or any such stuff.
>
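
A minimal sketch of the ptrace approach described above, assuming Linux and
a single-threaded target that does not call execve() while traced. The
function name is hypothetical and this is not Shachar's actual code. The
idea is to attach with PTRACE_ATTACH, swallow the SIGSTOP that the attach
generates, pass every other signal through unchanged, and return once
waitpid() reports that the process has terminated.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attach to pid as a tracer and block until it
 * terminates. Returns the waitpid() status word, or -1 on error. */
static int wait_for_exit_by_ptrace(pid_t pid)
{
    int status;
    int attach_stop_seen = 0;

    if (pid == getpid()) {          /* a process cannot trace itself */
        errno = EINVAL;
        return -1;
    }
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    for (;;) {
        if (waitpid(pid, &status, 0) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return status;          /* the traced process is gone */

        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            if (sig == SIGSTOP && !attach_stop_seen) {
                attach_stop_seen = 1;
                sig = 0;            /* suppress the stop caused by the attach */
            }
            /* Resume the process and deliver the pending signal, if any.
             * ESRCH here means it died meanwhile; waitpid() will report it. */
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) == -1
                && errno != ESRCH)
                return -1;
        }
    }
}
```

Some costs to keep in mind: the target is stopped and resumed on every
signal it receives, and a process can have only one tracer at a time, so a
debugger or strace cannot attach to the agent while the controlling daemon
is tracing it.
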
> Using ptrace as part of a production system feels wrong, but I cannot say
> exactly why. The small amount of experimentation I did with this system, as
> well as my extensive experience with ptrace when working on fakeroot-ng,
> tells me that the interface is stable enough. Still, it feels wrong.
>
> Any feedback would be welcome.
>
> Shachar