ptrace in production systems
Shachar Shemesh
shachar at shemesh.biz
Sun Feb 1 15:42:50 IST 2009
Hi all,
I've been bad. I know I have. This goes against any instinct that I
have, but I am failing to see a good reason WHY.
The setup: I have an embedded system composed of several
daemons, one of which has to restart another. The restarted
daemon (an SNMP agent) is outward facing, and therefore its restart
time should be kept as short as possible.
Then again, it is a daemon. It has no parent (well, init is its parent).
I can find it easily enough using its pid file, but I cannot get a
notification when it has actually quit. I could add commands to the
SNMP agent to notify the other daemon when it exits, but that has
several disadvantages I will not go into right now.
I can also poll for the exit (i.e., call kill(pid, 0) every second until it
fails with "no such process"), but that adds up to a second of latency
before I can begin the restart.
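For comparison, the polling approach might look roughly like the following minimal sketch. It assumes the pid file holds nothing but a decimal PID. The helper names `read_pid_file` and `poll_for_exit` are hypothetical, and so is any path a caller would pass (for example `/var/run/snmpd.pid`). Note that `kill(pid, 0)` sends no signal. It only checks whether the process exists and whether we may signal it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: read a PID from a pid file containing a decimal
 * number. Returns the PID, or -1 if the file is missing or malformed. */
static pid_t read_pid_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid > 0 ? (pid_t)pid : -1;
}

/* Hypothetical helper: poll every interval_ms milliseconds until `pid`
 * no longer exists. Returns 0 once it is gone, -1 on unexpected errors. */
static int poll_for_exit(pid_t pid, long interval_ms)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (;;) {
        /* Signal 0: existence/permission check only, nothing is sent. */
        if (kill(pid, 0) == -1) {
            if (errno == ESRCH)
                return 0;      /* no such process: it has exited */
            if (errno != EPERM)
                return -1;     /* EPERM means it exists but isn't ours */
        }
        nanosleep(&ts, NULL);  /* this sleep is the added latency */
    }
}
```

The latency cost is visible in the loop: an exit is noticed up to `interval_ms` after it happens. The check also keeps succeeding while the process is an unreaped zombie (here init reaps it promptly). A recycled PID could, in principle, leave the loop waiting on an unrelated process.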
So what I did was use ptrace. The controlling daemon attaches to the
SNMP agent as a debugger, and is thus notified through the usual
wait() interface when the agent exits. I am not doing any fancy register
manipulation or anything of the sort.
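The ptrace approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual code described here, and the function name `wait_for_exit_via_ptrace` is hypothetical. It attaches with `PTRACE_ATTACH` and swallows the one SIGSTOP that attaching generates. It forwards every other signal to the tracee with `PTRACE_CONT` and returns once `waitpid` reports that the tracee exited or was killed. It assumes a single-threaded agent, since only the attached thread is traced.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: attach to `pid` as a tracer and block until the
 * process exits. Returns the final wait status, or -1 on error. */
static int wait_for_exit_via_ptrace(pid_t pid)
{
    int status;
    int attach_stop_seen = 0;

    /* Become the tracer; the kernel sends the tracee a SIGSTOP. */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    for (;;) {
        /* Works for non-children too: a tracer may wait on its tracee. */
        if (waitpid(pid, &status, 0) == -1) {
            if (errno == EINTR)
                continue;              /* interrupted by a handler: retry */
            return -1;
        }

        if (WIFEXITED(status) || WIFSIGNALED(status))
            return status;             /* the tracee is gone */

        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);

            /* Swallow only the SIGSTOP caused by our own attach. */
            if (sig == SIGSTOP && !attach_stop_seen) {
                attach_stop_seen = 1;
                sig = 0;
            }
            /* Resume, delivering any other signal unchanged. ESRCH means
             * the tracee died meanwhile; waitpid will report it. */
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig) == -1 &&
                errno != ESRCH)
                return -1;
        }
    }
}
```

Forwarding signals is the important design choice. A tracer that always resumes with signal 0 silently discards whatever signal stopped the tracee, so the agent would, for example, never see a SIGTERM sent to shut it down.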
Using ptrace as part of a production system feels wrong, but I cannot
say exactly why. Both the small amount of experimentation I did with this
setup and my extensive experience with ptrace from working on
fakeroot-ng tell me that the interface is stable enough. Still, it feels
wrong.
Any feedback would be welcome.
Shachar