ptrace in production systems


Shachar Shemesh shachar at shemesh.biz
Sun Feb 1 15:42:50 IST 2009


Hi all,

I've been bad. I know I have. This goes against every instinct I have,
but I fail to see a good reason WHY.

The setup - I have an embedded system that is composed of several
daemons. One of the daemons has to restart another daemon. The
restarted daemon (an SNMP agent) is outward facing, and therefore its
restart time should be kept as short as possible.

Then again, it is a daemon. It has no parent (well, init is its parent).
I can find it easily enough using its pid file, but I cannot get a
notification when it has actually quit. There is an option to add code
to the SNMP agent that sends a notification to the other daemon when it
exits, but this has several disadvantages I will not go into right now.
I can also poll for the exit (i.e. - kill( pid, 0) every second until it
says there is no such process), but that adds latency before I can begin
the restart process.
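
For illustration, the polling option can be sketched roughly as below. This
is not the code from the system described here; it is a minimal Python sketch,
and the names process_exists and wait_for_exit, as well as the interval and
timeout parameters, are hypothetical.

```python
import os
import time


def process_exists(pid):
    """Return True if a process with this pid currently exists."""
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: it exists, but we may not signal it
    return True


def wait_for_exit(pid, interval=1.0, timeout=None):
    """Poll until pid is gone. Return True on exit, False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while process_exists(pid):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

With interval=1.0 the exit may be noticed up to a second late, which is the
latency mentioned above. A shorter interval trades that latency for more
wakeups. The check is also open to pid reuse if the pid is recycled between
two polls.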

So what I did was to use ptrace. The controlling daemon attaches to the
SNMP agent as a debugger, and this way gets notified through the usual
"wait" interface when the agent exits. I am not doing any fancy register
manipulation or anything of the sort.
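
A rough sketch of the attach-and-wait idea follows, again in Python and again
not the code from the system described here. It assumes Linux, a
single-threaded tracee, and permission to attach (same user or root, and no
restriction from security policy). The helper names _ptrace and
wait_for_exit_traced are hypothetical; ptrace itself is reached through ctypes
because the Python standard library has no wrapper for it.

```python
import ctypes
import os
import signal

# Request numbers from <sys/ptrace.h> on Linux.
PTRACE_CONT = 7
PTRACE_ATTACH = 16

# CDLL(None) exposes the symbols of the running process, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_long,
                         ctypes.c_void_p, ctypes.c_void_p]
_libc.ptrace.restype = ctypes.c_long


def _ptrace(request, pid, data=0):
    """Call ptrace(2) and raise OSError on failure."""
    if _libc.ptrace(request, pid, None, data) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def wait_for_exit_traced(pid):
    """Attach to pid as a tracer, block until it exits, return the wait status."""
    _ptrace(PTRACE_ATTACH, pid)
    attach_stop_seen = False
    while True:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) or os.WIFSIGNALED(status):
            return status
        if os.WIFSTOPPED(status):
            sig = os.WSTOPSIG(status)
            if sig == signal.SIGSTOP and not attach_stop_seen:
                # The SIGSTOP generated by PTRACE_ATTACH: do not deliver it.
                attach_stop_seen = True
                sig = 0
            elif sig == signal.SIGTRAP:
                # Sent by the kernel after execve() in a traced process.
                sig = 0
            # Resume the tracee, passing any other signal through unchanged.
            # Job-control (group) stops are not handled specially here.
            _ptrace(PTRACE_CONT, pid, sig)
```

The returned status can be decoded with os.WIFEXITED / os.WEXITSTATUS or
os.WIFSIGNALED / os.WTERMSIG. Note that every signal sent to the tracee now
stops it until the tracer resumes it, so the controlling daemon has to stay in
this loop for as long as it is attached.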

Using ptrace as part of a production system feels wrong, but I cannot 
say exactly why. The small amount of experimentation I did with this 
system, as well as my extensive experience with ptrace when working on 
fakeroot-ng tell me that the interface is stable enough. Still, it feels 
wrong.

Any feedback would be welcome.

Shachar



More information about the Linux-il mailing list