ptrace problem - confounded, dazed and confused at the inconsistencies

Shachar Shemesh shachar at shemesh.biz
Wed Oct 27 21:23:26 IST 2010


On 27/10/10 20:53, Valery Reznic wrote:
> OK, you was warned :)
>    
Yes, I was. Still....
>> How can two programs do the same thing on the same system,
>> and yet get such different results?
>>      
> Let's take 'read' syscall.
> read(10, ....)
>
>    
I am not talking about a single syscall that behaves differently. I'm 
talking about tracing the entire history of the process from the time it 
performed "execve".

If you want, I uploaded the two logs to
http://fakeroot-ng.lingnu.com/files/clone-traces.tgz. Both were produced
by strace. The first is strace tracing strace tracing vi. The second is
strace attached to the fakeroot-ng daemon while it is monitoring a
shell, with the shell then used to run "vi". This second log continues
until I kill the daemon to release it from the deadlock.

Both the inner strace and fakeroot-ng were instructed to issue logs of 
what they find, and you can find this log in "write" calls throughout 
the logs. The logs are, of course, not identical, but I failed to find 
any difference that should matter.

Not all of the logs are interesting. The interesting part of each starts
where the process writes that it detected an execve of /usr/bin/vi, and
ends a couple of lines after the first time the word "clone" appears
after that point. In the trace-strace log, you can see that after
releasing the process to perform the clone (PTRACE_SYSCALL), the inner
strace performs wait4 twice and gets two notifications, one for the
parent thread (3299) and one for the child thread (3300).
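
For comparison, the two-notification pattern described above can be
reproduced with a very small tracer. This is only a sketch under stated
assumptions: it relies on PTRACE_O_TRACEFORK/PTRACE_O_TRACECLONE instead
of the clone-flag rewriting plus PTRACE_SYSCALL that strace and
fakeroot-ng use, the tracee simply calls fork() rather than running vi,
and the names run_tracee and trace_fork_and_count_pids are made up for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tracee: ask to be traced, stop so the tracer can set options, fork once. */
static void run_tracee(void)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
        _exit(1);                       /* tracing not permitted here */
    raise(SIGSTOP);
    pid_t kid = fork();
    if (kid == 0)
        _exit(0);
    waitpid(kid, NULL, 0);
    _exit(0);
}

/* Trace a process across one fork and return how many distinct pids
 * reported events to the tracer, or -1 if tracing could not be set up. */
static int trace_fork_and_count_pids(void)
{
    pid_t seen[8];
    int nseen = 0, status, i;
    long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACECLONE;
    pid_t first = fork();

    if (first < 0)
        return -1;
    if (first == 0)
        run_tracee();

    /* Initial SIGSTOP raised by the tracee itself. */
    if (waitpid(first, &status, 0) != first || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, first, NULL, (void *)opts) < 0) {
        kill(first, SIGKILL);
        waitpid(first, &status, 0);
        return -1;
    }
    seen[nseen++] = first;
    ptrace(PTRACE_CONT, first, NULL, NULL);

    for (;;) {
        /* __WALL: report every tracee, whatever its exit signal. */
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid < 0)
            break;                      /* ECHILD: nothing left to trace */
        for (i = 0; i < nseen && seen[i] != pid; i++)
            ;
        if (i == nseen && nseen < 8)
            seen[nseen++] = pid;
        if (WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            /* Swallow event stops and the new child's initial SIGSTOP. */
            if (sig == SIGTRAP || sig == SIGSTOP)
                sig = 0;
            ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
        }
    }
    return nseen;
}
```

Calling trace_fork_and_count_pids() from a main() should yield events
from both the forking process and its new child, which is the behaviour
seen in the trace-strace log.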

In the fakeroot log, you can see the clone(RETURN) log message, 
identifying the child thread 3885 being created, but all of the waits 
performed only report the parent thread, 3884. This goes on until wait 
returns with "nothing more to report", and pselect hangs in futile wait 
for the signal to arrive.
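
The wait-then-sleep loop described above might look roughly like the
following. This is a sketch of the general pattern, not fakeroot-ng's
actual code, and the helper names are hypothetical. It shows why a child
event that wait never reports leaves the daemon asleep in pselect: the
only wake-up is a SIGCHLD that may never come.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* The handler only has to exist so that SIGCHLD interrupts pselect. */
static void on_sigchld(int sig) { (void)sig; }

/* Install the handler and block SIGCHLD during normal execution.
 * *unblocked receives the mask pselect should use while sleeping. */
static int setup_sigchld(sigset_t *unblocked)
{
    struct sigaction sa;
    sigset_t block;

    assert(unblocked != NULL);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, unblocked) < 0)
        return -1;
    sigdelset(unblocked, SIGCHLD);
    return 0;
}

/* Collect every pending child event without blocking and return how
 * many there were. waitpid returning 0 means "nothing more to report". */
static int drain_events(int extra_flags)
{
    int status, n = 0;

    while (waitpid(-1, &status, WNOHANG | extra_flags) > 0)
        n++;
    return n;
}

/* Sleep until SIGCHLD arrives. Returns 1 if woken by a signal,
 * 0 on timeout, -1 on any other error. */
static int sleep_until_sigchld(const sigset_t *unblocked, long timeout_sec)
{
    struct timespec ts = { timeout_sec, 0 };
    int r = pselect(0, NULL, NULL, NULL, &ts, unblocked);

    if (r < 0 && errno == EINTR)
        return 1;
    return r;
}
```

A daemon built this way alternates between drain_events() and
sleep_until_sigchld(). The timeout is there only to keep the sketch from
blocking forever; with no timeout, a missed notification produces
exactly the kind of hang seen in the fakeroot log.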

Not in this trace, but had I sent a non-lethal signal, you would see the 
wait repeated, again saying there is nothing to report, and a hang 
again. In essence, the difference in the waits should not have happened, 
as far as I can tell, as the system calls were treated the same.
> I suspect there is something like this in your case.
>    
You have everything you need in order to prove your suspicion. In fact,
strace is easily installable from your nearest repository, as are vi
and bash. Many will also carry fakeroot-ng, but if not, feel free to
pull the latest SVN image and compile it yourself.
> May be there is something that strace do and fakeroot-ng don't?
>    
I'm sure there is. I just can't figure out what it is. The strace code
does not appear to handle this case any differently from, say, clone
being used to create a new process (a case which works flawlessly in
fakeroot-ng).
> Setting some flag(s) to clone? Calling some system call that affect wait behaviour?
>    
Same flags to clone in both cases (vi sets the same flags, and both
strace and fakeroot-ng change them to the same different flags).
I'm not aware of any setting that globally affects wait's behavior.
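
One per-call (not global) setting that waitpid(2) does document is the
__WCLONE/__WALL pair: a child whose exit signal is not SIGCHLD is only
reported when one of those flags is passed. Whether this plays any part
in the problem discussed here is not established; the sketch below only
demonstrates the documented behaviour. It uses an untraced child because,
according to waitpid(2), kernels since Linux 4.7 imply __WALL for ptraced
children. The function name clone_and_wait is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child body: exit immediately with status 0. */
static int child_fn(void *arg) { (void)arg; return 0; }

/* Create a child with clone() using the given exit signal (0 means no
 * signal), then wait for it with wait_flags. Returns 0 if waitpid
 * reported the child, otherwise the errno of the failing call. */
static int clone_and_wait(int exit_signal, int wait_flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int status, err = 0;
    pid_t pid;

    if (stack == NULL)
        return ENOMEM;

    /* No CLONE_VM: the child gets its own copy of the address space.
     * The stack grows down, so pass the top of the buffer. */
    pid = clone(child_fn, stack + stack_size, exit_signal, NULL);
    if (pid < 0) {
        err = errno;
        free(stack);
        return err;
    }

    if (waitpid(pid, &status, wait_flags) != pid) {
        err = errno;
        waitpid(pid, &status, __WALL);  /* reap it regardless of type */
    }
    free(stack);
    return err;
}
```

With a plain waitpid (flags 0), a child created with exit signal SIGCHLD
is reported, while one created with exit signal 0 fails with ECHILD even
though the child exists. Passing __WALL or __WCLONE makes that second
child visible.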

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com



