RE: long I/O delays when strace is running

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

RE: long I/O delays when strace is running

Gluszczak, Glenn-2

I haven't run Cygwin Expect for about 6 moths on Windows but it was behaving fine last time I did.  
One thing I am aware of is you can't interrupt sleep in TCL.  The sleep must
complete until the Control C is processed (regardless of whether you redirected signals
to your own routines).  Otherwise signals seemed to be processed immediately.

Perhaps some other service is interfering.  You may want to disable other services.

Glenn

================================================================


Well I've solved one problem, but now I have another one. To try to understand why except is getting broken pipes (child processes are "going away"), I modified DejaGNU's /usr/bin/runtest so that it would strace each except process:

-exec "$expectbin" $debug -- "$runpath"/runtest.exp $target ${1+"$@"}
+exec strace --output=/tmp/runtest.$$.log --trace-children
--mask=startup  $expectbin $debug -- $runpath/runtest.exp $target ${1+"$@"}

However, when I run make -kj8 check after this, Cygwin processes that do
file I/O and even ps become very unresponsive and CPU utilization is low
(not to say that DejaGNU is good at balancing the test load). If I hit
ctrl-C on the make process, it won't exit for somewhere around 30
seconds.  I am able to reproduce this to some degree with this simple
snippet, although the delays aren't as long as doing the above:

for ((i = 0; i < 64; ++i)); do strace --output=/tmp/sleep.$$.log
--trace-children --mask=startup sleep 64; done

Can anybody try this and see if they get delays when running ps -ef or
some such?

Thanks,
Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
On 04/20/2017 08:43 AM, Gluszczak, Glenn wrote:
> I haven't run Cygwin Expect for about 6 moths on Windows but it was behaving fine last time I did.
> One thing I am aware of is you can't interrupt sleep in TCL.  The sleep must
> complete until the Control C is processed (regardless of whether you redirected signals
> to your own routines).  Otherwise signals seemed to be processed immediately.

Please note that my simplified test case isn't using expect, but the
standard /usr/bin/sleep.exe -- this is just to have strace run and trace
for a longer amount of time in order to expose the problem. (it did get
broken into a new line though, so add a backslash:

> for ((i = 0; i < 64; ++i)); do strace --output=/tmp/sleep.$$.log \
> --trace-children --mask=startup sleep 64; done

> Perhaps some other service is interfering.  You may want to disable other services.
>
> Glenn

I usually disable most services, I can probably disable a few more, but
I would like if somebody can run the above test case in one terminal
windows and in the other terminal window do a simple ps -ef and let me
know if ps responds immediately or has a delay.  I am using Windows 7,
so it could be isolated to that windows version as well.  When I do
this, ps has a 3 second delay while in
fhandler_base_overlapped::wait_overlapped, but I've seen this delay in
other processes while calling something like "open_shared."

Anyway, I'm going to try to find another simple test case that causes
more of a drastic delay.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
On 04/20/2017 09:38 PM, Daniel Santos wrote:
> I usually disable most services, I can probably disable a few more

Actually, I was wrong as I had re-enabled a lot of services to try some
ms debugging tools, but I've pared it down to these and the problem
still happens:

C:\Users\daniel>net start
These Windows services are started:

    Application Experience
    Application Information
    COM+ Event System
    Cygserver 64
    CYGWIN sshd 64
    DCOM Server Process Launcher
    DHCP Client
    Group Policy Client
    Network Connections
    Network List Service
    Network Location Awareness
    Network Store Interface Service
    Plug and Play
    Power
    Remote Procedure Call (RPC)
    RPC Endpoint Mapper
    Security Accounts Manager
    Shell Hardware Detection
    User Profile Service
    Windows Management Instrumentation

Also, I'm running this in a VM, so I tried disabling the virtio driver
to try to rule that out and the problem is still occurs. The following
bash script does a much better job of producing the problem:

#!/bin/bash

exec_strace_sleep ()
{
   exec strace --output=/tmp/sleep.$$-$1.log --trace-children
--mask=startup sleep 64
}

for ((i = 0; i < 8; ++i)); do
   exec_strace_sleep $i &
done
wait

This script makes any cygwin process doing file I/O practically grind to
a halt.  I can also demonstrate the problem running strace on ps alone:

strace --trace-children --output=/tmp/ps.log --mask=all ps -ef

Here is the interesting part of the log:

    84   45292 [main] ps 1296 pinfo::init: execed process windows pid
380, cygwin pid 340
    70   45362 [main] ps 1296 open_shared: name cygpid.340, n 340,
shared 0x30000 (wanted 0x0), h 0xEC, *m 6
    77   45439 [main] ps 1296 open_shared: name cygpid.1044, n 1044,
shared 0x30000 (wanted 0x0), h 0xEC, *m 6
3119382 3164821 [main] ps 1296 open_shared: name cygpid.1576, n 1576,
shared 0x30000 (wanted 0x0), h 0xEC, *m 6
   821 3165642 [main] ps 1296 open_shared: name cygpid.1420, n 1420,
shared 0x30000 (wanted 0x0), h 0xEC, *m 6
   232 3165874 [main] ps 1296 pinfo::init: execed process windows pid
1420, cygwin pid 1468
   196 3166070 [main] ps 1296 open_shared: name cygpid.1468, n 1468,
shared 0x30000 (wanted 0x0), h 0xEC, *m 6

Daniel


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

ps.log (47K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
I've tracked it down to this little Sleep() loop in pinfo::init.

       bool created = shloc != SH_JUSTOPEN;

       /* Detect situation where a transitional memory block is being
retrieved.
      If the block has been allocated with PINFO_REDIR_SIZE but not yet
      updated with a PID_EXECED state then we'll retry. */
       if (!created && !(flag & PID_NEW))
     /* If not populated, wait 2 seconds for procinfo to become populated.
        Would like to wait with finer granularity but that is not easily
        doable.  */
     for (int i = 0; i < 200 && !procinfo->ppid; i++)
       Sleep (10);

I tried putting a stupid memory barrier in the loop and a volatile read
just for kicks, but that doesn't seem to be the problem.  I'm headed off
to bed.  This only happens when using strace, so if anybody has ideas
please post.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Mark Geisert
Daniel Santos wrote:

> I've tracked it down to this little Sleep() loop in pinfo::init.
>
>        bool created = shloc != SH_JUSTOPEN;
>
>        /* Detect situation where a transitional memory block is being retrieved.
>       If the block has been allocated with PINFO_REDIR_SIZE but not yet
>       updated with a PID_EXECED state then we'll retry. */
>        if (!created && !(flag & PID_NEW))
>      /* If not populated, wait 2 seconds for procinfo to become populated.
>         Would like to wait with finer granularity but that is not easily
>         doable.  */
>      for (int i = 0; i < 200 && !procinfo->ppid; i++)
>        Sleep (10);
>
> I tried putting a stupid memory barrier in the loop and a volatile read just for
> kicks, but that doesn't seem to be the problem.  I'm headed off to bed.  This
> only happens when using strace, so if anybody has ideas please post.

I can reproduce your issue on a real Win7.64 machine so that removes any
possible virtual machine root cause.  I was running 'top -s1' in one window
while running your testcase in another window.  Yes, top froze for many seconds
at a time, then caught its display up, only to freeze again repeatedly.  It was
still frozen for a while after your testcase had ended (!), then caught up.
Your mention of pinfo::init and 'ps' along with my usage of 'top' leads me to
think this may be somehow related to the /proc filesystem.

Here's my humble contribution to the discussion:

~ time w
  02:15:52 up 3 days, 20:34,  0 users,  load average: 0.99, 0.62, 0.31
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT

real    0m0.203s    <-- OK, nice and fast
user    0m0.077s
sys     0m0.139s

~ time strace -o w.out w
  02:16:23 up 3 days, 20:34,  0 users,  load average: 0.54, 0.55, 0.29
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT

real    0m28.487s   <-- but stracing it is much, much slower
user    0m0.015s
sys     0m0.000s

The 'w' command is normally pretty fast.  Running it under strace makes it take
an unreasonably long time.  Something seems busted somewhere.  The strace output
for this example has many occurrences of ~3.1-second delays that seem to occur
as w is accumulating process time information for all processes.

..mark

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
On 04/21/2017 04:38 AM, Mark Geisert wrote:

>
> I can reproduce your issue on a real Win7.64 machine so that removes
> any possible virtual machine root cause.  I was running 'top -s1' in
> one window while running your testcase in another window.  Yes, top
> froze for many seconds at a time, then caught its display up, only to
> freeze again repeatedly.  It was still frozen for a while after your
> testcase had ended (!), then caught up. Your mention of pinfo::init
> and 'ps' along with my usage of 'top' leads me to think this may be
> somehow related to the /proc filesystem.
>
> Here's my humble contribution to the discussion:
>
> ~ time w
>  02:15:52 up 3 days, 20:34,  0 users,  load average: 0.99, 0.62, 0.31
> USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
>
> real    0m0.203s    <-- OK, nice and fast
> user    0m0.077s
> sys     0m0.139s
>
> ~ time strace -o w.out w
>  02:16:23 up 3 days, 20:34,  0 users,  load average: 0.54, 0.55, 0.29
> USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
>
> real    0m28.487s   <-- but stracing it is much, much slower
> user    0m0.015s
> sys     0m0.000s
>
> The 'w' command is normally pretty fast.  Running it under strace
> makes it take an unreasonably long time.  Something seems busted
> somewhere.  The strace output for this example has many occurrences of
> ~3.1-second delays that seem to occur as w is accumulating process
> time information for all processes.
>
> ..mark

Thanks for the help Mark.  My current suspicion is that the _pinfo isn't
getting initialized when the process is exec-ed by strace, but I haven't
proved that.  I added a dump() member function to _pinfo, but it crashes
and I can't seem to debug that in gdb (it makes a stacktrace file
instead of letting me debug).  So the cygpid.<n> shared object is there,
but I'm suspecting that it's either all zeros or just not fully
initialized -- and this is for the pid of the strace process.  So for
each strace process, there's a 2-3 second delay when trying to read the
process list.

Daniel


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Mark Geisert
Daniel Santos wrote:
> Thanks for the help Mark.  My current suspicion is that the _pinfo isn't getting
> initialized when the process is exec-ed by strace, but I haven't proved that.  I
> added a dump() member function to _pinfo, but it crashes and I can't seem to
> debug that in gdb (it makes a stacktrace file instead of letting me debug).  So
> the cygpid.<n> shared object is there, but I'm suspecting that it's either all
> zeros or just not fully initialized -- and this is for the pid of the strace
> process.  So for each strace process, there's a 2-3 second delay when trying to
> read the process list.

Re debugging strace itself, you may not realize that strace is not a
Cygwin-native program.  It's a Windows-native program.  So debugging it with
Cygwin gdb is problematic.

I can tell you roughly how strace operates.  It launches the target program with
CreateProcess(), not with exec().  It acts as the debugger of the target in
order to receive Windows DEBUG_EVENTs for things like DLL loads, thread
creation, etc. etc.  The Cygwin DLL in the target can tell when it's being run
under strace and generates trace lines with OutputDebugString() calls that
strace receives as DEBUG_EVENTs.  strace then just outputs the trace line.

strace does have to explicitly ContinueDebugEvent() on each DEBUG_EVENT it
receives, and if there's some issue with that then the target program will
remain suspended awaiting that continue.  I've never seen an issue with this but
it's another possible place to look, I suppose.

Why strace appears to interact badly with /proc accesses, I have no idea.

..mark


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos

On 04/21/2017 05:12 PM, Mark Geisert wrote:

>
> Re debugging strace itself, you may not realize that strace is not a
> Cygwin-native program.  It's a Windows-native program.  So debugging
> it with Cygwin gdb is problematic.
>
> I can tell you roughly how strace operates.  It launches the target
> program with CreateProcess(), not with exec().  It acts as the
> debugger of the target in order to receive Windows DEBUG_EVENTs for
> things like DLL loads, thread creation, etc. etc.  The Cygwin DLL in
> the target can tell when it's being run under strace and generates
> trace lines with OutputDebugString() calls that strace receives as
> DEBUG_EVENTs.  strace then just outputs the trace line.
>
> strace does have to explicitly ContinueDebugEvent() on each
> DEBUG_EVENT it receives, and if there's some issue with that then the
> target program will remain suspended awaiting that continue. I've
> never seen an issue with this but it's another possible place to look,
> I suppose.
>
> Why strace appears to interact badly with /proc accesses, I have no idea.
>
> ..mark

Well thank you, I wish I had read this earlier. I've been trying to
debug (with gdb) strace (following children) and now I know why the
debugger is blowing through breakpoints, because I may be debugging
strace, but strace debugging it's own child.  I added a _pinfo::dump ()
member function that I'm calling from pinfo::init () after the test "if
(!created && !(flag & PID_NEW))" and it's crashing because for some
processes, the entire _pinfo struct isn't mapped in, just the first
page.  Of course, I'm a newbie in this sphere, so maybe I've missed
something and there's sometimes a reason for this?  (The first part of
the struct looks normal.) Otherwise, that would be another odd flaw.

Anyway, I can see that the strace process's shared _pinfo object is
never fully populated:

_pinfo 0x30000 {
   pid 2800,
   process_state 0x00000001,
   ppid 0,
   exitcode 0
   cygstarted 0,
   dwProcessId 0x00000AF0,
   progname "D:\cygwin64\bin\strace.exe",
   uid 0,
   gid 0,
   pgid 0,
   sid 0,
   ctty 0,
   has_pgid_children 0
   start_time 1492881370,
   nice 0,
   stopsig 0,
   sendsig 0x0,
   exec_sendsig 0x0,
   exec_dwProcessId 0
}

So I would venture to say that is a problem.  Also, pinfo::init() should
probably issure some error message if it waits 2-ish seconds and the
struct still isn't correctly populated.

Is there a way to debug the children of strace?  It would make it a lot
easier.  That's part of why I wrote the _pinfo::debug(), but also when I
debug strace with gdb, the _pinfo struct IS properly populated.  The
best problems are the ones that disappear when you try to debug them.

Thanks,
Daniel



--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Mark Geisert
Daniel Santos wrote:

>
> On 04/21/2017 05:12 PM, Mark Geisert wrote:
>>
>> Re debugging strace itself, you may not realize that strace is not a
>> Cygwin-native program.  It's a Windows-native program.  So debugging it with
>> Cygwin gdb is problematic.
>>
>> I can tell you roughly how strace operates.  It launches the target program
>> with CreateProcess(), not with exec().  It acts as the debugger of the target
>> in order to receive Windows DEBUG_EVENTs for things like DLL loads, thread
>> creation, etc. etc.  The Cygwin DLL in the target can tell when it's being run
>> under strace and generates trace lines with OutputDebugString() calls that
>> strace receives as DEBUG_EVENTs.  strace then just outputs the trace line.
>>
>> strace does have to explicitly ContinueDebugEvent() on each DEBUG_EVENT it
>> receives, and if there's some issue with that then the target program will
>> remain suspended awaiting that continue. I've never seen an issue with this
>> but it's another possible place to look, I suppose.
>>
>> Why strace appears to interact badly with /proc accesses, I have no idea.
>>
>> ..mark
>
> Well thank you, I wish I had read this earlier. I've been trying to debug (with
> gdb) strace (following children) and now I know why the debugger is blowing
> through breakpoints, because I may be debugging strace, but strace debugging
> it's own child.  I added a _pinfo::dump () member function that I'm calling from
> pinfo::init () after the test "if (!created && !(flag & PID_NEW))" and it's
> crashing because for some processes, the entire _pinfo struct isn't mapped in,
> just the first page.  Of course, I'm a newbie in this sphere, so maybe I've
> missed something and there's sometimes a reason for this?  (The first part of
> the struct looks normal.) Otherwise, that would be another odd flaw.

I don't know enough about pinfo.h and pinfo.cc to say anything definite on this.

> Anyway, I can see that the strace process's shared _pinfo object is never fully
> populated:
>
> _pinfo 0x30000 {
>    pid 2800,
>    process_state 0x00000001,
>    ppid 0,
>    exitcode 0
>    cygstarted 0,
>    dwProcessId 0x00000AF0,
>    progname "D:\cygwin64\bin\strace.exe",
>    uid 0,
>    gid 0,
>    pgid 0,
>    sid 0,
>    ctty 0,
>    has_pgid_children 0
>    start_time 1492881370,
>    nice 0,
>    stopsig 0,
>    sendsig 0x0,
>    exec_sendsig 0x0,
>    exec_dwProcessId 0
> }

Again, strace.exe is a Windows executable, so perhaps some of those fields don't
make sense for a non-Cygwin process and are not initialized?  Purely speculation
on my part.

> So I would venture to say that is a problem.  Also, pinfo::init() should
> probably issure some error message if it waits 2-ish seconds and the struct
> still isn't correctly populated.

That seems right.  I unfortunately don't know why the code presumes the struct
is always populated within a certain (small) amount of time.  The complaint
comment about minimum possible sleep duration sure makes it seem like it's
always supposed to be populated very quickly.

> Is there a way to debug the children of strace?  It would make it a lot easier.
> That's part of why I wrote the _pinfo::debug(), but also when I debug strace
> with gdb, the _pinfo struct IS properly populated.  The best problems are the
> ones that disappear when you try to debug them.

strace always acts as the debugger of the target process you start strace with
(or attach to; see '-p' in strace's help).  strace has a switch '-f' ==
'--trace-children' that defaults to being ON.  So by default strace is getting
DEBUG_EVENTs from the target strace launched as well as any process the target
creates.

If you explicitly set the '-f' flag, you're actually turning OFF that default
and *only* the target process sends DEBUG_EVENTs.  In that case any process the
target creates will be invisible to strace.  You could conceivably debug those
sub-processes with gdb but you likely won't be able to catch them at their
startup unless they wait for your attach.

..mark


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
Unfortunately, I don't have much time to spend on this issue as the gcc8
stage1 has started and I have a few more issues to clear up with my
patchset.

On 04/23/2017 02:42 AM, Mark Geisert wrote:

>> Anyway, I can see that the strace process's shared _pinfo object is
>> never fully
>> populated:
>>
>> _pinfo 0x30000 {
>>    pid 2800,
>>    process_state 0x00000001,
>>    ppid 0,
>>    exitcode 0
>>    cygstarted 0,
>>    dwProcessId 0x00000AF0,
>>    progname "D:\cygwin64\bin\strace.exe",
>>    uid 0,
>>    gid 0,
>>    pgid 0,
>>    sid 0,
>>    ctty 0,
>>    has_pgid_children 0
>>    start_time 1492881370,
>>    nice 0,
>>    stopsig 0,
>>    sendsig 0x0,
>>    exec_sendsig 0x0,
>>    exec_dwProcessId 0
>> }
>
> Again, strace.exe is a Windows executable, so perhaps some of those
> fields don't make sense for a non-Cygwin process and are not
> initialized?  Purely speculation on my part.

Oh, I understand now, thanks. :)  So it doesn't link to cygwin1.dll (or
any other cygwin libs), that makes sense.  So the flaw is probably
thinking that this executable *should* have uid, guid, ppid, etc.  Yet,
it exists in the cygwin process database (apparently a bunch of shared
(probably anonymous) files?).  So the mistake is either listing it in
the database or not accounting for the possibility of strace, the
semi-cygwin program?  Maybe there should be (or is?) a flag to tell
readers of the cygwin process database that this is a "special case"
process?

>> So I would venture to say that is a problem.  Also, pinfo::init() should
>> probably issure some error message if it waits 2-ish seconds and the
>> struct
>> still isn't correctly populated.
>
> That seems right.  I unfortunately don't know why the code presumes
> the struct is always populated within a certain (small) amount of
> time.  The complaint comment about minimum possible sleep duration
> sure makes it seem like it's always supposed to be populated very
> quickly.
>

Yes, and not knowing cygwin's architecture it's hard for me to guess
why, although I can do a git blame and try to understand when the code
was put in.  Also, anything like this usually screams race condition in
my ear, but I can't say that w/o really understanding it well and what
assumptions are being made.  For instance, if another thread/process
could really modify this then reads should be done using known atomic
instructions.  On 32-bit x86, iirc, a mov of the machine word size is
always atomic, i.e., either you get an intact old value or you get an
intact new value, you should never get two bytes of the new value and
two bytes of the old.  But when I'm writing C code, I never want to
presume what the compiler will emit for situations like this and it's
better to use some atomic read/write macro/inline, even though I can't
really imagine this particular snippet not using a simple mov, using an
explicit "atomic" function/macro conveys the intention.

I should note that in the case of trying to analyze this problem with
expect, I allowed a make -kj8 check to run for a few days (should take
maybe 4 hours) and I never had the race condition. Presumably, if I
allowed it to run for a very long period or time (months or years) it
would have likely occurred.

>> Is there a way to debug the children of strace?  It would make it a
>> lot easier.
>> That's part of why I wrote the _pinfo::debug(), but also when I debug
>> strace
>> with gdb, the _pinfo struct IS properly populated.  The best problems
>> are the
>> ones that disappear when you try to debug them.
>
> strace always acts as the debugger of the target process you start
> strace with (or attach to; see '-p' in strace's help).  strace has a
> switch '-f' == '--trace-children' that defaults to being ON.  So by
> default strace is getting DEBUG_EVENTs from the target strace launched
> as well as any process the target creates.
>
> If you explicitly set the '-f' flag, you're actually turning OFF that
> default and *only* the target process sends DEBUG_EVENTs.  In that
> case any process the target creates will be invisible to strace.  You
> could conceivably debug those sub-processes with gdb but you likely
> won't be able to catch them at their startup unless they wait for your
> attach.
>

Very interesting!  Is it possible to have two processes debugging and
have strace forward debug events that it isn't interested in to another
debugger in the chain?  I'm probably just talking crazy here.  Either
way, that's ancillary to fixing the problem.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
In reply to this post by Mark Geisert
Well, waiting for GNU/Linux tests to run, so I had a little more time to
play with this.

On 04/23/2017 02:42 AM, Mark Geisert wrote:

> Daniel Santos wrote:
>> Well thank you, I wish I had read this earlier. I've been trying to
>> debug (with
>> gdb) strace (following children) and now I know why the debugger is
>> blowing
>> through breakpoints, because I may be debugging strace, but strace
>> debugging
>> it's own child.  I added a _pinfo::dump () member function that I'm
>> calling from
>> pinfo::init () after the test "if (!created && !(flag & PID_NEW))"
>> and it's
>> crashing because for some processes, the entire _pinfo struct isn't
>> mapped in,
>> just the first page.  Of course, I'm a newbie in this sphere, so
>> maybe I've
>> missed something and there's sometimes a reason for this?  (The first
>> part of
>> the struct looks normal.) Otherwise, that would be another odd flaw.
>
> I don't know enough about pinfo.h and pinfo.cc to say anything
> definite on this.

Well here's one mystery solved.  I modified my dump() function to first
dump the first page of the struct and then the second, so that when it
crashes I can see the contents of the first page at least:

_pinfo 0x30000 {
   pid 2044,
   process_state 0x00004001,
   ppid 2044,
   exitcode 0
   cygstarted 0,
   dwProcessId 0x00000000,
   progname "",

The process_state's bit 0x00004000 happens to be PID_EXECED, so this is
intentionally shorter because there's no more information.

As for the strange _pinfo object that causes the delay, this appears to
be an error in the process database because the windows PID is a
duplicate of another process entry. (I modified ps.exe to output the
_pinfo::process_state field.)

$ strace --trace-children --output=/tmp/ps.log --mask=all ps -el
       PID    PPID    PGID     WINPID   TTY         UID STIME  STATE  
COMMAND
       948    1788     948       1772  pty0      197608 00:04:22
000000c1 /usr/bin/bash
      2720       1    2720       2720  ?             18 00:02:42
00000041 /usr/bin/cygrunsrv
       860       0       0        860  pty0           0 00:04:30
00000001 /usr/bin/cygrunsrv
       984       1    2168        984  pty0      197608 00:04:30
00000041 /usr/bin/ps
      2376    1628    2376       2612  ?         197610 00:02:45
000000c1 /usr/sbin/sshd
      2524    2720    2524       1404  ?             18 00:02:42
000000c1 /usr/sbin/cygserver
      2168     948    2168        860  pty0      197608 00:04:30
00000061 /usr/bin/strace
      1788    2376    1788       2788  ?         197610 00:04:22
000000c1 /usr/sbin/sshd
      1628       1    1628       1628  ?         197610 00:02:45
00000041 /usr/bin/cygrunsrv

The entry that causes the delay is the one with a zero PPID and PGID and
the progname field is wrong --  It isn't "cygrunsrv".  It always has the
same progname name of the previous process.  Notice that the WINPID
field has the same value as the PID of the strace process that's real
(2168).

So my guess is that this object shouldn't even be there.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Mark Geisert
Daniel Santos wrote:

> Well, waiting for GNU/Linux tests to run, so I had a little more time to play
> with this.
>
> On 04/23/2017 02:42 AM, Mark Geisert wrote:
>> Daniel Santos wrote:
>>> Well thank you, I wish I had read this earlier. I've been trying to debug (with
>>> gdb) strace (following children) and now I know why the debugger is blowing
>>> through breakpoints, because I may be debugging strace, but strace debugging
>>> it's own child.  I added a _pinfo::dump () member function that I'm calling from
>>> pinfo::init () after the test "if (!created && !(flag & PID_NEW))" and it's
>>> crashing because for some processes, the entire _pinfo struct isn't mapped in,
>>> just the first page.  Of course, I'm a newbie in this sphere, so maybe I've
>>> missed something and there's sometimes a reason for this?  (The first part of
>>> the struct looks normal.) Otherwise, that would be another odd flaw.
>>
>> I don't know enough about pinfo.h and pinfo.cc to say anything definite on this.
>
> Well here's one mystery solved.  I modified my dump() function to first dump the
> first page of the struct and then the second, so that when it crashes I can see
> the contents of the first page at least:
>
> _pinfo 0x30000 {
>    pid 2044,
>    process_state 0x00004001,
>    ppid 2044,
>    exitcode 0
>    cygstarted 0,
>    dwProcessId 0x00000000,
>    progname "",
>
> The process_state's bit 0x00004000 happens to be PID_EXECED, so this is
> intentionally shorter because there's no more information.
>
> As for the strange _pinfo object that causes the delay, this appears to be an
> error in the process database because the windows PID is a duplicate of another
> process entry. (I modified ps.exe to output the _pinfo::process_state field.)
>
> $ strace --trace-children --output=/tmp/ps.log --mask=all ps -el
>        PID    PPID    PGID     WINPID   TTY         UID STIME  STATE COMMAND
>        948    1788     948       1772  pty0      197608 00:04:22 000000c1
> /usr/bin/bash
>       2720       1    2720       2720  ?             18 00:02:42 00000041
> /usr/bin/cygrunsrv
>        860       0       0        860  pty0           0 00:04:30 00000001
> /usr/bin/cygrunsrv
>        984       1    2168        984  pty0      197608 00:04:30 00000041
> /usr/bin/ps
>       2376    1628    2376       2612  ?         197610 00:02:45 000000c1
> /usr/sbin/sshd
>       2524    2720    2524       1404  ?             18 00:02:42 000000c1
> /usr/sbin/cygserver
>       2168     948    2168        860  pty0      197608 00:04:30 00000061
> /usr/bin/strace
>       1788    2376    1788       2788  ?         197610 00:04:22 000000c1
> /usr/sbin/sshd
>       1628       1    1628       1628  ?         197610 00:02:45 00000041
> /usr/bin/cygrunsrv
>
> The entry that causes the delay is the one with a zero PPID and PGID and the
> progname field is wrong --  It isn't "cygrunsrv".  It always has the same
> progname name of the previous process.  Notice that the WINPID field has the
> same value as the PID of the strace process that's real (2168).
>
> So my guess is that this object shouldn't even be there.

Excellent debugging work!  I'm inclined to agree with your last point.  I'm
poring over pinfo.cc as well as dcrt0.cc, which is the Cygwin DLL init code.
The latter talks about special cases if the DLL is runtime loaded (like strace
does) vs link-time loaded (like Cygwin apps do).  May take me a while to respond.

..mark


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
On 04/24/2017 02:00 AM, Mark Geisert wrote:
>
> Excellent debugging work!  I'm inclined to agree with your last
> point.  I'm poring over pinfo.cc as well as dcrt0.cc, which is the
> Cygwin DLL init code. The latter talks about special cases if the DLL
> is runtime loaded (like strace does) vs link-time loaded (like Cygwin
> apps do).  May take me a while to respond.

Hah! me too. :)  At least I'm learning a lot.  I have a better
understanding of how it stores these in the windows object database
thing under \\BaseNamedObjects and I can browse it to some degree with
WinObj from sysinternals, but I'm still trying to figure out where that
is initialized.  My tests will be done soon and I'll be back to work on gcc.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Reply | Threaded
Open this post in threaded view
|

Re: long I/O delays when strace is running

Daniel Santos
In reply to this post by Mark Geisert
I finally found a solution and submitted a patch, but I don't know if
it's the correct fix or not.

Daniel

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple