Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Volker Quetschke
Hi!

Peter Rehley wrote:

> I have a problem where a configure script is hanging.  I first saw the
> behavior in 1.5.18, and it's still there in the latest snapshot.  The
> only machines that we are seeing it hang on are Windows 2000 machines,
> SP4, with dual Pentium 933 MHz processors, and using ssh to log in to
> the machine.  I haven't been able to reproduce the problem on single
> processor machines or when ssh is not used.
>
> Under 1.5.18, the hang occurred about 1 in 10 times in the
> test_configure script (provided in the bash_test.tar.bz2 file).  Under
> the latest snapshot it's about 1 in 900.
>
> When the hang happens it appears that a process has completed, but still
> can be found in the process directory.  The cmdline file says
> <defunct>, but the process still shows up in the process list (ps -ef).
> If I try to clean up by killing the process, the kill command
> says that the process doesn't exist.  The only way that I can make the
> hung process go away is by using the task manager to kill the process.

Your symptoms look similar to those of our OOo build hang. I'm curious
whether in your case a:

$ ls /proc/<hangpid>/fd

also cures the hang. (Sometimes this has to be issued several times.)
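
Since the listing sometimes has to be issued several times, it can be
wrapped in a small loop. This is only a sketch, assuming a POSIX sh and a
/proc layout like the one above; the poke_fd name, the default of 5
attempts, and the one-second pause are made up for illustration and are not
part of Cygwin.

```shell
#!/bin/sh
# Hypothetical helper (not a Cygwin tool): list /proc/<pid>/fd up to
# <tries> times, stopping early once the process entry has disappeared.
poke_fd() {
  pid=$1
  tries=${2:-5}          # assumed default; pick whatever suits the test
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ ! -d "/proc/$pid" ]; then
      echo "pid $pid is gone after $i attempt(s)"
      return 0
    fi
    # The listing itself is the "cure" being tested; its output is not needed.
    ls "/proc/$pid/fd" >/dev/null 2>&1
    i=$((i + 1))
    sleep 1              # assumed pause between attempts
  done
  echo "pid $pid still present after $tries attempt(s)"
  return 1
}
```

Calling it as "poke_fd <hangpid> 10" would try the listing up to ten times
and report whether the entry went away.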

Volker

>
> The simplest test I've gotten down to is:
>
> ### Simple Test
> #! /bin/sh
> # Guess values for system-dependent variables and create Makefiles.
> # Generated by GNU Autoconf 2.59 for expr-configure 1.5.11-1.
> #
> # Report bugs to <cygwin at cygwin dot com>.
> #
> # Copyright (C) 2003 Free Software Foundation, Inc.
> # This configure script is free software; the Free Software Foundation
> # gives unlimited permission to copy, distribute and modify it.
> ## --------------------- ##
> ## M4sh Initialization.  ##
> ## --------------------- ##
>
> set -xv
>
> count=0
> while [ ! -f stop ] ; do
>   as_var=LC_MONETARY
>   if (test -z "`(eval $as_var=C; export $as_var; echo ho) 2>&1`"); then
>     echo "hi"
>     eval $as_var=C; export $as_var
>   fi
>   count=`expr $count + 1`
>   echo $count
> done
> ### End simple test
>
> Someplace in the eval line the hang occurs.  Unfortunately I haven't  
> had success when using strace.
>
> If I've missed anything or there are questions about the above,  please
> let me know.
> Peter

--
If you like my work consider:  http://www.scytek.de/donations.html
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Christopher Faylor-2
On Tue, Nov 01, 2005 at 08:52:33AM -0500, Volker Quetschke wrote:

>Hi!
>
>Your symptoms look familiar to our OOo build hang. I'm curious if in
>your case a:
>
>$ ls /proc/<hangpid>/fd
>
>also cures the hang. (Sometimes this has to be issued several times.)

This would make it "not a regression", if so.

In fact, I did see some reports that dmake hangs in 1.5.18 in the archives.

I'm wondering if it is YA symptom of:

  http://cygwin.com/ml/cygwin/2005-09/msg00923.html

since, try as I might, I can't see any way that a process would hang and
then become unstuck by performing a "ls /proc/<hangpid>/fd".

cgf

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Volker Quetschke
Christopher Faylor wrote:

> On Tue, Nov 01, 2005 at 08:52:33AM -0500, Volker Quetschke wrote:
>>Your symptoms look familiar to our OOo build hang. I'm curious if in
>>your case a:
>>
>>$ ls /proc/<hangpid>/fd
>>
>>also cures the hang. (Sometimes this has to be issued several times.)

> This would make it "not a regression", if so.
>
> In fact, I did see some reports that dmake hangs in 1.5.18 in the archives.

Yes, the current snapshots are a lot better than the 1.5.18 release. You already
fixed all of the problems I could reproduce. Thanks again, this is a lot better
than the current release.

But like the last remaining OOo build hang I cannot reproduce this hang either.

I tried the script in rxvt and whatever you call it if you use the cygwin icon.
It doesn't hang for me.

> I'm wondering if it is YA symptom of:
>
>   http://cygwin.com/ml/cygwin/2005-09/msg00923.html
>
> since, try as I might, I can't see any way that a process would hang and
> then become unstuck by performing a "ls /proc/<hangpid>/fd".

*shrug*

Regards

     Volker



Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Christopher Faylor-2
On Tue, Nov 01, 2005 at 09:19:24AM -0500, Volker Quetschke wrote:

>Christopher Faylor wrote:
>>On Tue, Nov 01, 2005 at 08:52:33AM -0500, Volker Quetschke wrote:
>>>Your symptoms look familiar to our OOo build hang. I'm curious if in
>>>your case a:
>>>
>>>$ ls /proc/<hangpid>/fd
>>>
>>>also cures the hang. (Sometimes this has to be issued several times.)
>
>>This would make it "not a regression", if so.
>>
>>In fact, I did see some reports that dmake hangs in 1.5.18 in the archives.
>
>Yes, the current snapshots are a lot better than the 1.5.18 release.
>You already fixed all of the problems I could reproduce.  Thanks again,
>this is a lot better than the current release.
>
>But like the last remaining OOo build hang I cannot reproduce this hang
>either.
>
>I tried the script in rxvt and whatever you call it if you use the
>cygwin icon.  It doesn't hang for me.
>
>>I'm wondering if it is YA symptom of:
>>
>>  http://cygwin.com/ml/cygwin/2005-09/msg00923.html
>>
>>since, try as I might, I can't see any way that a process would hang and
>>then become unstuck by performing a "ls /proc/<hangpid>/fd".
>
>*shrug*

"*shrug*"?  Did you look at the message?  There is a proposed workaround
for a windows problem which cygwin is running into.  Maybe following the
instructions in that message would help with the OOo build.

cgf


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Volker Quetschke
Christopher Faylor wrote:

>>>I'm wondering if it is YA symptom of:
>>>
>>> http://cygwin.com/ml/cygwin/2005-09/msg00923.html
>>>
>>>since, try as I might, I can't see any way that a process would hang and
>>>then become unstuck by performing a "ls /proc/<hangpid>/fd".
>>
>>*shrug*
>
> "*shrug*"?  Did you look at the message?  There is a proposed workaround
> for a windows problem which cygwin is running into.  Maybe following the
> instructions in that message would help with the OOo build.

In the "Cygstart fails with MDB file" message? No one answered that email.

Volker


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Christopher Faylor-2
On Tue, Nov 01, 2005 at 10:11:04AM -0500, Volker Quetschke wrote:

>Christopher Faylor wrote:
>>>>I'm wondering if it is YA symptom of:
>>>>
>>>> http://cygwin.com/ml/cygwin/2005-09/msg00923.html
>>>>
>>>>since, try as I might, I can't see any way that a process would hang and
>>>>then become unstuck by performing a "ls /proc/<hangpid>/fd".
>>>
>>>*shrug*
>>
>> "*shrug*"?  Did you look at the message?  There is a proposed workaround
>> for a windows problem which cygwin is running into.  Maybe following the
>> instructions in that message would help with the OOo build.
>
>In the "Cygstart fails with MDB file" message? No one answered that email.

Sorry.  Stupid cut/paste error again.  I meant this:

http://cygwin.com/ml/cygwin/2005-09/msg00945.html

cgf


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Volker Quetschke
(snip)
> On Tue, Nov 01, 2005 at 10:11:04AM -0500, Volker Quetschke wrote:
>>In the "Cygstart fails with MDB file" message? No one answered that email.
>
> Sorry.  Stupid cut/paste error again.  I meant this:
>
> http://cygwin.com/ml/cygwin/2005-09/msg00945.html

Thanks, I relayed that information. I hope it helps. (*keeping
fingers crossed*)

But in the meantime I got a complete strace including a hang.
Posted to the "other" thread.

Volker


Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Volker Quetschke
Volker Quetschke wrote:

> (snip)
>
>>On Tue, Nov 01, 2005 at 10:11:04AM -0500, Volker Quetschke wrote:
>>
>>>In the "Cygstart fails with MDB file" message? No one answered that email.
>>
>>Sorry.  Stupid cut/paste error again.  I meant this:
>>
>>http://cygwin.com/ml/cygwin/2005-09/msg00945.html
>
> Thanks, I relayed that information. I hope it helps. (*keeping
> fingers crossed*)
Just got the answer: No, it didn't help. Still hangs with:

loooong sometime [sig] dmake xxxx talktome: pid <pid> wants some information
 and
looong othertime [sig] tcsh yyyy talktome: pid <pid> wants some information

Volker

>
> But in the meantime I got a complete strace including a hang.
> Posted to the "other" thread.
>
> Volker
>



Re: Hang with 1.5.18, 1.5.19 snapshot 20051029

Peter Rehley
In reply to this post by Volker Quetschke

On Nov 1, 2005, at 5:52 AM, Volker Quetschke wrote:

> Hi!
>
> Your symptoms look familiar to our OOo build hang. I'm curious if in
> your case a:
>
> $ ls /proc/<hangpid>/fd
>
> also cures the hang. (Sometimes this has to be issued several times.)

This didn't seem to do anything for me... but I also did not try it
several times, and now I'm having a hard time reproducing the
problem. :(

One really odd thing that I did notice on my Windows 2000 machines
was that when I run 'ps -ef' many times in a row quickly, the
test_configure script that I'm using dies: it either segfaults or I
get "fork: Resource temporarily unavailable".
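
To make that observation repeatable, the rapid 'ps -ef' runs can be
scripted while test_configure runs in another session. A rough sketch,
assuming a POSIX sh; the hammer name and the count of 200 in the usage
note are made up for illustration, and nothing here is known to trigger
the failure reliably.

```shell
#!/bin/sh
# Hypothetical stress helper: run a command <n> times back to back and
# count how many of those runs exit with a non-zero status.
hammer() {
  n=$1
  shift
  i=0
  fail=0
  while [ "$i" -lt "$n" ]; do
    # Discard the command's output; only its exit status is counted.
    "$@" >/dev/null 2>&1 || fail=$((fail + 1))
    i=$((i + 1))
  done
  echo "$fail of $n runs failed"
}
```

For example, "hammer 200 ps -ef" issues 200 process listings in a row and
prints how many of them failed.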

Peter


"fork: Resource Temporarily Unavailable" (Was Re: Hang with 1.5.18, 1.5.19 snapshot 20051029)

Igor Peshansky
On Wed, 2 Nov 2005, Peter Rehley wrote:

> [snip]
> One really odd thing that I did notice on my windows 2000 machines was
> that when I do a 'ps -ef' many times in a row quickly, the
> test_configure script that I'm using dies...it either segfaults or I get
> fork: Resource Temporarily unavailable.

I've recently been plagued with a "resource temporarily unavailable"
(a.k.a. EAGAIN) fork problem (on WinXP), with 1.5.18, snapshots, and
self-built DLLs.  I don't know if it's related to what you're seeing.
The problem doesn't seem to be reproducible under gdb, but can be
reproduced under strace for long-running scripts.  I hereby repeat my plea
to Cygwin developers for debugging suggestions -- what information would
be useful to put into the strace to help (me) track down the source of
this problem?
        Igor
--
                                http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_ [hidden email]
ZZZzz /,`.-'`'    -.  ;-;;,_ [hidden email]
     |,4-  ) )-,_. ,\ (  `'-' Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA
