1.5.18+ resource leak?

Brian Ford
I have performed the following operation many times over the past few
months, and every time have eventually hung the machine in some form.
The hang manifests itself in different ways on different machines and with
different Cygwin DLL versions (1.5.18 and various snapshots).  Here is one
example of the hang:

In an xterm under XWin -multiwindow on a dual-core dual-processor AMD
Opteron I execute (all machines tested have been dual-core, but not
necessarily dual-processor):

ssh user@solaris_machine 'cd some_directory && tar cf - some_other_directory' | tar xvf -

These are large copies of several hundred megabytes.  Here was the result:

tar some_file_path: Cannot write: Permission denied
4 [main] -bash 2304 fork_copy: user/cygwin data pass 2 failed,
0x610000..0x627000, done 0, windows pid 3132, Win 32 error 1450

The only other operation running on the machine at the time was an ftp in
another xterm window of a several hundred megabyte file.  It died with
simply a permission denied error.

At this point, the machine is still responsive, but trying to do anything
with any Cygwin process gives a similar error.  I've looked at the various
resources, memory, handles, threads, etc. in the task manager and nothing
looks unusually high.

Does anyone have a guess as to what resource I may be lacking (that
presumably some Cygwin process leaked)?

I've got about 3 terabytes to move to several machines in preparation for
a trade show, and I've fought this for several previous ones.  I'd really
like to do all I can to find the problem.  I'm just not sure where to
start.

--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained pilot...

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

RE: 1.5.18+ resource leak?

Dave Korn
Brian Ford wrote:

> ssh user@solaris_machine 'cd some_directory && tar cf -
> some_other_directory' | tar xvf -
>
> These are large copies of several hundred megabytes.  Here was the result:
>
> tar some_file_path: Cannot write: Permission denied
> 4 [main] -bash 2304 fork_copy: user/cygwin data pass 2 failed,
> 0x610000..0x627000, done 0, windows pid 3132, Win 32 error 1450
>
> The only other operation running on the machine at the time was an ftp in
> another xterm window of a several hundred megabyte file.  It died with
> simply a permission denied error.
>
> At this point, the machine is still responsive, but trying to do anything
> with any Cygwin process gives a similar error.  I've looked at the various
> resources, memory, handles, threads, etc. in the task manager and nothing
> looks unusually high.
>
> Does anyone have a guess as to what resource I may be lacking (that
> presumably some Cygwin process leaked)?


  PTEs?  Kernel paged pool?

http://support.microsoft.com/kb/q236964/
http://support.microsoft.com/default.aspx?scid=kb;en-us;304101

Both suggest using perfmon to monitor the kernel paged-pool size.  Try running
poolmon?  Try editing the
HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management
keys (see 2nd url above)?

    cheers,
      DaveK
--
Can't think of a witty .sigline today....


Re: 1.5.18+ resource leak?

Corinna Vinschen-2
In reply to this post by Brian Ford
On Nov  4 08:42, Brian Ford wrote:

> I have performed the following operation many times over the past few
> months, and every time have eventually hung the machine in some form.
> The hang manifests itself in different ways on different machines and with
> different Cygwin DLL versions (1.5.18 and various snapshots).  Here is one
> example of the hang:
>
> In an xterm under Xwin -multiwindow on a dual-core dual-processor AMD
> Opteron I execute (all machines tested have been dual-core, but not
> necessarily dual-processor):
>
> ssh user@solaris_machine 'cd some_directory && tar cf -
> some_other_directory' | tar xvf -
>
> These are large copies of several hundred megabytes.  Here was the result:
>
> tar some_file_path: Cannot write: Permission denied
> 4 [main] -bash 2304 fork_copy: user/cygwin data pass 2 failed,
> 0x610000..0x627000, done 0, windows pid 3132, Win 32 error 1450

I've just copied 21 Gigs from a Linux box to a Windows box using the
exact same ssh|tar as you did.  There's nothing pointing to any kind of
memory or handle leak while doing that and the operation terminated
successfully.

Are you by any chance starting the X server from a service or something?
Maybe that's a problem induced by the heap allocation as Chris has
already noted in a few threads.  Er... lemme see...
http://support.microsoft.com/default.aspx?scid=kb;en-us;824422


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat, Inc.

Re: 1.5.18+ resource leak?

Brian Ford
On Fri, 4 Nov 2005, Corinna Vinschen wrote:

> On Nov  4 08:42, Brian Ford wrote:
> > ssh user@solaris_machine 'cd some_directory && tar cf -
> > some_other_directory' | tar xvf -
> >
> > These are large copies of several hundred megabytes.  Here was the result:
> >
> > tar some_file_path: Cannot write: Permission denied
> > 4 [main] -bash 2304 fork_copy: user/cygwin data pass 2 failed,
> > 0x610000..0x627000, done 0, windows pid 3132, Win 32 error 1450
>
> I've just copied 21 Gigs from a Linux box to a Windows box using the
> exact same ssh|tar as you did.  There's nothing pointing to any kind of
> memory or handle leak while doing that and the operation terminated
> successfully.

Yes, I haven't seen anything I could identify either.  I think the key is
that these are mostly small files (a hundred thousand or so).  Copying
larger ones of the same volume doesn't seem to trigger the problem.
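
One way to test the many-small-files theory without the network is to
build a synthetic tree locally and push it through the same tar pipe.
What follows is only a rough sketch: COUNT, WORK, SRC and DST are
made-up names for this illustration, the default of 1000 files is far
below the hundred thousand or so mentioned above, and a purely local
copy may well not stress the system the way the ssh transfer does.

```shell
#!/bin/sh
# Sketch: create many small files, then copy them through a tar pipe,
# mirroring the shape of the ssh ... | tar xvf - transfer minus ssh.
# COUNT, WORK, SRC and DST are hypothetical names for this illustration.
set -eu

COUNT=${COUNT:-1000}    # raise toward 100000 to approach the real workload
WORK=$(mktemp -d)
SRC="$WORK/src"
DST="$WORK/dst"
mkdir -p "$SRC/some_other_directory" "$DST"

# Populate the source tree with COUNT tiny files.
i=0
while [ "$i" -lt "$COUNT" ]; do
    printf 'file %s\n' "$i" > "$SRC/some_other_directory/f$i"
    i=$((i + 1))
done

# Same pipeline shape as the original command, with a local subshell
# standing in for the remote side.
(cd "$SRC" && tar cf - some_other_directory) | (cd "$DST" && tar xf -)

# Count what arrived so a partial copy is easy to spot.
COPIED=$(find "$DST/some_other_directory" -type f | wc -l | tr -d ' ')
echo "copied $COPIED of $COUNT files"

rm -rf "$WORK"
```

Running it with a larger value, for example COUNT=100000 sh script.sh
(script.sh being whatever name the sketch is saved under), while
watching the paged pool in perfmon would show whether the file count
alone is enough to trigger the failure.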

> Are you by any chance starting the X server from a service or something?

Nope.  I need to eliminate X as a variable.

> Maybe that's a problem induced by the heap allocation as Chris has
> already noted in a few threads.  Er... lemme see...
> http://support.microsoft.com/default.aspx?scid=kb;en-us;824422

Yes, I'll look into that, but I thought that was only applicable when lots
of processes were running.

--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained pilot...

RE: 1.5.18+ resource leak?

Dave Korn
Brian Ford wrote:

>
> Yes, I haven't seen anything I could identify either.  I think the key is
> that these are mostly small files (a hundred thousand or so).  Copying
> larger ones of the same volume doesn't seem to trigger the problem.


  That _does_ seem somewhat like the issue described in the second of those
mskb links I posted earlier...

http://support.microsoft.com/default.aspx?scid=kb;en-us;304101

"  CAUSE
The two causes of this problem are related. The more frequent cause is listed
first:
. More files are open than the memory cache manager can handle. As a
result, the cache manager has exhausted the available paged pool memory.  "
[...snip...]

... see also the "MORE INFORMATION" section at the bottom.


    cheers,
      DaveK
--
Can't think of a witty .sigline today....


RE: 1.5.18+ resource leak?

Brian Ford
On Fri, 4 Nov 2005, Dave Korn wrote:

> Brian Ford wrote:
> > Yes, I haven't seen anything I could identify either.  I think the key is
> > that these are mostly small files (a hundred thousand or so).  Copying
> > larger ones of the same volume doesn't seem to trigger the problem.
>
>   That _does_ seem somewhat like the issue described in the second of those
> mskb links I posted earlier...
>
> http://support.microsoft.com/default.aspx?scid=kb;en-us;304101
>
> "  CAUSE
> The two causes of this problem are related. The more frequent cause is listed
> first:
> . More files are open than the memory cache manager can handle. As a
> result, the cache manager has exhausted the available paged pool memory.  "
> [...snip...]
>
> ... see also the "MORE INFORMATION" section at the bottom.

Thanks Dave.  I wasn't ignoring you.  I was just waiting until I had some
solid information to reply.  I really do appreciate the help and
suggestions.

I'm not completely sure this is the issue yet, but I believe it is.  I was
able to reproduce the problem without involving Cygwin at all.  An xcopy
from a Samba mounted drive on the Solaris box to the local disk in a
cmd shell also triggered the problem (although it was a much slower copy
with all those network file open/close operations required).

Even though this is now off-topic, I'll follow up just to close the thread
for the archives when I verify that one of those registry settings
actually fixes it.

Thanks again.

--
Brian Ford
Senior Realtime Software Engineer
VITAL - Visual Simulation Systems
FlightSafety International
the best safety device in any aircraft is a well-trained pilot...
