[PATCH 0/1] Fix deadlocks related to child processes

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/1] Fix deadlocks related to child processes

David McFarland
For a long time I've been struggling with intermittent deadlocks and
segfaults in emacs, seemingly related to invoking child processes.  I
recently found a reliable way to reproduce one such deadlock:

- install clean cygwin with: emacs-w32, clang
- install flycheck from elpa
- grab some non trivial C header e.g.:
  $ cp /usr/include/stdio.h test.h
- $ emacs -q test.h
- start flycheck:
  (progn (package-initialize)
         (require 'flycheck)
         (flycheck-mode))
- add a character to the start of the first line
- wait for flygheck to complete
- repeat the last two steps until a deadlock occurs

Breaking in gdb showed the main thread in `cygheap_protect.acquire ()`,
from either _cfree or _cmalloc.  The thread holding the mutex was always
"flasio", and it would either be continually segfaulting or looping in
_cfree.

I added some debug prints to cygheap and determined that it flasio was
double-freeing an atomic_write_buf.  I added some more prints and found
that it was two different fhandler objects freeing the same buffer.

I then found that `fhandler_base_overlapped::copyto` would clear the
buffer pointer after the copy, but none of the derived classes (pipe,
fifo) did.

Attached is a patch which clears the buffer pointers when copying pipes
and fifos.

It would probably be safer to move the buffer clear to a `operator=`,
but I wanted to keep the patch as simple as possible and avoid
refactoring.


David McFarland (1):
  Cygwin: Fix cygheap corruption caused by cloned atomic buffer

 winsup/cygwin/fhandler.h | 2 ++
 1 file changed, 2 insertions(+)

--
2.19.1

Reply | Threaded
Open this post in threaded view
|

[PATCH 1/1] Cygwin: Fix cygheap corruption caused by cloned atomic buffer

David McFarland
The fhandler_base_overlapped::copyto clears atomic_write_buf on the
clone, but none of the derived classes were doing this.  This allowed
the destructor to double-free the buffer and corrupt cygheap.
Clear atomic_write_buf in copyto of all derived classes.
---
 winsup/cygwin/fhandler.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/winsup/cygwin/fhandler.h b/winsup/cygwin/fhandler.h
index 2cc99d713..9e63867ab 100644
--- a/winsup/cygwin/fhandler.h
+++ b/winsup/cygwin/fhandler.h
@@ -1216,6 +1216,7 @@ public:
   {
     x->pc.free_strings ();
     *reinterpret_cast<fhandler_pipe *> (x) = *this;
+    reinterpret_cast<fhandler_pipe *> (x)->atomic_write_buf = NULL;
     x->reset (this);
   }
 
@@ -1256,6 +1257,7 @@ public:
   {
     x->pc.free_strings ();
     *reinterpret_cast<fhandler_fifo *> (x) = *this;
+    reinterpret_cast<fhandler_fifo *> (x)->atomic_write_buf = NULL;
     x->reset (this);
   }
 
--
2.19.1

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 0/1] Fix deadlocks related to child processes

Corinna Vinschen-2
In reply to this post by David McFarland
On Oct 28 16:22, David McFarland wrote:

> For a long time I've been struggling with intermittent deadlocks and
> segfaults in emacs, seemingly related to invoking child processes.  I
> recently found a reliable way to reproduce one such deadlock:
>
> - install clean cygwin with: emacs-w32, clang
> - install flycheck from elpa
> - grab some non trivial C header e.g.:
>   $ cp /usr/include/stdio.h test.h
> - $ emacs -q test.h
> - start flycheck:
>   (progn (package-initialize)
>          (require 'flycheck)
>          (flycheck-mode))
> - add a character to the start of the first line
> - wait for flygheck to complete
> - repeat the last two steps until a deadlock occurs
>
> Breaking in gdb showed the main thread in `cygheap_protect.acquire ()`,
> from either _cfree or _cmalloc.  The thread holding the mutex was always
> "flasio", and it would either be continually segfaulting or looping in
> _cfree.
>
> I added some debug prints to cygheap and determined that it flasio was
> double-freeing an atomic_write_buf.  I added some more prints and found
> that it was two different fhandler objects freeing the same buffer.
>
> I then found that `fhandler_base_overlapped::copyto` would clear the
> buffer pointer after the copy, but none of the derived classes (pipe,
> fifo) did.
>
> Attached is a patch which clears the buffer pointers when copying pipes
> and fifos.
>
> It would probably be safer to move the buffer clear to a `operator=`,
> but I wanted to keep the patch as simple as possible and avoid
> refactoring.
Excellent detective work, thanks for the patch!  Pushed.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

signature.asc (849 bytes) Download Attachment