[PATCH?] Separate pthread patches, #2 take 2.

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6

  The attached patch implements ilockexch and ilockcmpexch, using the inline
asm definition from __arch_compare_and_exchange_val_32_acq in
glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
than in its original preprocessor macro form.

  It generates incorrect code.  The sequence discussed before,

126  do
127    node->next = head;
128  while (InterlockedCompareExchangePointer (&head, node, node->next) !=
node->next);

is now compiled down to this loop:

L186:
        .loc 3 127 0
        movl __ZN13pthread_mutex7mutexesE+8, %edx # mutexes.head, D.28599
        .loc 2 58 0
        movl %edx, %eax # D.28599, tmp68
/APP
 # 58 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
        lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8 # this,
 # 0 "" 2
/NO_APP
        movl %eax, -12(%ebp) # tmp68, ret
        .loc 2 59 0
        movl -12(%ebp), %eax # ret, D.28596
        .loc 3 126 0
        cmpl %eax, %edx # D.28596, D.28599
        jne L186 #,
        movl %edx, 36(%ebx) # D.28599, <variable>.next


... which is in fact the equivalent of

126  do
127    ;
128  while (InterlockedCompareExchangePointer (&head, node, node->next) !=
node->next);
(126)     node->next = head;

  As it caches the values of head in %edx during the spin loop, and only
stores it to node->next after having overwritten *head with node, there is a
short window after the new node is linked to the front of the chain during
which its chain pointer is incorrect.

  Not OK for head?

    cheers,
      DaveK



Index: winsup/cygwin/winbase.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h 12 Jul 2008 18:09:17 -0000 1.14
+++ winsup/cygwin/winbase.h 3 Jun 2009 22:54:38 -0000
@@ -34,27 +34,31 @@ ilockdecr (volatile long *m)
  ": "=&r" (__res), "=m" (*m): "m" (*m): "cc");
   return __res;
 }
-
-extern __inline__ long
-ilockexch (volatile long *t, long v)
-{
-  register int __res;
-  __asm__ __volatile__ ("\n\
-1: lock cmpxchgl %3,(%1)\n\
- jne 1b\n\
- ": "=a" (__res), "=q" (t): "1" (t), "q" (v), "0" (*t): "cc");
-  return __res;
-}
-
-extern __inline__ long
-ilockcmpexch (volatile long *t, long v, long c)
-{
-  register int __res;
-  __asm__ __volatile__ ("\n\
- lock cmpxchgl %3,(%1)\n\
- ": "=a" (__res), "=q" (t) : "1" (t), "q" (v), "0" (c): "cc");
-  return __res;
-}
+
+extern __inline__ long
+ilockexch (volatile long *t, long v)
+{
+  return ({
+ __typeof (*t) ret;
+ __asm __volatile ("1: lock cmpxchgl %2, %1\n"
+ " jne 1b\n"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (*t));
+ ret;
+ });
+}
+
+extern __inline__ long
+ilockcmpexch (volatile long *t, long v, long c)
+{
+  return ({
+ __typeof (*t) ret;
+ __asm __volatile ("lock cmpxchgl %2, %1"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (c));
+ ret;
+ });
+}
 
 #undef InterlockedIncrement
 #define InterlockedIncrement ilockincr
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 3

Dave Korn-6
Dave Korn wrote:
>   The attached patch implements ilockexch and ilockcmpexch, using the inline
> asm definition from __arch_compare_and_exchange_val_32_acq in
> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
> than in its original preprocessor macro form.

  The attached patch does likewise, but adds a "memory" clobber.  It generates
correct code:

L186:
        .loc 3 127 0
        movl __ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.28599
        movl %eax, 36(%ebx) # D.28599, <variable>.next
        .loc 2 60 0
/APP
 # 60 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
        lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8 # this,
 # 0 "" 2
/NO_APP
        movl %eax, -12(%ebp) # tmp68, ret
        .loc 2 61 0
        movl -12(%ebp), %eax # ret, D.28596
        .loc 3 126 0
        cmpl %eax, 36(%ebx) # D.28596, <variable>.next
        jne L186 #,


although as you see it has some needless register motion as it stores %eax to
the stack slot for ret and reloads it.  Still, this is now almost as good as
the code generated by my original patch.

winsup/cygwin/ChangeLog

        * winbase.h (ilockexch):  Fix asm constraints.
        (ilockcmpexch):  Likewise.


  Ok-ish?

    cheers,
      DaveK


Index: winsup/cygwin/winbase.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h 12 Jul 2008 18:09:17 -0000 1.14
+++ winsup/cygwin/winbase.h 3 Jun 2009 23:28:02 -0000
@@ -38,22 +38,28 @@ ilockdecr (volatile long *m)
 extern __inline__ long
 ilockexch (volatile long *t, long v)
 {
-  register int __res;
-  __asm__ __volatile__ ("\n\
-1: lock cmpxchgl %3,(%1)\n\
- jne 1b\n\
- ": "=a" (__res), "=q" (t): "1" (t), "q" (v), "0" (*t): "cc");
-  return __res;
+  return ({
+ __typeof (*t) ret;
+ __asm __volatile ("1: lock cmpxchgl %2, %1\n"
+ " jne 1b\n"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (*t)
+ : "memory");
+ ret;
+ });
 }
 
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
-  register int __res;
-  __asm__ __volatile__ ("\n\
- lock cmpxchgl %3,(%1)\n\
- ": "=a" (__res), "=q" (t) : "1" (t), "q" (v), "0" (c): "cc");
-  return __res;
+  return ({
+ __typeof (*t) ret;
+ __asm __volatile ("lock cmpxchgl %2, %1"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (c)
+ : "memory");
+ ret;
+ });
 }
 
 #undef InterlockedIncrement
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 1 redux

Dave Korn-6
Dave Korn wrote:
> Dave Korn wrote:
>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
>> asm definition from __arch_compare_and_exchange_val_32_acq in
>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
>> than in its original preprocessor macro form.

  And this one, just to have the full set in the same place, is the version
that I originally suggested.  It generates correct and efficient code:

L215:
        .loc 3 127 0
        movl __ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.28638
        movl %eax, 36(%ebx) # D.28638, <variable>.next
        .loc 2 53 0
/APP
 # 53 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
       
        lock cmpxchgl %ebx,__ZN13pthread_mutex7mutexesE+8 # this,
       
 # 0 "" 2
/NO_APP
        .loc 3 126 0
        cmpl %eax, 36(%ebx) # D.28635, <variable>.next
        jne L215 #,

but is more risky.  No ChangeLog because it's not going to be approved, I'm
posting it just for completeness and future reference.

    cheers,
      DaveK

Index: winsup/cygwin/winbase.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h 12 Jul 2008 18:09:17 -0000 1.14
+++ winsup/cygwin/winbase.h 3 Jun 2009 17:38:06 -0000
@@ -38,21 +38,21 @@ ilockdecr (volatile long *m)
 extern __inline__ long
 ilockexch (volatile long *t, long v)
 {
-  register int __res;
+  register long __res __asm__ ("%eax") = *t;
   __asm__ __volatile__ ("\n\
-1: lock cmpxchgl %3,(%1)\n\
+1: lock cmpxchgl %2,%1\n\
  jne 1b\n\
- ": "=a" (__res), "=q" (t): "1" (t), "q" (v), "0" (*t): "cc");
+ ": "+a" (__res), "=m" (*t): "q" (v), "m" (*t) : "memory", "cc");
   return __res;
 }
 
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
-  register int __res;
+  register long __res __asm ("%eax") = c;
   __asm__ __volatile__ ("\n\
- lock cmpxchgl %3,(%1)\n\
- ": "=a" (__res), "=q" (t) : "1" (t), "q" (v), "0" (c): "cc");
+ lock cmpxchgl %2,%1\n\
+ ": "+a" (__res), "=m" (*t) : "q" (v), "m" (*t) : "memory", "cc");
   return __res;
 }
 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6
In reply to this post by Dave Korn-6
Dave Korn wrote:
>   The attached patch implements ilockexch and ilockcmpexch, using the inline
> asm definition from __arch_compare_and_exchange_val_32_acq in
> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
> than in its original preprocessor macro form.
>
>   It generates incorrect code.

  This much looks like it's probably a compiler bug.  This version, compiled
with current GCC HEAD, generates the same results as in the take 3 version,
without needing the explicit memory clobber added:

L469:
        .loc 2 127 0
        movl __ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.30413
        movl %eax, 36(%ebx) # D.30413, <variable>.next
        .loc 5 58 0
/APP
 # 58 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
        lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8 # this,
 # 0 "" 2
/NO_APP
        movl %eax, -12(%ebp) # tmp76, ret
        .loc 5 59 0
        movl -12(%ebp), %eax # ret, D.30414
        .loc 2 126 0
        cmpl %eax, 36(%ebx) # D.30414, <variable>.next
        jne L469 #,

... right down to the unoptimised register motion.  This is what we would have
hoped to see: the "memory" clobber ought to be superfluous, and the
write-output operand in *t should have told GCC not to move the store to
node->next after the loop.

  I checked 4.3.3; it behaves the same as 4.3.2, i.e. it incorrectly lowers
the store without a memory clobber present.

    cheers,
      DaveK
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6
Dave Korn wrote:
> Dave Korn wrote:
>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
>> asm definition from __arch_compare_and_exchange_val_32_acq in
>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
>> than in its original preprocessor macro form.
>>
>>   It generates incorrect code.
>
>   This much looks like it's probably a compiler bug.  

  Let's see whether anyone else agrees:

        http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html

    cheers,
      DaveK
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 3

Christopher Faylor-8
In reply to this post by Dave Korn-6
On Thu, Jun 04, 2009 at 12:47:48AM +0100, Dave Korn wrote:

>Dave Korn wrote:
>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
>> asm definition from __arch_compare_and_exchange_val_32_acq in
>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
>> than in its original preprocessor macro form.
>
>  The attached patch does likewise, but adds a "memory" clobber.  It generates
>correct code:
>
>L186:
> .loc 3 127 0
> movl __ZN13pthread_mutex7mutexesE+8, %eax # mutexes.head, D.28599
> movl %eax, 36(%ebx) # D.28599, <variable>.next
> .loc 2 60 0
>/APP
> # 60 "/gnu/winsup/src/winsup/cygwin/winbase.h" 1
> lock cmpxchgl %ebx, __ZN13pthread_mutex7mutexesE+8 # this,
> # 0 "" 2
>/NO_APP
> movl %eax, -12(%ebp) # tmp68, ret
> .loc 2 61 0
> movl -12(%ebp), %eax # ret, D.28596
> .loc 3 126 0
> cmpl %eax, 36(%ebx) # D.28596, <variable>.next
> jne L186 #,
>
>
>although as you see it has some needless register motion as it stores %eax to
>the stack slot for ret and reloads it.  Still, this is now almost as good as
>the code generated by my original patch.
>
>winsup/cygwin/ChangeLog
>
> * winbase.h (ilockexch):  Fix asm constraints.
> (ilockcmpexch):  Likewise.
>
>
>  Ok-ish?

I thought that, given your last message to cygwin-developers, you were
going to go off and figure out the best of four implementations.  Is this
the result of that investigation?

cgf
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 3

Dave Korn-6
Christopher Faylor wrote:

> I thought that, given your last message to cygwin-developers, you were
> going to go off and figure out the best of four implementations.  Is this
> the result of that investigation?

  Once I discarded the ones that weren't quite right because they included a
setz/sete instruction (bsd and boehm), I was left with the ones in the
attached testcase.  The only version I hadn't tested yet is the linux-derived one:

struct __xchg_dummy { unsigned long a[100]; };
#define __xg(x) ((struct __xchg_dummy *)(x))
#define LOCK_PREFIX " lock "
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
                                      unsigned long newv)
{
        unsigned long prev;
                __asm__ __volatile__(LOCK_PREFIX "cmpxchgl %1,%2"
                                     : "=a"(prev)
                                     : "q"(newv), "m"(*__xg(ptr)), "0"(old)
                                     : "memory");
                return prev;
}
extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
  return __cmpxchg (t, c, v);
}

  This actually produces the same assembly as my version:

L14:
        movl __ZN13pthread_mutex7mutexesE, %eax # mutexes.head, D.2029
        movl %eax, 36(%ebx) # D.2029, <variable>.next
/APP
 # 75 "mxfull.cpp" 1
         lock cmpxchgl %ebx,__ZN13pthread_mutex7mutexesE # this,
 # 0 "" 2
/NO_APP
        cmpl %eax, 36(%ebx) # prev, <variable>.next
        jne L14 #,


but it's a horrible bit of code.  Declaring the memory location as input only,
then clobbering all of memory and potentially confusing the optimisers with
type aliasing casts?  It makes me very uneasy.

    cheers,
      DaveK




// g++ -c mxfull.cpp -o mxfull.o --save-temps -O2 -fverbose-asm

typedef long LONG;
typedef void *HANDLE;
typedef void *PVOID;
typedef char *LPCSTR;

typedef class pthread_mutex *pthread_mutex_t;
typedef class pthread_mutexattr *pthread_mutexattr_t;
typedef class pthread *pthread_t;

struct SECURITY_ATTRIBUTES;
typedef struct SECURITY_ATTRIBUTES *LPSECURITY_ATTRIBUTES;
extern struct SECURITY_ATTRIBUTES sec_none_nih;

HANDLE __attribute__((__stdcall__)) CreateSemaphoreA(LPSECURITY_ATTRIBUTES,LONG,LONG,LPCSTR);

class verifyable_object
{
public:
  long magic;

  verifyable_object (long);
  virtual ~verifyable_object ();
};

#if 0
// My version
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
  register long __res __asm ("%eax") = c;
   __asm__ __volatile__ ("\n\
        lock cmpxchgl %2,%1\n\
        ": "+a" (__res), "=m" (*t) : "q" (v), "m" (*t) : "memory", "cc");
   return __res;
 }
#elif 0
// Original version
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
  register int __res;
   __asm__ __volatile__ ("\n\
        lock cmpxchgl %3,(%1)\n\
        ": "=a" (__res), "=q" (t) : "1" (t), "q" (v), "0" (c): "cc");
   return __res;
 }
#elif 0
// GlibC/uClibC version
extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
  return ({
                __typeof (*t) ret;
                __asm __volatile ("lock cmpxchgl %2, %1"
                        : "=a" (ret), "=m" (*t)
                        : "r" (v), "m" (*t), "0" (c));
                ret;
        });
}
#elif 01
// Linux-2.6.8-1 version
struct __xchg_dummy { unsigned long a[100]; };
#define __xg(x) ((struct __xchg_dummy *)(x))
#define LOCK_PREFIX " lock "
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
                                      unsigned long newv)
{
        unsigned long prev;
                __asm__ __volatile__(LOCK_PREFIX "cmpxchgl %1,%2"
                                     : "=a"(prev)
                                     : "q"(newv), "m"(*__xg(ptr)), "0"(old)
                                     : "memory");
                return prev;
}
extern __inline__ long
ilockcmpexch (volatile long *t, long v, long c)
{
  return __cmpxchg (t, c, v);
}
#endif

class pthread_mutexattr: public verifyable_object
{
public:
  int pshared;
  int mutextype;
  pthread_mutexattr ();
  ~pthread_mutexattr ();
};

template <class list_node> inline void
List_insert (list_node *&head, list_node *node)
{
  if (!node)
    return;
  do
    node->next = head;
  while ((PVOID)ilockcmpexch((LONG volatile *)(&head),(LONG)(node),(LONG)(node->next)) != node->next);
}

template <class list_node> class List
{
 public:
  List() : head(__null)
  {
  }

  ~List()
  {
  }

  void insert (list_node *node)
  {
    List_insert (head, node);
  }

  list_node *head;

};

class pthread_mutex: public verifyable_object
{
public:

  unsigned long lock_counter;
  HANDLE win32_obj_id;
  unsigned int recursion_counter;
  LONG condwaits;
  pthread_t owner;



  int type;
  int pshared;


  pthread_mutex (pthread_mutexattr * = __null);
  ~pthread_mutex ();

  class pthread_mutex * next;

private:
  static List<pthread_mutex> mutexes;
};

List<pthread_mutex> pthread_mutex::mutexes;

pthread_mutex::pthread_mutex (pthread_mutexattr *attr) :
  verifyable_object (0xdf0df045 +1),
  lock_counter (0),
  win32_obj_id (__null), recursion_counter (0),
  condwaits (0), owner (__null),



  type (1),
  pshared (0)
{
  win32_obj_id = ::CreateSemaphoreA (&sec_none_nih, 0, 2147483647L, __null);
  if (!win32_obj_id)
    {
      magic = 0;
      return;
    }

  if (attr)
    {
      if (attr->pshared == 1)
 {

   magic = 0;
   return;
 }

      type = attr->mutextype;
    }

  mutexes.insert (this);
}

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 3

Christopher Faylor-8
On Thu, Jun 04, 2009 at 04:03:03AM +0100, Dave Korn wrote:
>but it's a horrible bit of code.  Declaring the memory location as input only,
>then clobbering all of memory and potentially confusing the optimisers with
>type aliasing casts?  It makes me very uneasy.

Ok.  I'm convinced.  Please check in whatever you think is best.

cgf
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 3

Dave Korn-6
Christopher Faylor wrote:
> On Thu, Jun 04, 2009 at 04:03:03AM +0100, Dave Korn wrote:
>> but it's a horrible bit of code.  Declaring the memory location as input only,
>> then clobbering all of memory and potentially confusing the optimisers with
>> type aliasing casts?  It makes me very uneasy.
>
> Ok.  I'm convinced.  Please check in whatever you think is best.
>
> cgf

  I will wait and see what advice the gcc list has to offer on the topic
first.  It may yet cast a new light on things.  (Or it may just confirm my
suspicions that even in trusted and well-tested library code, there has been a
fair deal of ad-hoc-ery and copypastaing of inline asms that people didn't
really grok.)  I'll also see if I can dig up any recent PRs or fixes that
might have a bearing on why we get better code from HEAD than 4.3.

    cheers,
      DaveK
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Corinna Vinschen-2
In reply to this post by Dave Korn-6
On Jun  4 02:52, Dave Korn wrote:

> Dave Korn wrote:
> > Dave Korn wrote:
> >>   The attached patch implements ilockexch and ilockcmpexch, using the inline
> >> asm definition from __arch_compare_and_exchange_val_32_acq in
> >> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
> >> than in its original preprocessor macro form.
> >>
> >>   It generates incorrect code.
> >
> >   This much looks like it's probably a compiler bug.  
>
>   Let's see whether anyone else agrees:
>
>         http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html

When you checked in this change, I'll create a 1.7.0-49 test release.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6
Corinna Vinschen wrote:

> On Jun  4 02:52, Dave Korn wrote:
>> Dave Korn wrote:
>>> Dave Korn wrote:
>>>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
>>>> asm definition from __arch_compare_and_exchange_val_32_acq in
>>>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
>>>> than in its original preprocessor macro form.
>>>>
>>>>   It generates incorrect code.
>>>   This much looks like it's probably a compiler bug.  
>>   Let's see whether anyone else agrees:
>>
>>         http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
>
> When you checked in this change, I'll create a 1.7.0-49 test release.

  Am off to a meeting now; won't get a chance until late tonight.

    cheers,
      DaveK
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6
In reply to this post by Corinna Vinschen-2
Corinna Vinschen wrote:

> On Jun  4 02:52, Dave Korn wrote:
>> Dave Korn wrote:
>>> Dave Korn wrote:
>>>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
>>>> asm definition from __arch_compare_and_exchange_val_32_acq in
>>>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
>>>> than in its original preprocessor macro form.
>>>>
>>>>   It generates incorrect code.
>>>   This much looks like it's probably a compiler bug.  
>>   Let's see whether anyone else agrees:
>>
>>         http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
>
> When you checked in this change, I'll create a 1.7.0-49 test release.
  This is the final version I committed.  It is the glibc version, with the
addition of the memory clobber, which that discussion revealed was absolutely
required, and the use of a register asm var to feed the inline asm, which is
in accordance with documented practice in the gcc manual.  (This now leaves
only one difference between the glibc version and the version I posted, which
is the use of a "+a" write-only output constraint paired with a numeric "0"
matching input constraint in glibc's version compared with a single output
operand using the read-write constraing "=a" in my version.  These should in
fact be exactly identical in terms of what they indicate to reload, in any case.)

  I have also manually inspected the generated assembly from thread.cc and
shared.cc in a cygwin DLL build and verified that it is correct and efficient,
and have installed the resulting DLL and retested all Thomas Stalder's
testcases and the previously intermittently failing pthread7-rope testcase
from libstdc++ testsuite.  Committed with this ChangeLog:

winsup/cygwin/ChangeLog

        * winbase.h (ilockexch):  Fix asm constraints.
        (ilockcmpexch):  Likewise.


  Libstdc++ plan after the weekend.  Cheers all!

    cheers,
      DaveK



? winsup/cygwin/cygwin-cxx.h
? winsup/cygwin/mutex
Index: winsup/cygwin/winbase.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/winbase.h,v
retrieving revision 1.14
diff -p -u -r1.14 winbase.h
--- winsup/cygwin/winbase.h 12 Jul 2008 18:09:17 -0000 1.14
+++ winsup/cygwin/winbase.h 5 Jun 2009 13:05:08 -0000
@@ -38,22 +38,31 @@ ilockdecr (volatile long *m)
 extern __inline__ long
 ilockexch (volatile long *t, long v)
 {
-  register int __res;
-  __asm__ __volatile__ ("\n\
-1: lock cmpxchgl %3,(%1)\n\
- jne 1b\n\
- ": "=a" (__res), "=q" (t): "1" (t), "q" (v), "0" (*t): "cc");
-  return __res;
+  return
+  ({
+    register __typeof (*t) ret __asm ("%eax");
+    __asm __volatile ("\n"
+ "1: lock cmpxchgl %2, %1\n"
+ " jne  1b\n"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (*t)
+ : "memory");
+    ret;
+  });
 }
 
 extern __inline__ long
 ilockcmpexch (volatile long *t, long v, long c)
 {
-  register int __res;
-  __asm__ __volatile__ ("\n\
- lock cmpxchgl %3,(%1)\n\
- ": "=a" (__res), "=q" (t) : "1" (t), "q" (v), "0" (c): "cc");
-  return __res;
+  return
+  ({
+    register __typeof (*t) ret __asm ("%eax");
+    __asm __volatile ("lock cmpxchgl %2, %1"
+ : "=a" (ret), "=m" (*t)
+ : "r" (v), "m" (*t), "0" (c)
+ : "memory");
+    ret;
+  });
 }
 
 #undef InterlockedIncrement
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Corinna Vinschen-2
On Jun  5 15:04, Dave Korn wrote:

> Corinna Vinschen wrote:
> > On Jun  4 02:52, Dave Korn wrote:
> >> Dave Korn wrote:
> >>> Dave Korn wrote:
> >>>>   The attached patch implements ilockexch and ilockcmpexch, using the inline
> >>>> asm definition from __arch_compare_and_exchange_val_32_acq in
> >>>> glibc-2.10.1/sysdeps/i386/i486/bits/atomic.h, trivially expanded inline rather
> >>>> than in its original preprocessor macro form.
> >>>>
> >>>>   It generates incorrect code.
> >>>   This much looks like it's probably a compiler bug.  
> >>   Let's see whether anyone else agrees:
> >>
> >>         http://gcc.gnu.org/ml/gcc/2009-06/msg00053.html
> >
> > When you checked in this change, I'll create a 1.7.0-49 test release.
>
>   This is the final version I committed.  It is the glibc version, with the
> addition of the memory clobber, which that discussion revealed was absolutely
> required, and the use of a register asm var to feed the inline asm, which is
> in accordance with documented practice in the gcc manual.  (This now leaves
> only one difference between the glibc version and the version I posted, which
> is the use of a "+a" write-only output constraint paired with a numeric "0"
> matching input constraint in glibc's version compared with a single output
> operand using the read-write constraing "=a" in my version.  These should in
> fact be exactly identical in terms of what they indicate to reload, in any case.)
>
>   I have also manually inspected the generated assembly from thread.cc and
> shared.cc in a cygwin DLL build and verified that it is correct and efficient,
> and have installed the resulting DLL and retested all Thomas Stalder's
> testcases and the previously intermittently failing pthread7-rope testcase
> from libstdc++ testsuite.  Committed with this ChangeLog:

Cool, thank you.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Christopher Faylor-8
In reply to this post by Dave Korn-6
On Fri, Jun 05, 2009 at 03:04:59PM +0100, Dave Korn wrote:
>winsup/cygwin/ChangeLog
>
> * winbase.h (ilockexch):  Fix asm constraints.
> (ilockcmpexch):  Likewise.

Thanks for seeing this through.  It was obviously a lot of work.

cgf
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH?] Separate pthread patches, #2 take 2.

Dave Korn-6
Christopher Faylor wrote:
> On Fri, Jun 05, 2009 at 03:04:59PM +0100, Dave Korn wrote:
>> winsup/cygwin/ChangeLog
>>
>> * winbase.h (ilockexch):  Fix asm constraints.
>> (ilockcmpexch):  Likewise.
>
> Thanks for seeing this through.  It was obviously a lot of work.
>
> cgf

  I appreciate the need to be diligent when working so deep in the bowels of
the fundaments, so no problem :)

    cheers,
      DaveK