
Limited Internet speeds caused by inappropriate socket buffering in function fdsock (winsup/net.cc)


Daniel Havey
At Windows we love what you are doing with Cygwin.  However, we have
been getting reports from our hardware vendors that iperf is slow on
Windows.  Iperf is of course compiled against the cygwin1.dll and we
believe we have traced the problem down to the function fdsock in
net.cc.  SO_RCVBUF and SO_SNDBUF are being manually set.  The comments
indicate that the idea was to increase the buffer size, but, this code
must have been written long ago because Windows has used autotuning
for a very long time now.  Please do not manually set SO_RCVBUF or
SO_SNDBUF as this will limit your internet speed.

I am providing a patch, an STC and my cygcheck -svr output.  Hope we
can fix this.  Please let me know if I can help further.

Simple Test Case:
I have a script that pings 4 times and then iperfs for 10 seconds to
debit.k-net.fr


With patch
$ bash buffer_test.sh 178.250.209.22
usage: bash buffer_test.sh <iperf server name>

Pinging 178.250.209.22 with 32 bytes of data:
Reply from 178.250.209.22: bytes=32 time=167ms TTL=34
Reply from 178.250.209.22: bytes=32 time=173ms TTL=34
Reply from 178.250.209.22: bytes=32 time=173ms TTL=34
Reply from 178.250.209.22: bytes=32 time=169ms TTL=34

Ping statistics for 178.250.209.22:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 167ms, Maximum = 173ms, Average = 170ms
------------------------------------------------------------
Client connecting to 178.250.209.22, TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  3] local 10.137.196.108 port 58512 connected with 178.250.209.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   768 KBytes  6.29 Mbits/sec
[  3]  1.0- 2.0 sec  9.25 MBytes  77.6 Mbits/sec
[  3]  2.0- 3.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  3.0- 4.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  4.0- 5.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  5.0- 6.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  6.0- 7.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  7.0- 8.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  8.0- 9.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  9.0-10.0 sec  18.0 MBytes   151 Mbits/sec
[  3]  0.0-10.0 sec   154 MBytes   129 Mbits/sec


Without patch:
dahavey@DMH-DESKTOP ~
$ bash buffer_test.sh 178.250.209.22

Pinging 178.250.209.22 with 32 bytes of data:
Reply from 178.250.209.22: bytes=32 time=168ms TTL=34
Reply from 178.250.209.22: bytes=32 time=167ms TTL=34
Reply from 178.250.209.22: bytes=32 time=170ms TTL=34
Reply from 178.250.209.22: bytes=32 time=169ms TTL=34

Ping statistics for 178.250.209.22:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 167ms, Maximum = 170ms, Average = 168ms
------------------------------------------------------------
Client connecting to 178.250.209.22, TCP port 5001
TCP window size:  208 KByte (default)
------------------------------------------------------------
[  3] local 10.137.196.108 port 58443 connected with 178.250.209.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   512 KBytes  4.19 Mbits/sec
[  3]  1.0- 2.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  2.0- 3.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  3.0- 4.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  4.0- 5.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  5.0- 6.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  6.0- 7.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  7.0- 8.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  8.0- 9.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  9.0-10.0 sec  1.50 MBytes  12.6 Mbits/sec
[  3]  0.0-10.1 sec  14.1 MBytes  11.7 Mbits/sec


The output shows that the RTT from my machine to the iperf server is
similar in both cases (about 170ms).  However, with the patch the
throughput averages 129 Mbps, while without the patch it averages only
11.7 Mbps.  If we calculate the maximum throughput using
Bandwidth = Queue/RTT we get (212992 * 8)/0.170 = 10.0231 Mbps.  This
is just about what iperf shows without the patch.  Since the buffer
size is set to 212992 bytes, I believe that the buffer size is
limiting the throughput.  With the patch we have no buffer limitation
(autotuning) and can develop the full potential bandwidth on the link.
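
The Bandwidth = Queue/RTT estimate above can be reproduced in a few lines of C++.  This is only a sketch of the arithmetic; the function name `window_limited_bps` is made up for illustration and is not part of Cygwin or iperf.

```cpp
#include <cassert>

// Hypothetical helper (not from net.cc): upper bound on TCP throughput,
// in bits per second, when at most `window_bytes` can be in flight
// during one round trip of `rtt_seconds`.
double
window_limited_bps (double window_bytes, double rtt_seconds)
{
  assert (window_bytes >= 0 && rtt_seconds > 0);
  return window_bytes * 8.0 / rtt_seconds;
}
```

With the fixed 212992-byte buffer and the 170 ms RTT from the test case, `window_limited_bps (212992, 0.170)` comes to roughly 10.02 Mbit/s, the same ceiling as computed in the text.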

If you want to duplicate the STC you will have to find an iperf server
(I found an extreme case) that has a large enough RTT distance from
you and try a few times.  I get varying results depending on Internet
traffic but without the patch never exceed the limit caused by the
buffering.

Here is the patch:
--- net.cc_orig 2017-01-09 09:37:54.301210600 -0800
+++ net.cc 2017-01-09 14:15:57.998895500 -0800
@@ -517,7 +517,7 @@
 bool
 fdsock (cygheap_fdmanip& fd, const device *dev, SOCKET soc)
 {
-  int size;
+//  int size;

   fd = build_fh_dev (*dev);
   if (!fd.isopen ())
@@ -584,6 +584,7 @@
   fd->set_flags (O_RDWR | O_BINARY);
   debug_printf ("fd %d, name '%s', soc %p", (int) fd, dev->name (), soc);

+
   /* Raise default buffer sizes (instead of WinSock default 8K).

      64K appear to have the best size/performance ratio for a default
@@ -608,6 +609,8 @@
      of 1k, but since 64k breaks WSADuplicateSocket we use 63Kb.

      (*) Maximum normal TCP window size.  Coincidence?  */
+
+
 #ifdef __x86_64__
   ((fhandler_socket *) fd)->rmem () = 212992;
   ((fhandler_socket *) fd)->wmem () = 212992;
@@ -615,6 +618,14 @@
   ((fhandler_socket *) fd)->rmem () = 64512;
   ((fhandler_socket *) fd)->wmem () = 64512;
 #endif
+
+/*   Please don't do this.  Windows doesn't have a default buffer of 8K it uses autotuning.
+     The thing about network buffers is they have to be chosen dynamically.  Both Windows
+     and Linux do this.  However, this code sets the SO_SNDBUF and SO_RCVBUF size statically.
+     This will limit Internet speeds and cause bufferbloat.  Let the OS dynamically choose
+     the SO_SNDBUF and SO_RCVBUF sizes.
+
+
   if (::setsockopt (soc, SOL_SOCKET, SO_RCVBUF,
     (char *) &((fhandler_socket *) fd)->rmem (), sizeof (int)))
     {
@@ -632,7 +643,9 @@
  (char *) &((fhandler_socket *) fd)->wmem (),
  (size = sizeof (int), &size)))
  system_printf ("getsockopt(SO_SNDBUF) failed, %u", WSAGetLastError ());
-    }
+    } */
+
+

   /* A unique ID is necessary to recognize fhandler entries which are
      duplicated by dup(2) or fork(2).  This is used in BSD flock calls

cygcheck.out (317K) Download Attachment
net.cc.patch (2K) Download Attachment

Re: Limited Internet speeds caused by inappropriate socket buffering in function fdsock (winsup/net.cc)

Corinna Vinschen-2
Hi Daniel,

On Jan  9 15:49, Daniel Havey wrote:
> At Windows we love what you are doing with Cygwin.  However, we have

Windows?

> been getting reports from our hardware vendors that iperf is slow on
> Windows.  Iperf is of course compiled against the cygwin1.dll and we
> believe we have traced the problem down to the function fdsock in
> net.cc.  SO_RCVBUF and SO_SNDBUF are being manually set.  The comments
> indicate that the idea was to increase the buffer size, but, this code
> must have been written long ago because Windows has used autotuning
> for a very long time now.  Please do not manually set SO_RCVBUF or
> SO_SNDBUF as this will limit your internet speed.

Yes, the code has quite a history.

> I am providing a patch, an STC and my cygcheck -svr output.  Hope we
> can fix this.  Please let me know if I can help further.

Yes, perhaps.  Your patch disables setting SO_SNDBUF/SO_RCVBUF, but it
keeps the values for wmem/rmem intact.

rmem is basically unused, but wmem is used in fhandler_socket::send_internal
to account for KB 823764 and another weird behaviour we observed long ago:

If an application sends data in chunks > SO_SNDBUF, the OS doesn't just
fill up the send buffer, but instead it will allocate a temporary buffer
big enough to copy over the application buffer.  So if the application
sends data in 2 Meg chunks, every send will allocate another 2 Megs and
copy the data, which wastes time and space.
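
The idea of splitting large application writes so that no single send exceeds a cap such as wmem can be sketched as follows.  This is an illustrative assumption, not the actual code of fhandler_socket::send_internal; the name `send_chunked` and the callback-based `send_fn` are hypothetical and stand in for the real socket send call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: hands `len` bytes to `send_fn` in pieces of at
// most `max_chunk` bytes.  `send_fn` returns the number of bytes it
// accepted, or a value <= 0 on error.  Returns the total bytes sent.
std::size_t
send_chunked (const char *buf, std::size_t len, std::size_t max_chunk,
              const std::function<long (const char *, std::size_t)> &send_fn)
{
  assert (max_chunk > 0);
  std::size_t sent = 0;
  while (sent < len)
    {
      // Never offer more than max_chunk bytes in a single call.
      std::size_t n = std::min (max_chunk, len - sent);
      long rc = send_fn (buf + sent, n);
      if (rc <= 0)
        break;                  // stop on error, report what was sent
      sent += static_cast<std::size_t> (rc);  // handles partial sends
    }
  return sent;
}
```

With a cap of 65536 bytes, a 200000-byte application buffer would reach the underlying send call in four pieces rather than as one oversized request.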

As far as I remember, this is still a problem in Vista and later.  Can
you confirm or deny this by any chance?

And, do we have something like an ideal splitting point?  Given that
SO_SNDBUF/SO_RCVBUF isn't set anymore, we could set wmem to some
arbitrary, yet useful value on both targets, 32 and 64 bit.  I think,
keeping wmem < 64K on 32 bit should be better.

> If you want to duplicate the STC you will have to find an iperf server
> (I found an extreme case) that has a large enough RTT distance from
> you and try a few times.  I get varying results depending on Internet
> traffic but without the patch never exceed the limit caused by the
> buffering.

I can confirm the observation.  I have an iperf server with an avg
RTT of 155ms, and without your patch I hit the upper ceiling at
10.4 Mbit/s, while with your patch I get up to 23 Mbit/s.

> Here is the patch:

As for your patch, a few minor points:

- Can you please send a BSD sign-off per https://cygwin.com/contrib.html?
  For the text see
  https://cygwin.com/git/?p=newlib-cygwin.git;a=blob;f=winsup/CONTRIBUTORS

- Please keep the line length <= 80 chars and remove unnecessary changes
  (e. g. adding empty lines).

- The now unused code should be put into `#if 0' bracketing, rather than
  in a comment. Move the `int size;' declaration down so it will be inside
  the same `#if 0' bracket.

- The preceding comment shows that the code has a lot of history.  I
  think it would be prudent to add your comment as NOTE 4 into the same
  comment, so the history is in one place.

- Not a requirement, but it would be nice to get the patch in the
  typical `git format-patch' fashion.
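
To illustrate the `#if 0' point from the list above: a block comment cannot enclose code that itself contains block comments, because the first closing marker ends it, whereas an `#if 0' bracket removes the whole region in the preprocessor.  A minimal sketch follows; the function name and body are made up and are not the actual fdsock code.

```cpp
// Hypothetical stand-in for a function with a disabled region.
int
configured_buffer_size ()
{
  int result = 0;       /* 0 means "leave it to the OS" in this sketch */
#if 0
  /* Disabled code may contain block comments of its own.  */
  int size = 64512;     /* declared inside the bracket, so no unused
                           variable is left behind */
  result = size;
#endif
  return result;
}
```

Because the declaration of `size` sits inside the bracket, the compiler never sees it and the function simply returns 0.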


Thanks,
Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


Re: Limited Internet speeds caused by inappropriate socket buffering in function fdsock (winsup/net.cc)

Daniel Havey
In reply to this post by Daniel Havey
Hi Corinna,
I can see your email on the archive, but, I never received it in my
gmail account (not even in a spam folder).  I think the Internet ate
your message.

Yes Windows :).  I'm the Program Manager for Windows 10 transports and
IP.  Anything in layers 4 or 3.  We can help you with network stack in
the current release of Windows 10.  Downlevel is more difficult.  I'm
not sure about the answer to your question on the size of wmem.  I
don't think that there is a static value that will work in all cases
since Windows TCP will send 1 BDP worth of data per RTT.  If the BDP
is large then the static value could easily be too small and if the
BDP is small then the static value could easily be too large.  It will
take some digging to figure out what the best practice is.  I will do
some digging and let you know the results.
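
The BDP argument above can be made concrete with a small calculation.  This is a sketch only; `bdp_bytes`, `Fit` and `classify_static_buffer` are hypothetical names, and the 100 Mbit/s, 1 ms link in the usage note is an assumed example, not a measurement from this thread.

```cpp
#include <cassert>

// Bandwidth-delay product in bytes: the amount of data that has to be
// in flight to keep a link of `bits_per_second` busy over `rtt_seconds`.
double
bdp_bytes (double bits_per_second, double rtt_seconds)
{
  assert (bits_per_second >= 0 && rtt_seconds >= 0);
  return bits_per_second * rtt_seconds / 8.0;
}

// Hypothetical classification of a fixed buffer size against the BDP.
enum class Fit { TooSmall, TooLarge, Exact };

Fit
classify_static_buffer (double buffer_bytes, double bits_per_second,
                        double rtt_seconds)
{
  double bdp = bdp_bytes (bits_per_second, rtt_seconds);
  if (buffer_bytes < bdp)
    return Fit::TooSmall;       // buffer caps throughput below the link
  if (buffer_bytes > bdp)
    return Fit::TooLarge;       // more buffering than the path needs
  return Fit::Exact;
}
```

For the 151 Mbit/s seen at 170 ms in the test case, the BDP is about 3.2 MB, so a fixed 212992-byte buffer is too small.  On an assumed 100 Mbit/s link with a 1 ms RTT, the BDP is only 12500 bytes and the same buffer is larger than needed.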

In the mean time I will apply your recommendations to my patch and repost it.

thanxs ;^)
...Daniel






Re: Limited Internet speeds caused by inappropriate socket buffering in function fdsock (winsup/net.cc)

Corinna Vinschen-2
Hi Daniel,

On Jan 11 16:38, Daniel Havey wrote:
> Hi Corinna,
> I can see your email on the archive, but, I never received it in my
> gmail account (not even in a spam folder).  I think the Internet ate
> your message.

No, I only sent the reply to the list.  It's customary in the Cygwin
mailing lists not to CC the original poster, unless the poster requests
it.  I CCed you now, of course.

> Yes Windows :).  I'm the Program Manager for Windows 10 transports and
> IP.  Anything in layers 4 or 3.

Cool.  I'm glad for any input from "upstream" :)

> We can help you with network stack in
> the current release of Windows 10.  Downlevel is more difficult.  I'm
> not sure about the answer to your question on the size of wmem.  I
> don't think that there is a static value that will work in all cases
> since Windows TCP will send 1 BDP worth of data per RTT.  If the BDP
> is large then the static value could easily be too small and if the
> BDP is small then the static value could easily be too large.  It will
> take some digging to figure out what the best practice is.  I will do
> some digging and let you know the results.

I'm really looking forward to it!

> In the mean time I will apply your recommendations to my patch and repost it.

Same here.   Thanks a lot for fixing another of those pesky "Cygwin is
slow" problems :)))


Corinna


Re: Limited Internet speeds caused by inappropriate socket buffering in function fdsock (winsup/net.cc)

Corinna Vinschen-2
Hi Daniel,

any news?


Thanks,
Corinna



