[PATCH] Cygwin: Speed up mkimport

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Cygwin: Speed up mkimport

Mark Geisert
Cut mkimport elapsed time in half by forking each iteration of the two
time-consuming loops within.  Only do this if more than one CPU is
present.  In the second loop, combine the two 'objdump' calls into one
system() invocation to avoid a system() invocation per iteration.

---
 winsup/cygwin/mkimport | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/winsup/cygwin/mkimport b/winsup/cygwin/mkimport
index 2b08dfe3d..919dc305b 100755
--- a/winsup/cygwin/mkimport
+++ b/winsup/cygwin/mkimport
@@ -47,6 +47,9 @@ for my $sym (keys %replace) {
     $import{$fn} = $imp_sym;
 }
 
+my $ncpus = `grep -c ^processor /proc/cpuinfo`;
+my $forking = $ncpus > 1; # Decides if loops below should fork() each iteration
+
 for my $f (keys %text) {
     my $imp_sym = delete $import{$f};
     my $glob_sym = $text{$f};
@@ -56,25 +59,30 @@ for my $f (keys %text) {
  $text{$f} = 0;
     } else {
  $text{$f} = 1;
- open my $as_fd, '|-', $as, '-o', "$dir/t-$f", "-";
- if ($is64bit) {
-    print $as_fd <<EOF;
+ if ($forking && fork) {
+    ; # Testing shows sleep here is unneeded. 'as' runs very quickly.
+ } else {
+    open my $as_fd, '|-', $as, '-o', "$dir/t-$f", "-";
+    if ($is64bit) {
+ print $as_fd <<EOF;
  .text
  .extern $imp_sym
  .global $glob_sym
 $glob_sym:
  jmp *$imp_sym(%rip)
 EOF
- } else {
-    print $as_fd <<EOF;
+    } else {
+ print $as_fd <<EOF;
  .text
  .extern $imp_sym
  .global $glob_sym
 $glob_sym:
  jmp *$imp_sym
 EOF
+    }
+    close $as_fd or exit 1;
+    exit 0 if $forking;
  }
- close $as_fd or exit 1;
     }
 }
 
@@ -86,8 +94,18 @@ for my $f (keys %text) {
     if (!$text{$f}) {
  unlink $f;
     } else {
- system $objcopy, '-R', '.text', $f and exit 1;
- system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+ if ($forking && fork) {
+    # Testing shows parent does need to sleep a short time here,
+    # otherwise system is inundated with hundreds of objcopy processes
+    # and the forked perl processes that launched them.
+    my $delay = 0.01; # NOTE: Slower systems may need to raise this
+    select(undef, undef, undef, $delay); # Supports fractional seconds
+ } else {
+    # Do two objcopy calls at once to avoid one system() call overhead
+    system '(', $objcopy, '-R', '.text', $f, ')', '||',
+ $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
+    exit 0 if $forking;
+ }
     }
 }
 
--
2.29.2

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Mark Geisert
Previously, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within.  Only do this if more than one CPU is
> present.  In the second loop, combine the two 'objdump' calls into one
                                                  ^^^^^^^
That should say objcopy.  The code is correct though.

..mark
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Achim Gratz
In reply to this post by Mark Geisert
Mark Geisert writes:
> +    # Do two objcopy calls at once to avoid one system() call overhead
> +    system '(', $objcopy, '-R', '.text', $f, ')', '||',
> + $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;

That doesn't do what you think it does.  It in fact increases the
overhead since it'll start a shell that runs those two commands sand
will even needlessly start the first objcopy in a subshell.


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf rackAttack:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Jon TURNEY
In reply to this post by Mark Geisert
On 26/11/2020 09:56, Mark Geisert wrote:
> Cut mkimport elapsed time in half by forking each iteration of the two
> time-consuming loops within.  Only do this if more than one CPU is
> present.  In the second loop, combine the two 'objdump' calls into one
> system() invocation to avoid a system() invocation per iteration.

Nice.  Thanks for looking into this.

> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>       if (!$text{$f}) {
>   unlink $f;
>       } else {
> - system $objcopy, '-R', '.text', $f and exit 1;
> - system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> + if ($forking && fork) {
> +    # Testing shows parent does need to sleep a short time here,
> +    # otherwise system is inundated with hundreds of objcopy processes
> +    # and the forked perl processes that launched them.
> +    my $delay = 0.01; # NOTE: Slower systems may need to raise this
> +    select(undef, undef, undef, $delay); # Supports fractional seconds
> + } else {
> +    # Do two objcopy calls at once to avoid one system() call overhead
> +    system '(', $objcopy, '-R', '.text', $f, ')', '||',
> + $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
> +    exit 0 if $forking;
> + }
>       }
>   }
>  

Hmm... not so sure about this.  This seems racy, as nothing ensures that
these objcopies have finished before we combine all the produced .o
files into a library.

I'm pretty sure with more understanding, this whole thing could be done
better:  For example, from a brief look, it seems that the t-*.o files
are produced by gas, and then we remove .bss and .data sections.  Could
we not arrange to assemble these objects without those sections in the
first place?
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Mark Geisert
In reply to this post by Achim Gratz
Achim Gratz wrote:
> Mark Geisert writes:
>> +    # Do two objcopy calls at once to avoid one system() call overhead
>> +    system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> + $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>
> That doesn't do what you think it does.  It in fact increases the
> overhead since it'll start a shell that runs those two commands sand
> will even needlessly start the first objcopy in a subshell.

Still faster than two system commands :-).  But thanks for the comment; I thought
I was merely grouping args, to get around Perl's greedy arg list building for the
system command.  After more experimenting I ended up with:
             system '/bin/true', '||', $objcopy, '-R', '.text', $f, '||',
                 $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
Kind of ugly, but better?  It obviates the need for parent to pace itself so the
enclosing loop runs a bit faster.

..mark
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Mark Geisert
In reply to this post by Jon TURNEY
Jon Turney wrote:

> On 26/11/2020 09:56, Mark Geisert wrote:
>> @@ -86,8 +94,18 @@ for my $f (keys %text) {
>>       if (!$text{$f}) {
>>       unlink $f;
>>       } else {
>> -    system $objcopy, '-R', '.text', $f and exit 1;
>> -    system $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +    if ($forking && fork) {
>> +        # Testing shows parent does need to sleep a short time here,
>> +        # otherwise system is inundated with hundreds of objcopy processes
>> +        # and the forked perl processes that launched them.
>> +        my $delay = 0.01; # NOTE: Slower systems may need to raise this
>> +        select(undef, undef, undef, $delay); # Supports fractional seconds
>> +    } else {
>> +        # Do two objcopy calls at once to avoid one system() call overhead
>> +        system '(', $objcopy, '-R', '.text', $f, ')', '||',
>> +        $objcopy, '-R', '.bss', '-R', '.data', "t-$f" and exit 1;
>> +        exit 0 if $forking;
>> +    }
>>       }
>>   }
>
> Hmm... not so sure about this.  This seems racy, as nothing ensures that these
> objcopies have finished before we combine all the produced .o files into a library.

Good point.  I've added a hash to track the forked pids, and after each of these
two time-consuming loops finishes I loop over the pids list doing waitpid() on
each pid.

> I'm pretty sure with more understanding, this whole thing could be done better:  
> For example, from a brief look, it seems that the t-*.o files are produced by gas,
> and then we remove .bss and .data sections.  Could we not arrange to assemble
> these objects without those sections in the first place?

I looked over as's options in its man page but could not see anything obvious.  I
wonder if defining the sections explicitly as zero-length somehow in mkimport's
assembler snippets would accomplish the same thing.  I'll try this next.

Note that mkimport operates both on those tiny object files it creates with as,
but also on the object files created by the whole Cygwin build.  So adjusting the
latter object files would need to be done somewhere else.
Thanks,

..mark

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Achim Gratz
In reply to this post by Mark Geisert
Mark Geisert writes:
> Still faster than two system commands :-).  But thanks for the
> comment;

It still seems you are barking up the wrong tree.

> I thought I was merely grouping args, to get around Perl's
> greedy arg list building for the system command.

Wot?  It just takes a list which you can build any which way you desire.
The other option is to give it the full command line in a string, which
does work for this script (but not on Windows).  If it finds shell
metacharacters in the arguments it'll run a shell, otherwise the forked
perl just does an execve.

If it's really the forking that is causing the slowdown, why not do
either of those things:

a) Generate a complete shell script and fork once to run that.

b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
names.

Getting the error codes back to the script and handling the error is
left as an exercise for the reader.


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Brian Inglis
On 2020-11-27 11:37, Achim Gratz wrote:

> Mark Geisert writes:
>> Still faster than two system commands :-).  But thanks for the
>> comment;
>
> It still seems you are barking up the wrong tree.
>
>> I thought I was merely grouping args, to get around Perl's
>> greedy arg list building for the system command.
>
> Wot?  It just takes a list which you can build any which way you desire.
> The other option is to give it the full command line in a string, which
> does work for this script (but not on Windows).  If it finds shell
> metacharacters in the arguments it'll run a shell, otherwise the forked
> perl just does an execve.
>
> If it's really the forking that is causing the slowdown, why not do
> either of those things:
>
> a) Generate a complete shell script and fork once to run that.
>
> b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
> names.
>
> Getting the error codes back to the script and handling the error is
> left as an exercise for the reader.

Use explicit binary paths to avoid path search overhead; for portability: /bin/
for base system, dir, file, and net utils including compressors, grep, and sed;
/usr/bin/ otherwise; {/usr,}/sbin/ for some admin utils not elsewhere.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Achim Gratz
In reply to this post by Achim Gratz
Achim Gratz writes:
> b) Open up two pipes to an "xargs -P $ncpu/2 L 1 …" and feed in the file
> names.

That actually works, but the speedup is quite modest on my system
(4C/8T) even though I've allowed it to use unlimited resources.  So it
basically forks slower than the runtime for each of the invocations is.
Some more speedup can be had if the assembler is run on actual files in
the same way, but the best I've come up with goes from 93s to 47s and
runs at 150% CPU (up from 85%).  Most of that time is spent in system,
so forking and I/O.


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Blofeld:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Cygwin: Speed up mkimport

Achim Gratz
Achim Gratz writes:
> That actually works, but the speedup is quite modest on my system
> (4C/8T) even though I've allowed it to use unlimited resources.  So it
> basically forks slower than the runtime for each of the invocations is.
> Some more speedup can be had if the assembler is run on actual files in
> the same way, but the best I've come up with goes from 93s to 47s and
> runs at 150% CPU (up from 85%).  Most of that time is spent in system,
> so forking and I/O.

Not that I really know what I'm doing, but creating a single .s file and
running as just once gets mkimport down to 21s / 110%.  Now the
resulting library doesn't actually link, because somehow the information
ends up in the wrong place…


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves