Command line processing in dcrt0.cc does not match Microsoft parsing rules


Cygwin list mailing list
The standard rules for Microsoft command line processing are documented here:

    https://docs.microsoft.com/en-us/previous-versions/17w5ykft(v=vs.85)

The Cygwin code for command line processing is in dcrt0.cc, function build_argv.

The behaviors do not match. For instance, given a test.sh script like this:

    #!/bin/bash
    echo $1

And the following invocation of bash.exe from a Windows command prompt:

    bash.exe test.sh foo\"bar

The result is:

    foo\bar

When the expected result is:

    foo"bar

As a workaround, you can achieve the expected result using:

    bash.exe test.sh "foo\"bar"

Which is great until you use a language like Go to shell exec the command line, and don't have control over how the command line string is generated from an original set of arguments. See:

    https://github.com/golang/go/blob/master/src/syscall/exec_windows.go#L86

Go just reverses the Microsoft standard rules in the most efficient manner possible, but those command lines don't parse correctly in Cygwin processes.
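
As a rough illustration of what "reversing the rules" means, the sketch below escapes an argument list into a single command line string. It is not Go's own code: escapeArgSketch is a hypothetical name, and the choice to add surrounding quotes only for empty arguments or arguments containing a space or tab is an assumption made to match the unquoted foo\"bar form shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeArgSketch is a hypothetical helper, not Go's own implementation.
// A double quote becomes \" (with any backslashes directly before it
// doubled), and the argument is wrapped in double quotes only when it is
// empty or contains a space or tab (an assumption of this sketch).
func escapeArgSketch(arg string) string {
	if arg == "" {
		return `""`
	}
	needQuotes := strings.ContainsAny(arg, " \t")
	var b strings.Builder
	if needQuotes {
		b.WriteByte('"')
	}
	backslashes := 0 // length of the current run of backslashes
	for i := 0; i < len(arg); i++ {
		switch c := arg[i]; c {
		case '\\':
			backslashes++
			b.WriteByte(c)
		case '"':
			// Double the backslashes already written, then escape the quote.
			b.WriteString(strings.Repeat(`\`, backslashes))
			b.WriteString(`\"`)
			backslashes = 0
		default:
			backslashes = 0
			b.WriteByte(c)
		}
	}
	if needQuotes {
		// Backslashes before the closing quote must be doubled as well.
		b.WriteString(strings.Repeat(`\`, backslashes))
		b.WriteByte('"')
	}
	return b.String()
}

func main() {
	args := []string{"bash.exe", "test.sh", `foo"bar`}
	escaped := make([]string, len(args))
	for i, a := range args {
		escaped[i] = escapeArgSketch(a)
	}
	fmt.Println(strings.Join(escaped, " ")) // bash.exe test.sh foo\"bar
}
```

A parser that follows the Microsoft rules turns foo\"bar back into foo"bar; a parser that treats the quote as opening a quoted region does not.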

Go implements a pretty definitive command line parsing algorithm as a replacement for the CommandLineToArgv function in shell32.dll:

    https://github.com/golang/go/commit/39c8d2b7faed06b0e91a1ad7906231f53aab45d1

The behavior here is based on a detailed analysis of what command line parsing "should" be in Windows:

    http://daviddeley.com/autohotkey/parameters/parameters.htm#WINARGV

It would be very nice if Cygwin followed the same procedure at startup.

Thanks,
Stephen


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Brian Inglis
On 2019-08-30 13:16, Stephen Provine via cygwin wrote:

> The standard rules for Microsoft command line processing are documented here:
>     https://docs.microsoft.com/en-us/previous-versions/17w5ykft(v=vs.85)
> The Cygwin code for command line processing is in dcrt0.cc, function build_argv.
> The behaviors do not match. For instance, given a test.sh script like this:
>     #!/bin/bash
>     echo $1
> And the following invocation of bash.exe from a Windows command prompt:
>     bash.exe test.sh foo\"bar
> The result is:
>     foo\bar
> When the expected result is:
>     foo"bar
> As a workaround, you can achieve the expected result using:
>     bash.exe test.sh "foo\"bar"
> Which is great until you use a language like Go to shell exec the command line, and don't have control over how the command line string is generated from an original set of arguments. See:
>     https://github.com/golang/go/blob/master/src/syscall/exec_windows.go#L86
> Go just reverses the Microsoft standard rules in the most efficient manner possible, but those command lines don't parse correctly in Cygwin processes.
> Go implements a pretty definitive command line parsing algorithm as a replacement for the CommandLineToArgv function in shell32.dll:
>     https://github.com/golang/go/commit/39c8d2b7faed06b0e91a1ad7906231f53aab45d1
> The behavior here is based on a detailed analysis of what command line parsing "should" be in Windows:
>     http://daviddeley.com/autohotkey/parameters/parameters.htm#WINARGV
> It would be very nice if Cygwin followed the same procedure at startup.

Cygwin command line parsing has to match Unix shell command line processing:
argument splitting, joining within single or double quotes or after
backslash-escaped whitespace characters, globbing, and other actions normally
performed by a shell. This applies when any Cygwin program is invoked from any
Windows program (e.g. cmd), without the Windows limitations that exclude any use
of a backslash escape character except when preceding another backslash or a
double quote.

Mixing Cygwin and Windows programs is a user choice requiring them to deal with
any interface issues: just use mintty with bash. ;^> It's actually the same
situation as invoking any other Cygwin program which also does some argument
interpretation, from the shell, possibly requiring nested quoting and escaping.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
> Cygwin command line parsing has to match Unix shell command line processing,
> like argument splitting, joining within single or double quotes or after a
> backslash escaped white space characters, globbing, and other actions normally
> performed by a shell, when any Cygwin program is invoked from any Windows
> program e.g. cmd, without those Windows limitations which exclude any use of a
> backslash escape character except preceding another or a double quote.

I guess my assumption was that the "winshell" parameter would be used to determine
when a Cygwin process is called from a non-Cygwin process and that it would be more
appropriate to use standard Windows command line processing (as limiting as it may
be) in that case. Once in the Cygwin environment, calls from one process to another
should obviously process command lines according to Unix shell rules.


Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Brian Inglis
On 2019-08-30 14:59, Stephen Provine wrote:
>> Cygwin command line parsing has to match Unix shell command line processing,
>> like argument splitting, joining within single or double quotes or after a
>> backslash escaped white space characters, globbing, and other actions normally
>> performed by a shell, when any Cygwin program is invoked from any Windows
>> program e.g. cmd, without those Windows limitations which exclude any use of a
>> backslash escape character except preceding another or a double quote.

> I guess my assumption was that the "winshell" parameter would be used to determine
> when a Cygwin process is called from a non-Cygwin process and that it would be more
> appropriate to use standard Windows command line processing (as limiting as it may
> be) in that case. Once in the Cygwin environment, calls from one process to another
> should obviously process command lines according to Unix shell rules.

Not being in the same Cygwin process group and lacking the appropriate interface
info indicates that the invoker was not Cygwin.
Cygwin command line file name globs can include any UTF-8 character excluding
forward and backward (for Windows compatibility) oblique slashes and nulls, with
non-Windows supported characters including leading and trailing spaces and dots,
and result in thousands of file name arguments on the command line e.g.

        $ echo /var/log/* | wc -lwmcL
              1   66858 2903078 2903078 2903077

shows I need to clean up my /var/log directory as it contains 64K+ files with
names totalling 2234498 chars/bytes, plus 668579 for paths and spaces, plus a
newline terminator.

Some file names with non-Windows supported characters have them converted to the
UTF-16LE BMP PUA by adding 0xf000, or for characters not supported by non-UTF-8
interface encodings, to ^X (CAN, 0x18) followed by a BMP UTF-8 sequence, allowing
conversion to UTF-16LE, at the cost of weird characters in the displayed names.
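
The private use area mapping can be pictured with a small Go sketch. This is only an illustration of the "add 0xf000" idea: toPrivateUse is a hypothetical name, and the set of characters it shifts is an assumption for this example, not something taken from the Cygwin sources.

```go
package main

import "fmt"

// toPrivateUse is a hypothetical illustration: characters that Windows does
// not allow in file names are shifted into the Unicode private use area by
// adding 0xF000. The character set below is an assumption of this sketch.
func toPrivateUse(name string) string {
	out := []rune(name)
	for i, r := range out {
		switch r {
		case '"', '*', ':', '<', '>', '?', '|':
			out[i] = r + 0xF000
		}
	}
	return string(out)
}

func main() {
	// %+q prints non-ASCII runes as \u escapes so the shift is visible.
	fmt.Printf("%+q\n", toPrivateUse("a:b?.txt")) // "a\uf03ab\uf03f.txt"
}
```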

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Andrey Repin
In reply to this post by Cygwin list mailing list
Greetings, Stephen Provine!

> The standard rules for Microsoft command line processing are documented here:

>     https://docs.microsoft.com/en-us/previous-versions/17w5ykft(v=vs.85)

> The Cygwin code for command line processing is in dcrt0.cc, function build_argv.

> The behaviors do not match. For instance, given a test.sh script like this:

>     #!/bin/bash
>     echo $1

> And the following invocation of bash.exe from a Windows command prompt:

>     bash.exe test.sh foo\"bar

> The result is:

>     foo\bar

> When the expected result is:

>     foo"bar

I would actually expect a parsing error, but I guess CMD gives you some slack.
Then, the expected result is either 'foo\"bar' or 'foo\bar', since in CMD, the
escape character is a caret (^).

> As a workaround, you can achieve the expected result using:

>     bash.exe test.sh "foo\"bar"

> Which is great until you use a language like Go to shell exec the command
> line, and don't have control over how the command line string is generated
> from an original set of arguments. See:

>     https://github.com/golang/go/blob/master/src/syscall/exec_windows.go#L86

> Go just reverses the Microsoft standard rules in the most efficient manner
> possible, but those command lines don't parse correctly in Cygwin processes.

> Go implements a pretty definitive command line parsing algorithm as a
> replacement for the CommandLineToArgv function in shell32.dll:

>     https://github.com/golang/go/commit/39c8d2b7faed06b0e91a1ad7906231f53aab45d1

> The behavior here is based on a detailed analysis of what command line parsing "should" be in Windows:

>     http://daviddeley.com/autohotkey/parameters/parameters.htm#WINARGV

> It would be very nice if Cygwin followed the same procedure at startup.

> Thanks,
> Stephen





--
With best regards,
Andrey Repin
Saturday, August 31, 2019 11:27:38

Sorry for my terrible english...



RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 2019-08-30 21:58, Brian Inglis wrote:
> Not being in the same Cygwin process group and lacking the appropriate interface
> info indicates that the invoker was not Cygwin.

Should I interpret this to mean the "winshell" parameter is not an accurate
statement of what I thought it was for and because there is no way to reliably
determine if the calling process was from Cygwin or not, behavior like I suggest
is actually impossible?


Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Brian Inglis
On 2019-09-03 10:38, Stephen Provine wrote:
> On 2019-08-30 21:58, Brian Inglis wrote:
>> Not being in the same Cygwin process group and lacking the appropriate interface
>> info indicates that the invoker was not Cygwin.
>
> Should I interpret this to mean the "winshell" parameter is not an accurate
> statement of what I thought it was for and because there is no way to reliably
> determine if the calling process was from Cygwin or not, behavior like I suggest
> is actually impossible?

Reread the rules in the article you quoted, carefully, then read:

http://www.windowsinspired.com/how-a-windows-programs-splits-its-command-line-into-individual-arguments/
[also see linked articles about cmd and batch file command line parsing]

and ask if you really expect anyone else to use or reproduce this insanity,
rather than a sane POSIX parser?
Once again MS "persists in reinventing the square wheel", badly [from Henry
Spencer's Commandments].

What does the Go command line parser actually accept, does it really invert the
parse_cmdline or CommandLineToArgvW rules, and which?

That winshell parameter is set in dcrt0.cc calling build_argv, based on whether
the parent process was Cygwin and an argv array is available preset by the
Cygwin parent, or not and globs are allowed to be expanded, such that the
command line args, quotes, and wildcards have to be handled by the program
according to POSIX shell command line quoting, field splitting, and pathname
expansion rules, respecting $IFS:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The similar flag in spawn.cc based on the exe or interpreter exe being under a
Cygwin exec mount in realpath.iscygexec() decides whether the argv array can be
passed a la Unix to a Cygwin child, or a Windows command line needs to be built
with Windows argument double quoting and escaping where required.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 2019-09-04 10:20, Brian Inglis wrote:
> and ask if you really expect anyone else to use or reproduce this insanity,
> rather than a sane POSIX parser?

I know it's insanity, but it's insanity that almost all Windows programs inherit and
implement consistently enough because they use standard libraries or functions
to do the parsing. The Go command line parser used to use CommandLineToArgvW
and only switched away from it due to performance (it's in shell32.dll and that takes
a long time to load). I don't know how accurate their manual reproduction is, but
they seemed to study the sources I sent pretty carefully.

Anyway, my specific problem is that I have Go code with an array of arguments that
I want to pass verbatim (no glob expansion) to a bash script. I've figured out how to
override Go's default code for building the command line string, but it's not clear how
to correctly construct the command line string. If the POSIX rules are being followed,
I'd expect the following to work:

    bash.exe script.sh arg1 "*" arg3

But it always expands the "*" to all the files in the current directory. I've also tried \* and
'*', but same problem. So how do I build a command line string that takes each argument
literally with no processing?

Thanks,
Stephen



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Brian Inglis
On 2019-09-04 17:46, Stephen Provine wrote:

> On 2019-09-04 10:20, Brian Inglis wrote:
>> and ask if you really expect anyone else to use or reproduce this insanity,
>> rather than a sane POSIX parser?
>
> I know it's insanity, but it's insanity that almost all Windows programs inherit and
> implement consistently enough because they use standard libraries or functions
> to do the parsing. The Go command line parser used to use CommandLineToArgvW
> and only switched away from it due to performance (it's in shell32.dll and that takes
> a long time to load). I don't know how accurate their manual reproduction is, but
> they seemed to study the sources I sent pretty carefully.
>
> Anyway, my specific problem is that I have Go code with an array of arguments that
> I want to pass verbatim (no glob expansion) to a bash script. I've figured out how to
> override Go's default code for building the command line string, but it's not clear how
> to correctly construct the command line string. If the POSIX rules are being followed,
> I'd expect the following to work:
>
>     bash.exe script.sh arg1 "*" arg3
>
> But it always expands the "*" to all the files in the current directory. I've also tried \* and
> '*', but same problem. So how do I build a command line string that takes each argument
> literally with no processing?

As standard on Unix systems, just add another level of quoting for each level of
interpretation, as bash will process that command line, then bash will process
the script command line.
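
One common way to add a level of quoting when the command string is built by a program is to wrap each argument in single quotes, which POSIX shells treat as fully literal, and to replace any embedded single quote with '\''. A minimal Go sketch follows; shellQuote is a hypothetical helper name, and it only addresses the shell layer (for example a string handed to bash -c), not the Windows command line layer discussed elsewhere in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote is a hypothetical helper: it wraps s in single quotes so a POSIX
// shell takes it literally (no globbing, no expansion). An embedded single
// quote is written as '\'' (close quote, escaped quote, reopen quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	args := []string{"script.sh", "arg1", "*", "it's"}
	quoted := make([]string, len(args))
	for i, a := range args {
		quoted[i] = shellQuote(a)
	}
	// The joined string is the kind of text one would pass to bash -c.
	fmt.Println(strings.Join(quoted, " ")) // 'script.sh' 'arg1' '*' 'it'\''s'
}
```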

How are you running the command line? I get the same results under cmd or
mintty/bash:

$ bash -nvx script.sh arg1 "*" arg3
#!/bin/bash
# script.sh - echo args

argc=$#
argv=("$0" "$@")
echo argc $argc argv[0] "${argv[0]}"

for ((a = 1; a <= $argc; ++a))
do
    echo argv[$a] "${argv[$a]}"
done

C:\ > bash script.sh arg1 "*" arg3
argc 3 argv[0] script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

C:\ > bash -c 'script.sh arg1 "*" arg3'
argc 3 argv[0] /mnt/c/Users/bwi/bin/script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

$ bash script.sh arg1 "*" arg3
argc 3 argv[0] script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

$ bash -c 'script.sh arg1 "*" arg3'
argc 3 argv[0] /home/bwi/bin/script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

$ cmd /c bash script.sh arg1 "\*" arg3
argc 3 argv[0] script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

$ cmd /c bash -c 'script.sh arg1 "*" arg3'
argc 3 argv[0] /mnt/c/Users/bwi/bin/script.sh
argv[1] arg1
argv[2] *
argv[3] arg3

but with un-double-quoted (and backslash escaped) * I get a list of the current
directory files from all of these commands.

Invoking bash with options -vx or set -vx in script.sh will let you see what is
happening on stderr. Many errors cause non-interactive shell scripts to exit, so
check for child process error return codes (often 128+errno). If you are not
careful within script.sh, many unquoted uses of $2 may expand the *. Double
quotes allow command and parameter substitution, and history expansion, but
suppress pathname expansion. You should refer to each parameter within script.sh
as "$1" "$2" "$3", or you might need to quote some or each argument character
and enclose the * in double quotes, e.g. \""\*"\", to pass through the Go command
line interface.
Can you not tell the interface to pass the string through verbatim for execution?

You may check any of the POSIX shell, dash/ash/sh shell, ksh Korn shell, or bash
shell man pages or docs for more details on variations between shells and
extensions to POSIX operation.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 2019-09-04 23:29, Brian Inglis wrote:
> As standard on Unix systems, just add another level of quoting for each level of
> interpretation, as bash will process that command line, then bash will process
> the script command line.

My mistake - I'm very aware of the quoting rules, yet in my test script for this
scenario I forgot to quote the arguments. However, if POSIX rules are being
implemented, there is still something I didn't expect. Here's my bash script:

#!/bin/bash
echo "$1"
echo "$2"
echo "$3"

And I invoke it like this from a Windows command prompt:

C:\> bash -x script.sh foo bar\"baz bat
+ echo foo
foo
+ echo 'bar\baz bat'
bar\baz bat
+ echo ''

Not expected. Called from within Cygwin, the behavior is correct:

$ bash -x script.sh foo bar\"baz bat
+ echo foo
foo
+ echo 'bar"baz'
bar"baz
+ echo bat
bat

Can you explain this difference? The reason I ask is that if this worked,
the way Go constructs the command line string would be just fine.

Thanks,
Stephen



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Eric Blake (cygwin)-2
On 9/5/19 1:31 PM, Stephen Provine via cygwin wrote:

> My mistake - I'm very aware of the quoting rules, yet in my test script for this
> scenario I forgot to quote the arguments. However, if POSIX rules are being
> implemented, there is still something I didn't expect. Here's my bash script:
>
> #!/bin/bash
> echo "$1"
> echo "$2"
> echo "$3"
>
> And I invoke it like this from a Windows command prompt:
>
> C:\> bash -x script.sh foo bar\"baz bat
> + echo foo
> foo
> + echo 'bar\baz bat'
> bar\baz bat
> + echo ''
>
> Not expected.
Why not? That obeyed cmd's odd rules: The moment you have a " in the
command line, that argument continues until end of line or the next "
(regardless of how many \ precede the ").

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

Perhaps you meant to try:

c:\> bash -x script.sh foo ^"bar\^"baz^" bat

> Called from within Cygwin, the behavior is correct:
>
> $ bash -x script.sh foo bar\"baz bat
> + echo foo
> foo
> + echo 'bar"baz'
> bar"baz
> + echo bat
> bat

Moral of the story: POSIX rules are saner than cmd rules.

>
> Can you explain this difference? The reason I ask is that if this worked,
> the way Go constructs the command line string would be just fine.

If Go is not constructing the command line string in a manner that
matches that blog post, the bug would be in Go.  Presumably, Cygwin is
correctly quoting things any time it calls into a non-Cygwin process
(but if not, give us a test case for us to patch cygwin, or even better
submit the patch).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 9/5/19 2:05 PM, Eric Blake wrote:
> On 9/5/19 1:31 PM, Stephen Provine via cygwin wrote:
> > Not expected.

> Why not? That obeyed cmd's odd rules: The moment you have a " in the
> command line, that argument continues until end of line or the next "
> (regardless of how many \ precede the ").

Now I'm really confused. Brian seemed to indicate that the POSIX rules were
followed, but you're indicating that the Windows command line parsing rules
are followed. So I assume the reality is that it is actually some mix of the two.
Is the effective parsing logic implemented by Cygwin documented anywhere?

Thanks,
Stephen



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Eric Blake (cygwin)-2

On 9/5/19 5:01 PM, Stephen Provine via cygwin wrote:

> On 9/5/19 2:05 PM, Eric Blake wrote:
>> On 9/5/19 1:31 PM, Stephen Provine via cygwin wrote:
>>> Not expected.
>
>> Why not? That obeyed cmd's odd rules: The moment you have a " in the
>> command line, that argument continues until end of line or the next "
>> (regardless of how many \ precede the ").
>
> Now I'm really confused. Brian seemed to indicate that the POSIX rules were
> followed, but you're indicating that the Windows command line parsing rules
> are followed. So I assume the reality is that it is actually some mix of the two.
> Is the effective parsing logic implemented by Cygwin documented anywhere?
If you start a Cygwin process from another cygwin process, then only
POSIX rules are in effect.  The bash shell parses its command line
according to POSIX rules, creates an argv[] to pass to exec(), then
cygwin1.dll manages to get that argv[], unscathed, to the new child
process (bypassing Windows mechanisms), which uses the argv[] as-is.

If you start a Windows process from a cygwin process, then cygwin1.dll
must quote the arguments into a single concatenated string that will be
reparsed in the manner that the Windows runtime expects, because the
Windows process only gets a single string, not an argv[].  But cygwin
should be providing the correct escaping so that windows then parses it
back into the same intended argv[] (if not, that's a bug in cygwin1.dll).

If you start a cygwin process from Windows, then cygwin1.dll is given
only a single string, which it must parse into argv according to windows
conventions (if it does not produce the same argv[] as a windows process
using CommandLineToArgvW, then that's a bug in cygwin1.dll).  But on top
of that, if you are using cmd.exe to generate your command line, then
you must use proper escaping, otherwise, cmd.exe can produce a command
line that has unexpected quoting in the string handed to
CommandLineToArgvW, and the Windows parsing when there are unbalanced
quotes can be screwy (if it encounters a " inside an argument that was
not quoted with ", then that groups remaining text into the same
argument until a balanced " or end of string is encountered).  So it is
not always obvious at first glance if what you type in cmd.exe provides
the argv[] that you intended, because of the two layers of
interpretation (one from cmd to Windows, and one from Windows convention
into argv[]).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 9/5/19 5:46 PM, Eric Blake wrote:
> If you start a cygwin process from Windows, then cygwin1.dll is given
> only a single string, which it must parse into argv according to windows
> conventions (if it does not produce the same argv[] as a windows process
> using CommandLineToArgvW, then that's a bug in cygwin1.dll).  But on top
> of that, if you are using cmd.exe to generate your command line, then
> you must use proper escaping, otherwise, cmd.exe can produce a command
> line that has unexpected quoting in the string handed to
> CommandLineToArgvW, and the Windows parsing when there are unbalanced
> quotes can be screwy

Great explanation, it's very helpful.

I've been using cmd.exe to generate the command line for my tests, but the
original problem was when my compiled Go binary directly executes another
Windows process using the Win32 APIs like CreateProcess directly. Here's a
simple Go program that reproduces the issue:

package main

import (
        "log"
        "os"
        "os/exec"
)

func main() {
        cmd := exec.Command("C:\\cygwin64\\bin\\bash.exe", "test.sh", "foo", "bar\"baz", "bat")
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Run(); err != nil {
                log.Fatal(err)
        }
}

The output of this process is:

foo
bar\baz bat


To prove it is not going through cmd.exe, I debugged the Go program
to the point that it calls the Win32 CreateProcess function, and the
first two arguments are:

lpApplicationName: "C:\\cygwin64\\bin\\bash.exe"
lpCommandLine: "C:\\cygwin64\\bin\\bash.exe test.sh foo bar\\\"baz bat"

So unless I'm missing something, bash.exe is not interpreting the command line
following the rules pointed to by the documentation for CommandLineToArgvW.

Thanks,
Stephen


Re: RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Steven Penny
On Thu, 5 Sep 2019 23:45:44, "Stephen Provine via cygwin" wrote:

> package main
>
> import (
>         "log"
>         "os"
>         "os/exec"
> )
>
> func main() {
>         cmd := exec.Command("C:\\cygwin64\\bin\\bash.exe", "test.sh", "foo", "bar\"baz", "bat")
>         cmd.Stdout = os.Stdout
>         cmd.Stderr = os.Stderr
>         if err := cmd.Run(); err != nil {
>                 log.Fatal(err)
>         }
> }

Why are you doing this? I hate to be that guy, but examples are important.
Arguably the most important lesson I have learned with computer programming is:
use the right tool for the job.

So when I need to do something, I start with a shell script. Then once a shell
script doesn't cut it anymore, I move to AWK, then Python, then Go. Substitute
your language of choice.

What I don't do is call a shell script from Go or anything else. I might call
"git.exe" or "ffmpeg.exe", but even then you could argue against it as those
binaries have libraries too.

I agree that Cygwin should be parsing to and from cmd.exe correctly. But unless
you have a valid use case, it's kind of like "Cygwin theory". I have found that
historically those types of issues are less likely to be resolved in a timely
manner, if at all.



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Eric Blake (cygwin)-2
In reply to this post by Cygwin list mailing list
On 9/5/19 6:45 PM, Stephen Provine via cygwin wrote:

>
> To prove it is not going through cmd.exe, I debugged the Go program
> to the point that it calls the Win32 CreateProcess function, and the
> first two arguments are:
>
> lpApplicationName: "C:\\cygwin64\\bin\\bash.exe"
> lpCommandLine: "C:\\cygwin64\\bin\\bash.exe test.sh foo bar\\\"baz bat"

And according to
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/
that is NOT the correct command line to be handing to CreateProcess, at
least not if you want things preserved.

If I read that page correctly, the unambiguously correct command line
should be:

"C:\\cygwin64\\bin\\bash.exe test.sh foo \"bar\\\"baz\" bat"

>
> So unless I'm missing something, bash.exe is not interpreting the command line
> following the rules pointed to by the documentation for CommandLineToArgvW.

Rather, Go is not passing the command line to CreateProcess in the way
that is unambiguously parseable in the manner expected by
CommandLineToArgvW.  And because Go is relying on a corner case of
ambiguous parsing instead of well-balanced quoting, it's no surprise if
Cygwin doesn't parse that corner case in the manner expected.  A patch
to teach Cygwin to parse the corner case identically would be welcome,
but fixing recipient processes does not scale as well as fixing the
culprit source process.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Cygwin list mailing list
On 9/5/19 9:26 PM, Eric Blake wrote:
> Rather, Go is not passing the command line to CreateProcess in the way
> that is unambiguously parseable in the manner expected by
> CommandLineToArgvW.

The specific example I gave is unambiguous and is parsed correctly by
CommandLineToArgvW, so if the goal is for Cygwin to effectively
simulate this function, I can confirm that it is missing this case.

It's reasonable that Go's algorithm should be changed to have a better
chance of working with Windows programs that manually implement
command line parsing and may not match expectations for all cases.
I'll follow up with them and for the time being, work around the issue
with my own implementation as I've since figured out how to do that.

FWIW, here's the most definitive reference I've found for how Windows
binaries compiled with the Microsoft C/C++ compilers do command line
parsing, in case there is any desire to address this issue at some point:

http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULES

Thanks for entertaining my persistence on this topic!

Stephen



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Andrey Repin
In reply to this post by Cygwin list mailing list
Greetings, Stephen Provine!

> On 2019-09-04 23:29, Brian Inglis wrote:
>> As standard on Unix systems, just add another level of quoting for each level of
>> interpretation, as bash will process that command line, then bash will process
>> the script command line.

> My mistake - I'm very aware of the quoting rules, yet in my test script for this
> scenario I forgot to quote the arguments. However, if POSIX rules are being
> implemented, there is still something I didn't expect. Here's my bash script:

> #!/bin/bash
> echo "$1"
> echo "$2"
> echo "$3"

> And I invoke it like this from a Windows command prompt:

> C:\> bash -x script.sh foo bar\"baz bat
> + echo foo
> foo
> + echo 'bar\baz bat'
> bar\baz bat
> + echo ''

> Not expected. Called from within Cygwin, the behavior is correct:

Again, fully expected.

> $ bash -x script.sh foo bar\"baz bat
> + echo foo
> foo
> + echo 'bar"baz'
> bar"baz
> + echo bat
> bat

> Can you explain this difference?

CMD escape character is ^, not \

> The reason I ask is that if this worked,
> the way Go constructs the command line string would be just fine.

No.


--
With best regards,
Andrey Repin
Friday, September 6, 2019 23:33:46

Sorry for my terrible english...



Re: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Brian Inglis
In reply to this post by Cygwin list mailing list
On 2019-09-05 16:01, Stephen Provine via cygwin wrote:

> On 9/5/19 2:05 PM, Eric Blake wrote:
>> On 9/5/19 1:31 PM, Stephen Provine via cygwin wrote:
>>> Not expected.
>
>> Why not? That obeyed cmd's odd rules: The moment you have a " in the
>> command line, that argument continues until end of line or the next "
>> (regardless of how many \ precede the ").
>
> Now I'm really confused. Brian seemed to indicate that the POSIX rules were
> followed, but you're indicating that the Windows command line parsing rules
> are followed. So I assume the reality is that it is actually some mix of the two.
> Is the effective parsing logic implemented by Cygwin documented anywhere?

It depends on what you are running through. You have layers: in that test case
you ran from cmd, so cmd parsing has to be taken into account first, before the
resulting command line is passed to bash, where Cygwin will construct a POSIX
argument list from the cmd output and pass that to bash and then script.sh.

Try your testing using my script.sh shown earlier, and call bash with -vx
options for debugging output.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.


RE: Command line processing in dcrt0.cc does not match Microsoft parsing rules

Cygwin list mailing list
In reply to this post by Andrey Repin
On 2019-09-06 13:35, Andrey Repin wrote:
> CMD escape character is ^, not \

You are correct about the cmd.exe interpretation, so my test cases were
buggy, but Go invokes other executables using CreateProcess directly and
is not subject to the additional set of command line processing rules that
are used by cmd.exe.

If you see the last exchange with Eric, I think it is clear that there is a case
missing in the Cygwin processing rules that becomes a problem when a
calling process directly reverses the rules, specifically when an argument
value does not itself need to be quoted but it has a double quote in the
value. This is rule 4 in what I found to be the most definitive reference:

http://daviddeley.com/autohotkey/parameters/parameters.htm#WINCRULESCHANGE

And see the fourth example in section 5.4.

However, the *safest* way to construct a command line is to avoid this
case and make sure to always double quote an argument that contains
double quotes. The official algorithm from a Microsoft source was
previously posted by Eric:

https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

Interestingly, nothing in this article specifically says that it *shouldn't*
be OK to do what the Go algorithm does; it just happens to be simpler if you
don't worry about that case.

FWIW, .NET Core uses this algorithm:

https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/PasteArguments.cs

Which I think is probably pretty good validation that it's the right one to use.

So, the outcome of all of this is that Go should probably update its logic,
as it's based on the wrong official source. I plan to follow up there. If there
is any interest in the future in correcting the parsing behavior in Cygwin, the
information needed to do that is in this thread. Personally, I think that if
Cygwin fixes the problem, it's easier to recompile all those binaries than to
try to locate all potential source calling processes to make sure they follow
the right algorithm (Go isn't right; what about Node, Python, etc.?). But
I'm not going to push on this point, as I can work around it for my case.

Thanks,
Stephen
