bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

Cygwin list mailing list
Hi Cygwin team,
Here is a consolidated bug report based on the discussion in recent days which I'd started under the subject " shell expansion produces e.g. "ls: cannot access '*.pdf': No such file or directory" in Windows CMD shell, but works okay in bash " (thread starter https://cygwin.com/pipermail/cygwin/2020-March/244161.html )
Many thanks to Paul, Andrey, and others for helping me nail down where and how it seems to be happening.
My apologies in advance that my coding days are long behind me, so I'm not in a position to include a proposed code fix.

cygcheck output attached (lightly modified to redact a couple of personal items).

Problem:
Under certain circumstances (see Steps to Reproduce, below) Cygwin programs' built-in argv[] globbing will produce an unexpected error:
"{programName}: cannot access '{glob pattern}': No such file or directory"
e.g.
"ls: cannot access '*.pdf': No such file or directory"
.. even though files matching e.g. *.pdf definitely exist.

Steps to Reproduce:
* Have some files in the current directory with accented characters in the names, e.g.:
C:> mkdir c:\temp\test
C:> cd c:\temp\test
C:> touch héllo.pdf
C:> touch gòodbye.pdf
C:> touch normal.pdf
* DON'T have the LANG= environment variable set to anything
* NOT in bash or Cygwin Terminal, but rather within Windows CMD.exe, execute a Cygwin command which needs to do file name globbing because the Windows CMD.exe shell does not do so for it, e.g.
C:> ls *.pdf
C:> cat *.pdf
These will produce "ls: cannot access '*.pdf': No such file or directory"
Although, curiously,
C:> ls *or*
does correctly produce:
normal.pdf
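
For reference, the expected result of the steps above can be sketched outside Cygwin. This is a minimal Python illustration, not Cygwin code: it creates the same three file names in a temporary directory and expands *.pdf with Python's own glob module. The helper names make_test_dir and list_pdfs are made up for this example.

```python
import glob
import os
import tempfile

# The three file names from the steps to reproduce.
NAMES = ["héllo.pdf", "gòodbye.pdf", "normal.pdf"]


def make_test_dir():
    """Create a temporary directory holding empty files with the test names."""
    directory = tempfile.mkdtemp(prefix="globtest-")
    for name in NAMES:
        open(os.path.join(directory, name), "w").close()
    return directory


def list_pdfs(directory):
    """Return the sorted base names matching *.pdf inside directory."""
    # glob.escape protects any special characters in the directory path itself.
    pattern = os.path.join(glob.escape(directory), "*.pdf")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    print(list_pdfs(make_test_dir()))
```

All three names should be listed, which is what "ls *.pdf" is expected to print as well.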

Also, display output of the áccènted characters is incomplete:
C:> ls
'g'$'\303\262''odbye.pdf'  'h'$'\303\251''llo.pdf'   normal.pdf
C:> bash
jay_l@DESKTOP-I9MRIE3 /cygdrive/c/Temp
$ ls
'g'$'\303\262''odbye.pdf'  'h'$'\303\251''llo.pdf'   normal.pdf
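
The $'\303\262' and $'\303\251' sequences in the listing above are octal escapes for raw bytes. Decoding those bytes as UTF-8 gives back the accented letters, so the file names themselves appear to be stored correctly and only their display differs. A small Python check (decode_octal_bytes is a hypothetical helper name):

```python
def decode_octal_bytes(values):
    """Interpret a list of byte values as one UTF-8 encoded string."""
    return bytes(values).decode("utf-8")


# \303\262 from 'g'$'\303\262''odbye.pdf'
o_grave = decode_octal_bytes([0o303, 0o262])
# \303\251 from 'h'$'\303\251''llo.pdf'
e_acute = decode_octal_bytes([0o303, 0o251])

print("g" + o_grave + "odbye.pdf")
print("h" + e_acute + "llo.pdf")
```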


Analysis:
I've verified that it's not about case sensitivity. That is, it's not a matter of ls *.pdf vs. ls *.PDF.
If these test commands are run either under bash.exe or within a Cygwin Terminal window, the problem does not occur.
I've verified that the Windows system locale (per Windows' Region setting) actually doesn't matter. (I've reproduced this both on systems in Region Spain with language English-International and English-Ireland, and in a VM with a bog standard vanilla US English Windows).

Credits to Paul for suggesting deleting files one by one until the problem goes away, and to Andrey for pointing out `locale` and the LANG= setting.

Set LANG=en_US.UTF-8, e.g.
C:> set LANG=en_US.UTF-8
.. and the problem goes away.
C:> ls *.pdf
gòodbye.pdf
héllo.pdf
normal.pdf
C:> ls
gòodbye.pdf
héllo.pdf
normal.pdf

Interestingly, Andrey mentioned that he sets LANG=ru_RU.CP866 and he doesn't see the problem. When I tried that exact setting, I still had the problem.
So it's maybe not just that LANG must be set to *something*, but that somehow LANG must be set to something that matches something in Windows? (Sorry, I know that's nearly uselessly vague).
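
One possible reading of the ru_RU.CP866 observation, which nothing in this thread confirms, is that what matters is whether the character set named in LANG can represent every file name in the directory. The Python sketch below only tests encodability with Python's codecs. It says nothing about how Cygwin converts file names internally, and can_represent is a hypothetical helper.

```python
def can_represent(name, charset):
    """Return True if every character of name can be encoded in charset."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    names = ("héllo.pdf", "gòodbye.pdf", "normal.pdf")
    for charset in ("ascii", "cp866", "utf-8"):
        # One True/False per file name, in the order given above.
        print(charset, [can_represent(name, charset) for name in names])
```

Under that assumption, the accented names fit in UTF-8 but not in ASCII or CP866, while normal.pdf fits in all three.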


In summary, it appears that the argv[] globbing code which gets compiled into Cygwin programs functions a bit differently from the shell globbing code within bash.exe,
and this produces unexpected globbing failures.


Thanks to all the Cygwin maintainers for this amazing software, for so many years!
-Jay



--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

Attachment: cygcheck.out (80K)

Re: bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

Thomas Wolff
On 24.03.2020 at 08:18, Jay Libove via Cygwin wrote:

> In summary, it appears that the way that the argv[] globbing code which gets compiled in to Cygwin programs functions a bit differently than the way the shell globbing code works within bash.exe.
> And this produces unexpected globbing failures.
(As commented in the other thread already:)
Maybe it can simply be fixed by changing the order of setting up locale
stuff and applying the expansion in cygwin?
(I would look into the code if I had a clue where to find the respective
things.)
Thomas



Re: bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

Mark Geisert

> Maybe it can simply be fixed by changing the order of setting up locale stuff
> and applying the expansion in cygwin?
> (I would look into the code if I had a clue where to find the respective things.)

I would guess dcrt0.cc, the Cygwin DLL runtime initialization.

..mark

Re: bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

L A Walsh
In reply to this post by Cygwin list mailing list
On 2020/03/24 00:18, Jay Libove via Cygwin wrote:
> Problem:
> Under certain circumstances (see Steps to Reproduce, below) Cygwin programs' built-in argv[] globbing will produce unexpected:
> "{programName}: cannot access '{glob pattern}: No such file or directory"
> e.g.
> "ls: cannot access '*.pdf': No such file or directory"
> .. despite the fact that e.g. *.pdf definitely exists.
>  
----
    This isn't a bug or a problem, it is working normally as expected.
Cygwin programs don't have built-in argv[] globbing or processing.

    The problem you are seeing is because you are calling cygwin programs
from a windows shell.

    On windows, every program has to be built with glob processing.

    On unix, glob processing happens in the shell, so all unix (linux+cygwin)
type programs have no glob processing because they know that globbing is built
into the shell (like bash or csh, or dash, etc).

If you run 'ls' *.pdf in bash, bash expands the *.pdf into arguments
that don't contain a glob (if the glob matches a file).  So 'ls' sees
only fixed filenames and no globs.

When you run 'ls' from the Windows shell, Windows cmd.exe doesn't expand
glob chars into anything.  So 'ls' sees a literal file name of '*.pdf'.

On linux you can name a file '*.pdf' (using an asterisk as a valid character).
Unless you have a file named, literally '*.pdf', ls won't see it.
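
The shell-side expansion described above can be sketched as follows. This is a simplified Python model of default POSIX shell behaviour, not bash or Cygwin source: it only handles patterns within a single directory, sorts by code point rather than by locale, and expand_like_shell is a name invented for the example. A pattern that matches nothing is passed through unchanged, which is how a literal '*.pdf' can end up as an argument to ls.

```python
import fnmatch
import os
import tempfile


def expand_like_shell(args, cwd="."):
    """Expand each argument against the names in cwd, shell style.

    Arguments that match no file are kept literally, as a POSIX shell
    does by default.
    """
    names = sorted(os.listdir(cwd))
    expanded = []
    for arg in args:
        matches = [
            name
            for name in names
            if fnmatch.fnmatchcase(name, arg)
            # Like a shell, do not let wildcards match hidden files
            # unless the pattern itself starts with a dot.
            and (not name.startswith(".") or arg.startswith("."))
        ]
        expanded.extend(matches if matches else [arg])
    return expanded


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp(prefix="expand-")
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    print(expand_like_shell(["ls", "*.pdf"], demo_dir))  # pattern replaced by matches
    print(expand_like_shell(["ls", "*.doc"], demo_dir))  # no match, pattern stays literal
```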

Cygwin does simulate this: example:
>  cd /tmp
/tmp> touch \*.pdf
/tmp> ls *.pdf
*.pdf
/tmp> cmd
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\tmp>ls *.pdf
ls *.pdf
'*.pdf'

^^ note that now Windows finds *.pdf because there is a file named '*.pdf'
(quotes added by 'ls').

Does this explain your issue, or am I not understanding it?

Thanks (I'm not a cygwin author; just answering the question)
Linda


Re: bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

Andrey Repin
Greetings, L A Walsh!

> On 2020/03/24 00:18, Jay Libove via Cygwin wrote:
>> Problem:
>> Under certain circumstances (see Steps to Reproduce, below) Cygwin programs' built-in argv[] globbing will produce unexpected:
>> "{programName}: cannot access '{glob pattern}: No such file or directory"
>> e.g.
>> "ls: cannot access '*.pdf': No such file or directory"
>> .. despite the fact that e.g. *.pdf definitely exists.
>>  
> ----
>     This isn't a bug or a problem, it is working normally as expected.
> Cygwin programs don't have built-in argv[] globbing or processing.

>     The problem you are seeing is because you are calling cygwin programs
> from a windows shell.

>     On windows, every program has to be built with glob processing.

>     On unix, glob processing happens in the shell, so all unix
> (linux+cygwin)
> type programs have no glob processing because they know that globbing is
> built
> into the shell (like bash or csh, or dash, etc).

> If you run 'ls' *.pdf in bash, bash expands the *.pdf into arguments
> that don't contain a glob (if the glob matches a file).  So 'ls' sees
> only fixed filenames and no globs.

> When you run 'ls from the Windows shell, Windows cmd.exe doesn't expand
> glob chars into anything.  so 'ls' sees a literal file name of '*.pdf'.

> On linux you can name a file '*.pdf' (using an asterisk as a valid
> character).
> Unless you have a file named, literally '*.pdf', ls won't see it.

That's not what actually happens.

...\Documents> ls -1 *.pdf
21927-ticket.pdf
'Stars! Universe Map.pdf'


--
With best regards,
Andrey Repin
Thursday, April 2, 2020 15:51:26

Sorry for my terrible english...

Re: bug report: shell expansion in argv[] processing sensitive to LANG, e.g. "ls: cannot access '*.pdf': No such file or directory", but works okay in bash

L A Walsh


On 2020/04/02 06:43, Andrey Repin wrote:
> That's not what actually happens.
>
> ...\Documents> ls -1 *.pdf
> 21927-ticket.pdf
> 'Stars! Universe Map.pdf'
---
Thank you for your update.