[PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
- In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
  for the case that the multibyte char is split in the middle.
  The reasons are as follows.
  * ISO-2022 is too complicated to handle correctly.
  * Not sure what to do with ISCII.
---
 winsup/cygwin/fhandler_tty.cc | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
index 37d033bbe..ee5c6a90a 100644
--- a/winsup/cygwin/fhandler_tty.cc
+++ b/winsup/cygwin/fhandler_tty.cc
@@ -117,6 +117,9 @@ CreateProcessW_Hooked
   return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
 }
 
+#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
+#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
+
 static void
 convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
  UINT cp_from, const char *ptr_from, size_t len_from,
@@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
   tmp_pathbuf tp;
   wchar_t *wbuf = tp.w_get ();
   int wlen = 0;
-  if (cp_from == CP_UTF7)
-    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
+    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
+       - ISO-2022 is too complicated to handle correctly.
+       - FIXME: Not sure what to do for ISCII.
        Therefore, just convert string without checking */
     wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
  wbuf, NT_MAX_PATH);
--
2.28.0
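
For readers following the thread, the idea behind the patch can be sketched in portable C++. This is only an illustration, not the Cygwin implementation: the real convert_mb_str() calls MultiByteToWideChar(), whereas this sketch models just UTF-8 (code page 65001) by hand. The names skips_boundary_check, utf8_complete_prefix and convertible_prefix are hypothetical; only the code page ranges and the UTF-7 exception are taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Code page ranges as given in the patch.
constexpr bool is_iso_2022(unsigned cp) { return cp >= 50220 && cp <= 50229; }
constexpr bool is_iscii(unsigned cp) { return cp >= 57002 && cp <= 57011; }

constexpr unsigned kCpUtf7 = 65000; // numeric value of CP_UTF7
constexpr unsigned kCpUtf8 = 65001; // numeric value of CP_UTF8

// Hypothetical helper: true for the code pages the patch converts
// without looking for a character split at the end of the buffer.
constexpr bool skips_boundary_check(unsigned cp)
{
  return cp == kCpUtf7 || is_iso_2022(cp) || is_iscii(cp);
}

// Hypothetical helper: length of the longest prefix of buf that does not
// end in the middle of a UTF-8 sequence.
std::size_t utf8_complete_prefix(const unsigned char *buf, std::size_t len)
{
  // A UTF-8 sequence is at most 4 bytes, so an incomplete tail is at
  // most 3 bytes long.
  for (std::size_t back = 1; back <= 3 && back <= len; ++back)
    {
      unsigned char c = buf[len - back];
      if ((c & 0xC0) == 0x80)
        continue; // continuation byte, keep scanning backwards
      std::size_t need = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : (c >= 0xC0) ? 2 : 1;
      return need > back ? len - back : len;
    }
  return len;
}

// Hypothetical helper: how many bytes can be converted right now.
std::size_t convertible_prefix(unsigned cp, const unsigned char *buf,
                               std::size_t len)
{
  assert(buf != nullptr || len == 0);
  if (skips_boundary_check(cp))
    return len; // convert everything, no split detection
  if (cp == kCpUtf8)
    return utf8_complete_prefix(buf, len);
  return len; // other code pages are not modelled in this sketch
}
```

A caller would convert only the returned prefix and keep the remaining bytes until the next chunk arrives. For the stateful code pages no such prefix can be found by inspecting the last few bytes alone, which is presumably why the patch converts them unchecked.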


Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Corinna Vinschen-2
On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:

> - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
>   for the case that the multibyte char is splitted in the middle.
>   The reason is as follows.
>   * ISO-2022 is too complicated to handle correctly.
>   * Not sure what to do with ISCII.
> ---
>  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> index 37d033bbe..ee5c6a90a 100644
> --- a/winsup/cygwin/fhandler_tty.cc
> +++ b/winsup/cygwin/fhandler_tty.cc
> @@ -117,6 +117,9 @@ CreateProcessW_Hooked
>    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>  }
>  
> +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> +
>  static void
>  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>   UINT cp_from, const char *ptr_from, size_t len_from,
> @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>    tmp_pathbuf tp;
>    wchar_t *wbuf = tp.w_get ();
>    int wlen = 0;
> -  if (cp_from == CP_UTF7)
> -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> +       - ISO-2022 is too complicated to handle correctly.
> +       - FIXME: Not sure what to do for ISCII.
>         Therefore, just convert string without checking */
>      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>   wbuf, NT_MAX_PATH);
> --
> 2.28.0

I'd prefer to not handle them at all.  We just don't support these
charsets, same as JIS, EBCDIC, you name it, which are not ASCII
compatible.  Let's please just drop any handling for these weird
or outdated codepages.


Thanks,
Corinna

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
Hi Corinna,

On Fri, 11 Sep 2020 14:08:40 +0200
Corinna Vinschen wrote:

> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> >   for the case that the multibyte char is splitted in the middle.
> >   The reason is as follows.
> >   * ISO-2022 is too complicated to handle correctly.
> >   * Not sure what to do with ISCII.
> > ---
> >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > index 37d033bbe..ee5c6a90a 100644
> > --- a/winsup/cygwin/fhandler_tty.cc
> > +++ b/winsup/cygwin/fhandler_tty.cc
> > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> >  }
> >  
> > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > +
> >  static void
> >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> >   UINT cp_from, const char *ptr_from, size_t len_from,
> > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> >    tmp_pathbuf tp;
> >    wchar_t *wbuf = tp.w_get ();
> >    int wlen = 0;
> > -  if (cp_from == CP_UTF7)
> > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > +       - ISO-2022 is too complicated to handle correctly.
> > +       - FIXME: Not sure what to do for ISCII.
> >         Therefore, just convert string without checking */
> >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> >   wbuf, NT_MAX_PATH);
> > --
> > 2.28.0
>
> I'd prefer to not handle them at all.  We just don't support these
> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> compatible.  Let's please just drop any handling for these weird
> or outdated codepages.

What do you mean by "just drop any handling"?

Do you mean removing the following if block?
> > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > +       - ISO-2022 is too complicated to handle correctly.
> > +       - FIXME: Not sure what to do for ISCII.
> >         Therefore, just convert string without checking */
> >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> >   wbuf, NT_MAX_PATH);
In this case, the conversion for ISO-2022, ISCII and UTF-7 will
not be done correctly.

Or skip charset conversion if the codepage is EBCDIC, ISO-2022
or ISCII? What should we do for UTF-7?

What should happen if a user or an app changes the codepage to one of them?

--
Takashi Yano <[hidden email]>

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Corinna Vinschen-2
On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:

> Hi Corinna,
>
> On Fri, 11 Sep 2020 14:08:40 +0200
> Corinna Vinschen wrote:
> > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > >   for the case that the multibyte char is splitted in the middle.
> > >   The reason is as follows.
> > >   * ISO-2022 is too complicated to handle correctly.
> > >   * Not sure what to do with ISCII.
> > > ---
> > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > index 37d033bbe..ee5c6a90a 100644
> > > --- a/winsup/cygwin/fhandler_tty.cc
> > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > >  }
> > >  
> > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > +
> > >  static void
> > >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > >   UINT cp_from, const char *ptr_from, size_t len_from,
> > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > >    tmp_pathbuf tp;
> > >    wchar_t *wbuf = tp.w_get ();
> > >    int wlen = 0;
> > > -  if (cp_from == CP_UTF7)
> > > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > +       - ISO-2022 is too complicated to handle correctly.
> > > +       - FIXME: Not sure what to do for ISCII.
> > >         Therefore, just convert string without checking */
> > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > >   wbuf, NT_MAX_PATH);
> > > --
> > > 2.28.0
> >
> > I'd prefer to not handle them at all.  We just don't support these
> > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > compatible.  Let's please just drop any handling for these weird
> > or outdated codepages.
>
> What do you mean by "just drop any handling"?
>
> Do you mean remove following if block?
> > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > +       - ISO-2022 is too complicated to handle correctly.
> > > +       - FIXME: Not sure what to do for ISCII.
> > >         Therefore, just convert string without checking */
> > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > >   wbuf, NT_MAX_PATH);
> In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> not be done correctly.
>
> Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> or ISCII? What should we do for UTF-7?

Nothing, just like for any other of these weird charsets.  Cygwin never
supported any charset which wasn't at least ASCII compatible in the
0 <= x <= 127 range.  Just ignore them and the possibility that a
user chooses them for fun.

> What should happen if user or apps chage codepage to one of them?

Garbage output, I guess.  We shouldn't really care.


Corinna
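
To illustrate why a stateful encoding such as ISO-2022 breaks the assumption that bytes in the 0 <= x <= 127 range are ASCII, here is a minimal C++ sketch for ISO-2022-JP. It is an assumption-laden illustration, not code from Cygwin: the type Iso2022JpMode and the function iso2022jp_mode_after are hypothetical, and only the four basic escape sequences (ESC ( B, ESC ( J, ESC $ @, ESC $ B) are recognized.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the shift state of an ISO-2022-JP stream.
enum class Iso2022JpMode { SingleByte, DoubleByte };

// Hypothetical helper: returns the shift state after consuming buf,
// starting from the given state.
Iso2022JpMode iso2022jp_mode_after(const unsigned char *buf, std::size_t len,
                                   Iso2022JpMode mode = Iso2022JpMode::SingleByte)
{
  assert(buf != nullptr || len == 0);
  for (std::size_t i = 0; i + 2 < len; ++i)
    {
      if (buf[i] != 0x1B) // not ESC
        continue;
      if (buf[i + 1] == '(' && (buf[i + 2] == 'B' || buf[i + 2] == 'J'))
        {
          mode = Iso2022JpMode::SingleByte; // ASCII or JIS X 0201 Roman
          i += 2;
        }
      else if (buf[i + 1] == '$' && (buf[i + 2] == '@' || buf[i + 2] == 'B'))
        {
          mode = Iso2022JpMode::DoubleByte; // JIS X 0208
          i += 2;
        }
    }
  return mode;
}
```

In DoubleByte mode every character is a pair of bytes in the 0x21..0x7E range, so a byte that looks like an ASCII letter may be half of a kanji. Whether a given byte is ASCII therefore depends on escape sequences that may have arrived in an earlier buffer.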

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Thomas Wolff
Am 11.09.2020 um 16:06 schrieb Corinna Vinschen:

> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
>> Hi Corinna,
>>
>> On Fri, 11 Sep 2020 14:08:40 +0200
>> Corinna Vinschen wrote:
>>> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
>>>> - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
>>>>    for the case that the multibyte char is splitted in the middle.
>>>>    The reason is as follows.
>>>>    * ISO-2022 is too complicated to handle correctly.
>>>>    * Not sure what to do with ISCII.
>>>> ---
>>>>   winsup/cygwin/fhandler_tty.cc | 9 +++++++--
>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
>>>> index 37d033bbe..ee5c6a90a 100644
>>>> --- a/winsup/cygwin/fhandler_tty.cc
>>>> +++ b/winsup/cygwin/fhandler_tty.cc
>>>> @@ -117,6 +117,9 @@ CreateProcessW_Hooked
>>>>     return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>>>>   }
>>>>  
>>>> +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
>>>> +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
>>>> +
>>>>   static void
>>>>   convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>   UINT cp_from, const char *ptr_from, size_t len_from,
>>>> @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>     tmp_pathbuf tp;
>>>>     wchar_t *wbuf = tp.w_get ();
>>>>     int wlen = 0;
>>>> -  if (cp_from == CP_UTF7)
>>>> -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>          Therefore, just convert string without checking */
>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>   wbuf, NT_MAX_PATH);
>>>> --
>>>> 2.28.0
>>> I'd prefer to not handle them at all.  We just don't support these
>>> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
>>> compatible.  Let's please just drop any handling for these weird
>>> or outdated codepages.
>> What do you mean by "just drop any handling"?
>>
>> Do you mean remove following if block?
>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>          Therefore, just convert string without checking */
>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>   wbuf, NT_MAX_PATH);
>> In this case, the conversion for ISO-2022, ISCII and UTF-7 will
>> not be done correctly.
>>
>> Or skip charset conversion if the codepage is EBCDIC, ISO-2022
>> or ISCII? What should we do for UTF-7?
> Nothing, just like for any other of these weird charsets.  Cygwin never
> supported any charset which wasn't at least ASCII compatible in the
> 0 <= x <= 127 range.
Actually, in Shift-JIS (CP932, supported via locale ja_JP.sjis), 0x5C is
¥ :/
>    Just ignore them and the possibility that a
> user chooses them for fun.
>
>> What should happen if user or apps chage codepage to one of them?
> Garbage output, I guess.  We shouldn't really care.
>
>
> Corinna


Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Thomas Wolff
Am 11.09.2020 um 17:10 schrieb Thomas Wolff:

> Am 11.09.2020 um 16:06 schrieb Corinna Vinschen:
>> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
>>> Hi Corinna,
>>>
>>> On Fri, 11 Sep 2020 14:08:40 +0200
>>> Corinna Vinschen wrote:
>>>> On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
>>>>> - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
>>>>>    for the case that the multibyte char is splitted in the middle.
>>>>>    The reason is as follows.
>>>>>    * ISO-2022 is too complicated to handle correctly.
>>>>>    * Not sure what to do with ISCII.
>>>>> ---
>>>>>   winsup/cygwin/fhandler_tty.cc | 9 +++++++--
>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
>>>>> index 37d033bbe..ee5c6a90a 100644
>>>>> --- a/winsup/cygwin/fhandler_tty.cc
>>>>> +++ b/winsup/cygwin/fhandler_tty.cc
>>>>> @@ -117,6 +117,9 @@ CreateProcessW_Hooked
>>>>>     return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
>>>>>   }
>>>>>
>>>>> +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
>>>>> +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
>>>>> +
>>>>>   static void
>>>>>   convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>>           UINT cp_from, const char *ptr_from, size_t len_from,
>>>>> @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
>>>>>     tmp_pathbuf tp;
>>>>>     wchar_t *wbuf = tp.w_get ();
>>>>>     int wlen = 0;
>>>>> -  if (cp_from == CP_UTF7)
>>>>> -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>>          Therefore, just convert string without checking */
>>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>>                   wbuf, NT_MAX_PATH);
>>>>> --
>>>>> 2.28.0
>>>> I'd prefer to not handle them at all.  We just don't support these
>>>> charsets, same as JIS, EBCDIC, you name it, which are not ASCII
>>>> compatible.  Let's please just drop any handling for these weird
>>>> or outdated codepages.
>>> What do you mean by "just drop any handling"?
>>>
>>> Do you mean remove following if block?
>>>>> +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
>>>>> +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
>>>>> +       - ISO-2022 is too complicated to handle correctly.
>>>>> +       - FIXME: Not sure what to do for ISCII.
>>>>>          Therefore, just convert string without checking */
>>>>>       wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
>>>>>                   wbuf, NT_MAX_PATH);
>>> In this case, the conversion for ISO-2022, ISCII and UTF-7 will
>>> not be done correctly.
>>>
>>> Or skip charset conversion if the codepage is EBCDIC, ISO-2022
>>> or ISCII? What should we do for UTF-7?
>> Nothing, just like for any other of these weird charsets. Cygwin never
>> supported any charset which wasn't at least ASCII compatible in the
>> 0 <= x <= 127 range.
> Actually, in Shift-JIS (CP932, supported via locale ja_JP.sjis), 0x5C
> is ¥ :/
... or maybe not, as explained in
https://en.wikipedia.org/wiki/Code_page_932_(Microsoft_Windows)#Single-byte_character_differences.
Terrible.
>>    Just ignore them and the possibility that a
>> user chooses them for fun.
>>
>>> What should happen if user or apps chage codepage to one of them?
>> Garbage output, I guess.  We shouldn't really care.
>>
>>
>> Corinna
>
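
Separately from the question of how a single-byte 0x5C is displayed, the byte 0x5C can also appear as the second byte of a double-byte CP932 character, which is one reason a split at a buffer boundary matters for this code page. The following C++ sketch shows the distinction. It is only an illustration with hypothetical names (is_cp932_lead, count_single_byte_5c) and is not taken from Cygwin; the lead-byte ranges 0x81..0x9F and 0xE0..0xFC are the usual ones for CP932.

```cpp
#include <cassert>
#include <cstddef>

// Lead bytes of double-byte characters in CP932 (Shift-JIS).
constexpr bool is_cp932_lead(unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Hypothetical helper: counts the 0x5C bytes that are single-byte
// characters, ignoring those that are the trail byte of a double-byte
// character.
std::size_t count_single_byte_5c(const unsigned char *buf, std::size_t len)
{
  assert(buf != nullptr || len == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; ++i)
    {
      if (is_cp932_lead(buf[i]))
        {
          ++i; // skip the trail byte, if present
          continue;
        }
      if (buf[i] == 0x5C)
        ++count;
    }
  return count;
}
```

For example, the bytes 'C', ':', 0x5C, 0x83, 0x5C contain one single-byte 0x5C followed by the double-byte character 0x83 0x5C (katakana "so"). If the buffer were cut between 0x83 and 0x5C, a converter looking only at the second chunk would misread the trail byte as a stand-alone 0x5C.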


Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
In reply to this post by Corinna Vinschen-2
On Fri, 11 Sep 2020 16:06:01 +0200
Corinna Vinschen wrote:

> On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > Hi Corinna,
> >
> > On Fri, 11 Sep 2020 14:08:40 +0200
> > Corinna Vinschen wrote:
> > > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > > >   for the case that the multibyte char is splitted in the middle.
> > > >   The reason is as follows.
> > > >   * ISO-2022 is too complicated to handle correctly.
> > > >   * Not sure what to do with ISCII.
> > > > ---
> > > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > > index 37d033bbe..ee5c6a90a 100644
> > > > --- a/winsup/cygwin/fhandler_tty.cc
> > > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > > >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > > >  }
> > > >  
> > > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > > +
> > > >  static void
> > > >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > >   UINT cp_from, const char *ptr_from, size_t len_from,
> > > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > >    tmp_pathbuf tp;
> > > >    wchar_t *wbuf = tp.w_get ();
> > > >    int wlen = 0;
> > > > -  if (cp_from == CP_UTF7)
> > > > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > +       - FIXME: Not sure what to do for ISCII.
> > > >         Therefore, just convert string without checking */
> > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > >   wbuf, NT_MAX_PATH);
> > > > --
> > > > 2.28.0
> > >
> > > I'd prefer to not handle them at all.  We just don't support these
> > > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > > compatible.  Let's please just drop any handling for these weird
> > > or outdated codepages.
> >
> > What do you mean by "just drop any handling"?
> >
> > Do you mean remove following if block?
> > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > +       - FIXME: Not sure what to do for ISCII.
> > > >         Therefore, just convert string without checking */
> > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > >   wbuf, NT_MAX_PATH);
> > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > not be done correctly.
> >
> > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > or ISCII? What should we do for UTF-7?
>
> Nothing, just like for any other of these weird charsets.  Cygwin never
> supported any charset which wasn't at least ASCII compatible in the
> 0 <= x <= 127 range.  Just ignore them and the possibility that a
> user chooses them for fun.
>
> > What should happen if user or apps chage codepage to one of them?
>
> Garbage output, I guess.  We shouldn't really care.
Do you mean the attached patch?

Please try:
(1) Open mintty with "env CYGWIN=disable_pcon mintty".
(2) Start cmd.exe in that mintty.
(3) Try chcp such as
    37 (EBCDIC),
    65000 (UTF-7),
    50220 (ISO-2022),
    and 57002 (ISCII).
(4) Execute dir or some other commands in cmd.exe.

For 65000, 50220 and 57002, even the prompt will be broken.
Are the results as you expected?

If pseudo console is enabled, all of the above work without
problems. With the previous patch, the results were sane even
if pseudo console is disabled.

--
Takashi Yano <[hidden email]>

0001-Cygwin-pty-Drop-handling-for-UTF-7-in-convert_mb_str.patch (5K) Download Attachment

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
On Sat, 12 Sep 2020 01:05:04 +0900
Takashi Yano via Cygwin-patches <[hidden email]> wrote:

> On Fri, 11 Sep 2020 16:06:01 +0200
> Corinna Vinschen wrote:
> > On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > > Hi Corinna,
> > >
> > > On Fri, 11 Sep 2020 14:08:40 +0200
> > > Corinna Vinschen wrote:
> > > > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > > > >   for the case that the multibyte char is splitted in the middle.
> > > > >   The reason is as follows.
> > > > >   * ISO-2022 is too complicated to handle correctly.
> > > > >   * Not sure what to do with ISCII.
> > > > > ---
> > > > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > > > index 37d033bbe..ee5c6a90a 100644
> > > > > --- a/winsup/cygwin/fhandler_tty.cc
> > > > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > > > >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > > > >  }
> > > > >  
> > > > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > > > +
> > > > >  static void
> > > > >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > >   UINT cp_from, const char *ptr_from, size_t len_from,
> > > > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > >    tmp_pathbuf tp;
> > > > >    wchar_t *wbuf = tp.w_get ();
> > > > >    int wlen = 0;
> > > > > -  if (cp_from == CP_UTF7)
> > > > > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > > +       - FIXME: Not sure what to do for ISCII.
> > > > >         Therefore, just convert string without checking */
> > > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > >   wbuf, NT_MAX_PATH);
> > > > > --
> > > > > 2.28.0
> > > >
> > > > I'd prefer to not handle them at all.  We just don't support these
> > > > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > > > compatible.  Let's please just drop any handling for these weird
> > > > or outdated codepages.
> > >
> > > What do you mean by "just drop any handling"?
> > >
> > > Do you mean remove following if block?
> > > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > > +       - FIXME: Not sure what to do for ISCII.
> > > > >         Therefore, just convert string without checking */
> > > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > >   wbuf, NT_MAX_PATH);
> > > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > > not be done correctly.
> > >
> > > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > > or ISCII? What should we do for UTF-7?
> >
> > Nothing, just like for any other of these weird charsets.  Cygwin never
> > supported any charset which wasn't at least ASCII compatible in the
> > 0 <= x <= 127 range.  Just ignore them and the possibility that a
> > user chooses them for fun.
> >
> > > What should happen if user or apps chage codepage to one of them?
> >
> > Garbage output, I guess.  We shouldn't really care.
>
> Do you mean a patch attached?
>
> Please try:
> (1) Open mintty with "env CYGWIN=disable_pcon mintty".
> (2) Start cmd.exe in that mintty.
> (3) Try chcp such as
>     37 (EBCDIC),
>     65000 (UTF-7),
>     50220 (ISO-2022),
>     and 57002 (ISCII).
> (4) Execute dir or some other commands in cmd.exe.
>
> For 65000, 50220 adn 57002, even the prompt will be broken.
> Are the results as you expected?
>
> If pseudo console is enabled, all the above are work without
> problem. With the previous patch, the results was sane even
> if pseudo console is disabled.
How about the attached patch?
I think this is safer than the previous patch.

--
Takashi Yano <[hidden email]>

0001-Cygwin-pty-Skip-multibyte-char-boundary-check-condit.patch (2K) Download Attachment

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Corinna Vinschen-2
In reply to this post by cygwin-patches mailing list
On Sep 12 01:05, Takashi Yano via Cygwin-patches wrote:

> On Fri, 11 Sep 2020 16:06:01 +0200
> Corinna Vinschen wrote:
> > On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > > What do you mean by "just drop any handling"?
> > >
> > > Do you mean remove following if block?
> > > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > > +       - FIXME: Not sure what to do for ISCII.
> > > > >         Therefore, just convert string without checking */
> > > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > >   wbuf, NT_MAX_PATH);
> > > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > > not be done correctly.
> > >
> > > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > > or ISCII? What should we do for UTF-7?
> >
> > Nothing, just like for any other of these weird charsets.  Cygwin never
> > supported any charset which wasn't at least ASCII compatible in the
> > 0 <= x <= 127 range.  Just ignore them and the possibility that a
> > user chooses them for fun.
> >
> > > What should happen if user or apps chage codepage to one of them?
> >
> > Garbage output, I guess.  We shouldn't really care.
>
> Do you mean a patch attached?

Yes.  I pushed it.  We should really not care for them.


Thanks,
Corinna

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
In reply to this post by cygwin-patches mailing list
On Sat, 12 Sep 2020 02:38:43 +0900
Takashi Yano via Cygwin-patches <[hidden email]> wrote:

> On Sat, 12 Sep 2020 01:05:04 +0900
> Takashi Yano via Cygwin-patches <[hidden email]> wrote:
> > On Fri, 11 Sep 2020 16:06:01 +0200
> > Corinna Vinschen wrote:
> > > On Sep 11 21:35, Takashi Yano via Cygwin-patches wrote:
> > > > Hi Corinna,
> > > >
> > > > On Fri, 11 Sep 2020 14:08:40 +0200
> > > > Corinna Vinschen wrote:
> > > > > On Sep 11 19:54, Takashi Yano via Cygwin-patches wrote:
> > > > > > - In convert_mb_str(), exclude ISO-2022 and ISCII from the processing
> > > > > >   for the case that the multibyte char is splitted in the middle.
> > > > > >   The reason is as follows.
> > > > > >   * ISO-2022 is too complicated to handle correctly.
> > > > > >   * Not sure what to do with ISCII.
> > > > > > ---
> > > > > >  winsup/cygwin/fhandler_tty.cc | 9 +++++++--
> > > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/winsup/cygwin/fhandler_tty.cc b/winsup/cygwin/fhandler_tty.cc
> > > > > > index 37d033bbe..ee5c6a90a 100644
> > > > > > --- a/winsup/cygwin/fhandler_tty.cc
> > > > > > +++ b/winsup/cygwin/fhandler_tty.cc
> > > > > > @@ -117,6 +117,9 @@ CreateProcessW_Hooked
> > > > > >    return CreateProcessW_Orig (n, c, pa, ta, inh, f, e, d, si, pi);
> > > > > >  }
> > > > > >  
> > > > > > +#define IS_ISO_2022(x) ( (x) >= 50220 && (x) <= 50229 )
> > > > > > +#define IS_ISCII(x) ( (x) >= 57002 && (x) <= 57011 )
> > > > > > +
> > > > > >  static void
> > > > > >  convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > > >   UINT cp_from, const char *ptr_from, size_t len_from,
> > > > > > @@ -126,8 +129,10 @@ convert_mb_str (UINT cp_to, char *ptr_to, size_t *len_to,
> > > > > >    tmp_pathbuf tp;
> > > > > >    wchar_t *wbuf = tp.w_get ();
> > > > > >    int wlen = 0;
> > > > > > -  if (cp_from == CP_UTF7)
> > > > > > -    /* MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > > > +       - FIXME: Not sure what to do for ISCII.
> > > > > >         Therefore, just convert string without checking */
> > > > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > > >   wbuf, NT_MAX_PATH);
> > > > > > --
> > > > > > 2.28.0
> > > > >
> > > > > I'd prefer to not handle them at all.  We just don't support these
> > > > > charsets, same as JIS, EBCDIC, you name it, which are not ASCII
> > > > > compatible.  Let's please just drop any handling for these weird
> > > > > or outdated codepages.
> > > >
> > > > What do you mean by "just drop any handling"?
> > > >
> > > > Do you mean remove following if block?
> > > > > > +  if (cp_from == CP_UTF7 || IS_ISO_2022 (cp_from) || IS_ISCII (cp_from))
> > > > > > +    /* - MB_ERR_INVALID_CHARS does not work properly for UTF-7.
> > > > > > +       - ISO-2022 is too complicated to handle correctly.
> > > > > > +       - FIXME: Not sure what to do for ISCII.
> > > > > >         Therefore, just convert string without checking */
> > > > > >      wlen = MultiByteToWideChar (cp_from, 0, ptr_from, len_from,
> > > > > >   wbuf, NT_MAX_PATH);
> > > > In this case, the conversion for ISO-2022, ISCII and UTF-7 will
> > > > not be done correctly.
> > > >
> > > > Or skip charset conversion if the codepage is EBCDIC, ISO-2022
> > > > or ISCII? What should we do for UTF-7?
> > >
> > > Nothing, just like for any other of these weird charsets.  Cygwin never
> > > supported any charset which wasn't at least ASCII compatible in the
> > > 0 <= x <= 127 range.  Just ignore them and the possibility that a
> > > user chooses them for fun.
> > >
> > > > What should happen if a user or an app changes the codepage to one of them?
> > >
> > > Garbage output, I guess.  We shouldn't really care.
> >
> > Do you mean the attached patch?
> >
> > Please try:
> > (1) Open mintty with "env CYGWIN=disable_pcon mintty".
> > (2) Start cmd.exe in that mintty.
> > (3) Try chcp such as
> >     37 (EBCDIC),
> >     65000 (UTF-7),
> >     50220 (ISO-2022),
> >     and 57002 (ISCII).
> > (4) Execute dir or some other commands in cmd.exe.
> >
> > For 65000, 50220 and 57002, even the prompt will be broken.
> > Are the results as you expected?
> >
> > If pseudo console is enabled, all of the above work without
> > problems. With the previous patch, the results were sane even
> > if pseudo console is disabled.
>
> How about the patch attached?
> I think this is safer than previous patch.

I have revised this patch to fit the current git head and submitted
it to [hidden email].

--
Takashi Yano <[hidden email]>

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Corinna Vinschen-2
On Sep 12 03:37, Takashi Yano via Cygwin-patches wrote:
> On Sat, 12 Sep 2020 02:38:43 +0900
> Takashi Yano via Cygwin-patches <[hidden email]> wrote:
> > How about the patch attached?
> > I think this is safer than previous patch.
>
> I have revised this patch to fit the current git head and submitted
> it to [hidden email].

Thanks, but I didn't apply this one because it doesn't really make sense
to go to the extra effort to support outdated and incompatible codepages
we don't handle as Cygwin codesets at all.  IMHO it's not worth calling
MBTWC (MultiByteToWideChar) a second time just to check whether a
codepage supports the MB_ERR_INVALID_CHARS flag.


Corinna

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

cygwin-patches mailing list
On Fri, 11 Sep 2020 20:57:06 +0200
Corinna Vinschen wrote:

> On Sep 12 03:37, Takashi Yano via Cygwin-patches wrote:
> > On Sat, 12 Sep 2020 02:38:43 +0900
> > Takashi Yano via Cygwin-patches <[hidden email]> wrote:
> > > How about the patch attached?
> > > I think this is safer than previous patch.
> >
> > I have revised this patch to fit the current git head and submitted
> > it to [hidden email].
>
> Thanks, but I didn't apply this one because it doesn't really make sense
> to go to the extra effort to support outdated and incompatible codepages
> we don't handle as Cygwin codesets at all.  IMHO it's not worth calling
> MBTWC (MultiByteToWideChar) a second time just to check whether a
> codepage supports the MB_ERR_INVALID_CHARS flag.

I have checked which codepages do not support MB_ERR_INVALID_CHARS.
The results are as follows.

42
50220
50221
50222
50225
50227
50229
52936
57002
57003
57004
57005
57006
57007
57008
57009
57010
57011
65000
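
As a rough illustration only (this is not part of the patch), the list
above could be written as a small lookup. The helper name
cp_lacks_invalid_chars_flag is hypothetical and does not exist in
fhandler_tty.cc; the table simply repeats the codepage numbers found by
the check described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Codepages reported in this thread as not supporting
   MB_ERR_INVALID_CHARS.  The values are copied from the list above. */
static const unsigned int no_invalid_chars_cps[] = {
  42,
  50220, 50221, 50222, 50225, 50227, 50229,
  52936,
  57002, 57003, 57004, 57005, 57006,
  57007, 57008, 57009, 57010, 57011,
  65000
};

/* Hypothetical helper (an assumption for illustration): returns true
   if CP is one of the codepages listed above. */
static bool
cp_lacks_invalid_chars_flag (unsigned int cp)
{
  size_t n = sizeof no_invalid_chars_cps / sizeof no_invalid_chars_cps[0];
  for (size_t i = 0; i < n; i++)
    if (no_invalid_chars_cps[i] == cp)
      return true;
  return false;
}
```

Note that the range macros IS_ISO_2022 and IS_ISCII in the original
patch also cover 50223, 50224, 50226 and 50228, which do not appear in
this list, while 42 and 52936 are in the list but not covered by the
macros.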

If none of these are worth supporting for anyone, I agree with you.

--
Takashi Yano <[hidden email]>

Re: [PATCH] Cygwin: pty: Add workaround for ISO-2022 and ISCII in convert_mb_str().

Corinna Vinschen-2
On Sep 12 04:11, Takashi Yano via Cygwin-patches wrote:

> On Fri, 11 Sep 2020 20:57:06 +0200
> Corinna Vinschen wrote:
> > On Sep 12 03:37, Takashi Yano via Cygwin-patches wrote:
> > > On Sat, 12 Sep 2020 02:38:43 +0900
> > > Takashi Yano via Cygwin-patches <[hidden email]> wrote:
> > > > How about the patch attached?
> > > > I think this is safer than previous patch.
> > >
> > > I have revised this patch to fit the current git head and submitted
> > > it to [hidden email].
> >
> > Thanks, but I didn't apply this one because it doesn't really make sense
> > to go to the extra effort to support outdated and incompatible codepages
> > we don't handle as Cygwin codesets at all.  IMHO it's not worth calling
> > MBTWC (MultiByteToWideChar) a second time just to check whether a
> > codepage supports the MB_ERR_INVALID_CHARS flag.
>
> I have checked which codepages do not support MB_ERR_INVALID_CHARS.
> The results are as follows.
>
> 42
> 50220
> 50221
> 50222
> 50225
> 50227
> 50229
> 52936
> 57002
> 57003
> 57004
> 57005
> 57006
> 57007
> 57008
> 57009
> 57010
> 57011
> 65000

Yup, these are documented on MSDN:
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte

> If none of these are worth supporting for anyone, I agree with you.

I think we can skip those.


Corinna