change in pattern matching in pager /usr/bin/less

L A Walsh

For some reason, the behavior of less has changed recently in regards to how
it interprets characters like '\s' (whitespace).

Previous versions let me use '\s' for whitespace and '+' for 'one or more'.
Now there seems to be nothing for \s, and instead of '+' you have to write
<CHAR><CHAR>*.

This puts the Cygwin 'less' at variance with the version I'm using on
Linux.

Part of this is that the new Cygwin less appears to use obsolete REs,
which don't support '+'. That may be a compile flag.

I don't know why \s is not working; however, 'awk' used to be the definitive
Extended (modern) RE reference, and it does use \s for whitespace.

My Linux less is at version 458 (POSIX RegEx).
My Cygwin less is at version 530 (POSIX RegEx).

It could be the libraries they link against for REs, though both
appear to use the GNU regex code.

Note, I'm not just saying Cygwin less is different from Linux less,
but also that Cygwin less is different from how it used to be.

Regularly, when looking at a manpage, I use the search string:

^\s+topic\s

to find header words, commands, etc. That is, the word is the first thing
on its line, with one or more whitespace characters before it and one
after. I may need to repeat the search a few times, as the word can also
appear first on other lines.

For example, to find 'typeset', I'd use '^\s+typeset\s'.

That no longer works in Cygwin.

Ideas? A fix, please?

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Re: change in pattern matching in pager /usr/bin/less

Wayne Davison
On Sun, Sep 1, 2019 at 5:50 PM L A Walsh wrote:
> For some reason, the behavior of less has changed recently in regards to how
> it interprets characters like '\s' (whitespace).

Sadly, it's been compiled with POSIX regular expressions on Cygwin for
quite a while now. On Linux it is often compiled with GNU regex or
PCRE.

I started compiling my own version of less on Cygwin back in January
for this very reason. If you snag the source and run
"./configure --with-regex=pcre ; make", you'll get a version of less.exe
that you can put somewhere early in your PATH (or in place of the stock
less.exe). You'll need the standard build tools installed (one way to
get those is to install cygport) plus some extra devel libraries, such
as libpcre-devel and libncurses-devel.

It would certainly be nice to have the PCRE regex as the standard in
the Cygwin version, though.

Running the Cygwin version:

$ /usr/bin/less --version
less 530 (POSIX regular expressions)
Copyright (C) 1984-2017  Mark Nudelman

Running my version:

$ less --version
less 530 (PCRE regular expressions)
Copyright (C) 1984-2017  Mark Nudelman

..wayne..

Re: change in pattern matching in pager /usr/bin/less

Steven Penny
In reply to this post by L A Walsh
On Sun, 01 Sep 2019 17:50:17, L A Walsh wrote:
> Part of this is that the new Cygwin less appears to use obsolete REs,
> which don't support '+'. That may be a compile flag.
>
> I don't know why \s is not working; however, 'awk' used to be the definitive
> Extended (modern) RE reference, and it does use \s for whitespace.
>
> My Linux less is at version 458 (POSIX RegEx).
> My Cygwin less is at version 530 (POSIX RegEx).

I have the same version as you:

    $ less --version
    less 530 (POSIX regular expressions)

"+" is defined in PCRE, but also in ERE, which is what "awk" and "grep -E"
use. It's only BRE that doesn't define "+". Not sure why this isn't working
for you, but my less uses ERE. Another ERE syntax for "one or more" is "{1,}",
but that won't help you if your less has no ERE support.

In regard to "\s", that's not defined by BRE or ERE, so you would need PCRE for
that. The workaround is to use "[[:blank:]]", "[[:space:]]", or simply the space
character. I haven't seen many man pages using tabs. For example, running
"man man" and searching for "[[:cntrl:]]" (tab is a control character) returns
nothing.

Re: change in pattern matching in pager /usr/bin/less

Andrey Repin
In reply to this post by L A Walsh
Greetings, L A Walsh!

> For some reason, the behavior of less has changed recently in regards to how
> it interprets characters like '\s' (whitespace).

Not recently.
It's been that way for a long while. I've questioned this behavior on at least
two occasions, and the last time we found out that Cygwin less is compiled
without PCRE support.


--
With best regards,
Andrey Repin
Tuesday, September 3, 2019 9:50:06

Sorry for my terrible English...


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple