Is it really "junk"? That seems hard to define, as more or less any list of
characters is a legal fragment of regex, matching itself. Is
[a-z]+JUNK the regex [a-z]+ followed by JUNK, or the regex [a-z]+JUNK?
Or do you just want to strip a trailing <KEYWORDS HERE>? That is easier,
but I wouldn't describe it as JUNK if it's matching a specific angle
bracket syntax.
On Mon, 16 Dec 2024 at 13:43, Norm Tovey-Walsh ndw@xxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > I wrote a recursive function to do this. See below. Is there a
> > simpler way to do it?
>
> If the regex is always in parens, and if the junk that follows never
> contains a ")", then just look for the last ")".
>
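The "last paren" case above can be sketched in a few lines. This is only an illustration in Python, not the original poster's code; the function name split_at_last_paren is made up for the example, and it assumes the junk really never contains a ")".

```python
def split_at_last_paren(text: str) -> tuple[str, str]:
    """Split text into (regex, junk) at the last ')'.

    Assumption: the regex is wrapped in parens and the trailing junk
    never contains a ')' of its own.
    """
    end = text.rfind(")")  # index of the last ')', or -1 if there is none
    if end == -1:
        raise ValueError("no closing parenthesis found")
    return text[: end + 1], text[end + 1 :]
```

For example, split_at_last_paren("(a(b)c) trailing") separates "(a(b)c)" from " trailing". As soon as the junk can contain a ")", the split lands in the wrong place.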
> If the regex is always in parens, but the junk might include "(" and/or
> ")" then it's going to be harder.
>
> If the regex isn't always in parens, ... I'm not sure the problem is
> tractable. A string of the form "abcd" could be interpreted several ways
> depending on whether "bcd", "cd", "d", or "" is considered junk.
>
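To make the ambiguity concrete, here is a small Python sketch that lists every way a string can be cut into a non-empty regex followed by junk. The helper name candidate_splits is hypothetical; it only enumerates the possibilities and makes no attempt to choose between them.

```python
def candidate_splits(text: str) -> list[tuple[str, str]]:
    """Return every (regex, junk) pair with a non-empty regex prefix."""
    return [(text[:i], text[i:]) for i in range(1, len(text) + 1)]


for regex, junk in candidate_splits("abcd"):
    print(f"regex={regex!r} junk={junk!r}")
```

Every one of these prefixes is a valid regex matching itself, so nothing in the string alone says which split was intended.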
> On a quick skim, I wasn't able to persuade myself that your recursive
> solution was handling escaped parens, if that's an issue.
>
> Assuming the regex is always in parens, I cooked up this ixml grammar in a
> moment or two, but it doesn't handle escaped parens either.
>
> text = regex, junk? .
> regex = '(', inner*, ')' .
> -inner = -regex | ~["()"] .
> junk = ~[]* .
>
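For comparison, the same balanced-parens idea can be written as a depth counter. This is a rough Python sketch rather than a translation of the grammar: the name split_regex_and_junk is invented for the example, and it additionally assumes that a backslash escapes the next character, which the grammar above does not do. It still ignores parens inside character classes such as [)].

```python
def split_regex_and_junk(text: str) -> tuple[str, str]:
    """Split text into (regex, junk) at the ')' that closes the first '('.

    Assumptions: the text starts with '(', and a backslash escapes the
    character that follows it. Character classes are not understood.
    """
    if not text.startswith("("):
        raise ValueError("expected text to start with '('")
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # This ')' closes the outermost group: the rest is junk.
                return text[: i + 1], text[i + 1 :]
        i += 1
    raise ValueError("unbalanced parentheses")
```

Because it stops at the first point where the depth returns to zero, parens in the junk no longer matter: split_regex_and_junk("(a(b)c) junk (x)") keeps "(a(b)c)" as the regex.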
> Be seeing you,
> norm
>
> --
> Norm Tovey-Walsh <ndw@xxxxxxxxxx>
> https://norm.tovey-walsh.com/
>
> > Weeks of programming can save you hours of planning.