[ale] Single out an alarm with regex

Alex LeDonne aledonne.listmail at gmail.com
Thu Oct 5 10:41:55 EDT 2006


On 10/5/06, Christopher Fowler <cfowler at outpostsentinel.com> wrote:
> On Thu, 2006-10-05 at 09:58 -0400, Alex LeDonne wrote:
> > Can anything come after the type? Or, more particularly, can anything
> > come after the type if the type is "MAINT INDICATION" ?
>
> Yes.  It can be a timestamp or more.

Rats. Then we can't anchor it at the end of the regex.

> > Finally, is
> > there guaranteed to be at least one whitespace character preceding the
> > type?
>
> Yes
>
> > This is critical if you want the zero-width assertion to work...
> > if you have .* immediately before a zero-width negative lookahead,
> > when the negative assertion blocks the match, the engine will
> > backtrack and .* will consume the next character, the negative no
> > longer blocks, and the pattern matches.
>
> I'm sorry I showed Perl code and everyone is assuming I'm writing the
> code in Perl.  I'm only using Perl to test the regex.  This regex will
> go in a list of 100 other regexes, and a C program that is using the GNU
> regex library will be doing the searches.  It is not possible to grab
> anything and then later look for 'MAINT INDICATION' like this:
>
>         if (/^DCH:\s+\d+\s+(\w+).*$/) {
>                 if ($1 ne 'MAINT') {
>                         foo;
>                 }
>         }
>
> Everything has to be done inside of the regex.


OK... while I haven't tried to prove it yet, my gut says this set of
constraints is not matchable in the algebra of GNU regex.

Part of the problem is the fact that the boundary character before the
zero-width assertion (a space) appears in the string that you want to
negatively assert against. So the "backtracking & consuming to match"
problem is going to be, I think, insurmountable, unless we can nail
down more specifically what goes between DCH: and the <type>. In other
words, without further constraints I think the data is too
unstructured to do the kind of match and exclusion you're aiming for
in a single regex.
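
To make the backtracking problem concrete, here is a rough sketch. It
uses Python's re module only because it accepts the same lookahead
syntax; it says nothing about what the GNU regex library accepts. The
two sample lines and both patterns are my own guesses at the shape of
the data, not the real alarm format.

```python
import re

# Hypothetical sample lines; the real alarm format is not shown in this thread.
MAINT = "DCH: 12 MAINT INDICATION 10:41:55"
OTHER = "DCH: 12 LINK DOWN 10:41:55"

# Loose: anything may sit between "DCH:" and the type. The engine is free
# to slide .* along until it finds a space NOT followed by the unwanted
# type, so the negative lookahead never really blocks the match.
LOOSE = re.compile(r"^DCH:.*\s(?!MAINT INDICATION)\w+")

# Structured: ASSUMES only a number sits between "DCH:" and the type.
# Nothing before the lookahead can stretch or shrink past the type, so
# backtracking cannot step around the assertion.
STRICT = re.compile(r"^DCH:\s+\d+\s+(?!MAINT INDICATION)\w+")


def matches(pattern, line):
    """Return True if the compiled pattern matches somewhere in line."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for name, pat in (("loose", LOOSE), ("strict", STRICT)):
        for line in (MAINT, OTHER):
            print(name, matches(pat, line), line)
```

The loose pattern still matches the MAINT INDICATION line, because the
space before the timestamp (or the one inside "MAINT INDICATION"
itself) satisfies the \s just as well. The structured one rejects it,
but only because of the extra assumption about what precedes the type.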

But I'd love to be proven wrong!

-A


