[ale] Regex question

Fri Mar 12 02:40:08 EST 2004

Chris Fowler <cfowler at outpostsentinel.com> writes:

> On Thu, 2004-03-11 at 23:34, Mike Murphy wrote:
> > ah, yeah, stuff like awk won't know what \d is. Try this:
> > 
> > echo XMI313 | awk '/XMI[0-9][0-9][0-9]/ {print $0}'
> > 
> I think this is the first one that would be required.
> 
> > this is a little neater:
> > 
> > echo XMI333 | awk '/XMI[0-9]+/ {print $0}'
> > 
> 
> If I did XMI0 then the string would hit.  The remaining 01 would be
> ignored by the program. 

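No. REs are (almost always) greedy by default, so [0-9]+ consumes every
digit available. However, XMI[0-9]+ would match the lone string "XMI0",
as well as the string "XMI0123", which you apparently don't want. To get
exactly what you want (given anchors or delimiters around it, since an
unanchored RE also matches inside longer strings) you can use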

(XMI([0-9]{3})?)

Which says, "XMI, possibly followed by exactly three occurrences of a
decimal digit." Or if your regexp engine doesn't understand {}, just
explicitly repeat the [0-9] three times:

(XMI([0-9][0-9][0-9])?)

The above RE should work anywhere extended REs are spoken (awk, egrep,
Perl). Both of these have the effect of creating an extraneous capture
group, which might
cause problems in some contexts. Perl-compatible REs have a syntax for
a non-capturing group, written (?:...), which avoids that:

(XMI(?:[0-9]{3})?)
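
A quick way to check the behaviour described above is a small awk test.
This is only a sketch under a few assumptions: the helper names check and
longest are hypothetical, the ^ and $ anchors are added here for whole-line
matching (the patterns above are unanchored), and the explicit
[0-9][0-9][0-9] form is used in case the local awk doesn't understand {}.

```shell
#!/bin/sh
# check (hypothetical helper): prints "match" if the whole line is "XMI"
# alone or "XMI" followed by exactly three digits, "no match" otherwise.
# The ^ and $ anchors are an assumption; without them awk matches substrings.
check() {
    printf '%s\n' "$1" |
        awk '{ if ($0 ~ /^XMI([0-9][0-9][0-9])?$/) print "match"; else print "no match" }'
}

# longest (hypothetical helper): prints the text that XMI[0-9]+ actually
# matched, to show that the + takes every digit it can get.
longest() {
    printf '%s\n' "$1" |
        awk '{ if (match($0, /XMI[0-9]+/)) print substr($0, RSTART, RLENGTH) }'
}

for code in XMI XMI0 XMI012 XMI0123; do
    printf '%s: %s\n' "$code" "$(check "$code")"
done
longest XMI0123
```

With a POSIX awk this should report a match for XMI and XMI012, no match
for XMI0 and XMI0123, and print XMI0123 on the last line (the greedy case).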

-- Joe

> That would be an unanticipated consequence.
> So an error code is basically a few letters followed by numbers.  The
> letters represent a group and the numbers represent a specific error
> condition.  I'm trying to group them to reduce load on the program.  I
> have to search for 1000 possible errors.
> 
> 
> > but could have unanticipated consequences. For instance, if you do that
> > with an input of something like 'XMI3334', it's going to find that, but
> > that's also true of the first example (because the substring matched).
> > That's probably ok for your purposes. If not, you might try anchoring 
> > that with a ^ and a $ if necessary (assuming that would work for your 
> > stream).
> > 
> > Mike
> > 
> > 
> > Christopher Fowler wrote:
> > > On Thu, Mar 11, 2004 at 11:13:04PM -0500, Mike Murphy wrote:
> > > 
> > >>unless I'm missing something, something like this:
> > >>
> > >>=~ /(XMI\d\d\d)/ should work. The entire string matched will show up in
> > >>$1 afterward. This presumes that no other characters will show in the
> > >>string.
> > > 
> > > 
> > > I must be doing something wrong then. I'm using AWK to validate the regex.
> > > The Perl Expect module will actually do the matching based on the regular
> > > expression, so I do not think anything that is Perl-specific will work. That
> > > is why I'm testing with awk.
> > > 
> > > echo XMI | awk '/XMI\d\d\d/ {print $0}' 
> > > 
> 
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://www.ale.org/mailman/listinfo/ale
> 
> 

-- 
Barney comes to play with us whenever we may need him;
Someday we will hunt him down and chop him up and eat him!
   -- Annze, age 7
--
If you really want to get my attention, send mail to
jknapka .at. kneuro .dot. net.