[ale] sed regexp question

Joseph A. Knapka jknapka at earthlink.net
Wed Jul 11 12:23:22 EDT 2001


Ken Nagorski wrote:
> 
> And let the holy wars begin.....
> 
> :)
> 
> On Tue, 10 Jul 2001, Wandered Inn wrote:
> 
> > Fletch wrote:
> > >
> > >         Fanning the flames of the language holy war . . . :)
> > >
> > > perl -lne 'print$1while/href="?([^">]+?)"?>/gi' *.html
> >
> > I feel much better now as I'm sure I won't have to explain my code.
> > Sweet.
> >

Here's an entry in the true "pearl" of scripting languages:

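#!/usr/bin/python
import re
import sys

# Match href="http:..." in any letter case; the opening quote is optional.
# Group 1 captures the URL up to the closing quote or '>'.
reobj = re.compile(r'href="?(http:[^">]*)[">]', re.IGNORECASE)

# Read the whole file named on the command line, closing it when done.
with open(sys.argv[1], "r") as infile:
    text = infile.read()

# finditer resumes after each match, so several URLs on one line are all found.
for matchobj in reobj.finditer(text):
    print(matchobj.group(1))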

A bit more verbose, but gets all the URLs on a line, and
I predict I'll be able to understand it this time next
year with minimal effort :-) Granted, Python is a bit
heavyweight for one-offs like this, but it beats the
snot out of Perl or Tcl once you reach a couple hundred
lines of code.
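
If you want something closer to the Perl one-liner's "*.html" behavior, here is a rough sketch that wraps the same regexp in a function and loops over every file named on the command line. The names HREF_RE, extract_urls, and main are my own inventions for illustration, and like the script above it only picks up http: URLs, so treat it as a starting point rather than a finished tool.

```python
#!/usr/bin/python
import re
import sys

# Same pattern idea as the script above: optional opening quote, then an
# http: URL that runs up to the closing quote or '>'.
HREF_RE = re.compile(r'href="?(http:[^">]*)[">]', re.IGNORECASE)

def extract_urls(text):
    """Return every http: URL captured from an href attribute, in order."""
    # With a single capture group, findall returns a list of group-1 strings.
    return HREF_RE.findall(text)

def main(paths):
    # Process each file in turn, as the shell would expand *.html.
    for path in paths:
        with open(path, "r") as infile:
            for url in extract_urls(infile.read()):
                print(url)

if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, say, "python hrefs.py *.html" (the script name is hypothetical). Keeping the matching in extract_urls means the regexp can be tried on a string without touching any files.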

-- Joe Knapka
"You know how many remote castles there are along the gorges? You
 can't MOVE for remote castles!" -- Lu Tze re. Uberwald
// Linux MM Documentation in progress:
// http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
* Evolution is an "unproven theory" in the same sense that gravity is. *