[ale] sed regexp question
Joseph A. Knapka
jknapka at earthlink.net
Wed Jul 11 12:23:22 EDT 2001
Ken Nagorski wrote:
>
> And let the holy wars begin.....
>
> :)
>
> On Tue, 10 Jul 2001, Wandered Inn wrote:
>
> > Fletch wrote:
> > >
> > > Fanning the flames of the language holy war . . . :)
> > >
> > > perl -lne 'print$1while/href="?([^">]+?)"?>/gi' *.html
> >
> > I feel much better now as I'm sure I won't have to explain my code.
> > Sweet.
> >
Here's an entry in the true "pearl" of scripting languages:
#!/usr/bin/python
import re
import sys
# Match href="http:..." (quotes optional, any letter case) and capture the URL.
reobj = re.compile(r"""[Hh][Rr][Ee][Ff]="*(http:[^">]*)[">]""")
fname = sys.argv[1]
infile = open(fname,"r")
text = infile.read()
infile.close()
idx = 0
matchobj = reobj.search(text,idx)
# Resume each search where the previous match ended.
while matchobj is not None:
    print(matchobj.group(1))
    idx = matchobj.end()
    matchobj = reobj.search(text,idx)
A bit more verbose, but gets all the URLs on a line, and
I predict I'll be able to understand it this time next
year with minimal effort :-) Granted, Python is a bit
heavyweight for one-offs like this, but it beats the
snot out of Perl or Tcl once you reach a couple hundred
lines of code.
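
For comparison, here is a more compact sketch of the same idea using
re.finditer, which walks the non-overlapping matches without the manual
index bookkeeping. The names HREF_RE and extract_urls are made up for
this example. One assumption to note: re.IGNORECASE stands in for the
[Hh][Rr][Ee][Ff] character classes, so this version also accepts an
upper-case "HTTP:" scheme, which the script above does not.

```python
import re
import sys

# Same pattern as the script above; re.IGNORECASE replaces [Hh][Rr][Ee][Ff]
# (assumption: matching "HTTP:" in any letter case is acceptable too).
HREF_RE = re.compile(r'href="*(http:[^">]*)[">]', re.IGNORECASE)

def extract_urls(text):
    """Return every http: URL found in an href attribute, in document order."""
    return [m.group(1) for m in HREF_RE.finditer(text)]

if __name__ == "__main__":
    # Unlike the original, accept several file names on the command line.
    for fname in sys.argv[1:]:
        with open(fname, "r") as infile:
            for url in extract_urls(infile.read()):
                print(url)
```

Run it as "python script.py *.html" (the script name is hypothetical).
Relative links such as href="foo.html" are skipped, just as in the
script above, because the pattern requires the "http:" prefix.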
-- Joe Knapka
"You know how many remote castles there are along the gorges? You
can't MOVE for remote castles!" -- Lu Tze re. Uberwald
// Linux MM Documentation in progress:
// http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
* Evolution is an "unproven theory" in the same sense that gravity is. *