[ale] sed regexp question

Wandered Inn esoteric at denali.atlnet.com
Tue Jul 10 18:47:36 EDT 2001


Christopher Bergeron wrote:
> 
> That would only get websites that start with www; I can't predict all the
> possible names that might arise.  I do know that the URL is always encoded
> in a page as:
> 
> <A HREF="http://xxx.pornsite.com/pictures1.html/">

# assumes one URL per line, with HREF as the first quoted attribute;
# reads the page on stdin

grep -i 'href=' | awk -F'"' '{print $2}'

If you know the 'HREF' will always be all caps, you can drop the grep and do
it in a single awk process, which is faster:

awk -F '"' '/HREF=/ {print $2}'
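Since the subject line asks about sed, a rough sed equivalent might look
like the sketch below. It makes the same assumption of one double-quoted
HREF per line. The function name extract_hrefs is made up for illustration
and is not from this thread.

```shell
# extract_hrefs: hypothetical helper name, used here only for illustration.
# Prints the double-quoted value of an HREF attribute for each input line
# that has one. The bracket expressions match HREF in any mix of case.
# Reads the named files, or stdin when no file is given.
# Note: the leading .* is greedy, so if a line holds several HREFs,
# only the last one on that line is printed.
extract_hrefs() {
    sed -n 's/.*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p' "$@"
}
```

Usage would be something like "extract_hrefs whatever.html", where
whatever.html is just a placeholder file name. To get only what follows the
"http://", pipe the result through: sed 's|^http://||'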

> 
> so, all I need to do is take everything between the "http:// and the ">
> 
> any suggestions?
> 
> would SED or GREP be better suited for this, and even better, what is the
> way to do it?!
> 
> thanks again for all the leads...
> 
> Christopher Bergeron
> Systems Administrator
> Full Line Distributors
> (770) 416-4237
> mis at fullline.com
> 
> > -----Original Message-----
> > From: I. Herman [mailto:izzmo at mediaone.net]
> > Sent: Tuesday, July 10, 2001 1:41 PM
> > To: Christopher Bergeron
> > Subject: Re: [ale] sed regexp question
> >
> >
> > what's the html file?  You can try:
> >
> > cat whatever.html | grep http | grep www
> >
> > or something like that...not sure what you are trying to do...i'm not
> > familiar w/ sed
> >
> >
> >
> 
> --
> To unsubscribe: mail majordomo at ale.org with "unsubscribe ale" in message body.

--
Until later: Geoffrey		esoteric at denali.atlnet.com

"Great spirits have always found violent opposition from mediocre minds.
The latter cannot understand it when a man does not thoughtlessly submit
to hereditary prejudices but honestly and courageously uses his
intelligence." - Albert Einstein




