[ale] sed regexp question
Wandered Inn
esoteric at denali.atlnet.com
Tue Jul 10 18:47:36 EDT 2001
Christopher Bergeron wrote:
>
> That would only get websites that start with www; I can't predict all the
> possible names that might arise. I do know that the URL is always encoded
> in a page as:
>
> <A HREF="http://xxx.pornsite.com/pictures1.html/">
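# Assumes one URL per line, and that the URL is the first quoted string on that line.
grep -i 'href=' whatever.html | awk -F'"' '{print $2}'
If you know the 'HREF' will always be all caps, you can skip the grep and let awk do the matching, which saves a process:
awk -F'"' '/HREF=/ {print $2}' whatever.html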
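Since the question asked about sed specifically, here is a rough sed sketch of the same idea. It is only one way to do it, not a proper HTML parser. The function name extract_hrefs and the example.com URL are made up for illustration, and it assumes every href value is wrapped in double quotes. The tr step puts each tag on its own line first, so more than one link per line is handled.

```shell
# Hypothetical helper: print every double-quoted href value read from stdin.
extract_hrefs() {
  # tr:  turn each '>' into a newline so every tag sits on its own line.
  # sed: on lines containing href="...", print only the quoted value
  #      (the bracket pairs match HREF in any mix of upper and lower case).
  tr '>' '\n' | sed -n 's/.*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p'
}

# Example input in the same shape as the link in the question (placeholder host).
printf '%s\n' '<A HREF="http://www.example.com/pictures1.html/">' | extract_hrefs
```

The example line should print http://www.example.com/pictures1.html/. For a real page you would run something like: extract_hrefs < whatever.html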
>
> so, all I need to do is take everything between the "http:// and the ">
>
> any suggestions?
>
> would SED or GREP be better suited for this, and even better, what is the
> way to do it?!
>
> thanks again for all the leads...
>
> Christopher Bergeron
> Systems Administrator
> Full Line Distributors
> (770) 416-4237
> mis at fullline.com
>
> > -----Original Message-----
> > From: I. Herman [mailto:izzmo at mediaone.net]
> > Sent: Tuesday, July 10, 2001 1:41 PM
> > To: Christopher Bergeron
> > Subject: Re: [ale] sed regexp question
> >
> >
> > what's the html file? You can try:
> >
> > cat whatever.html | grep http | grep www
> >
> > or something like that...not sure what you are trying to do...i'm not
> > familiar w/ sed
> >
> >
> >
>
> --
> To unsubscribe: mail majordomo at ale.org with "unsubscribe ale" in message body.
--
Until later: Geoffrey esoteric at denali.atlnet.com
"Great spirits have always found violent opposition from mediocre minds.
The latter cannot understand it when a man does not thoughtlessly submit
to hereditary prejudices but honestly and courageously uses his
intelligence." - Albert Einstein