[ale] IO to a web site.

Jason Day jasonday at worldnet.att.net
Mon Nov 15 14:18:18 EST 2004


On Sun, Nov 14, 2004 at 09:04:47AM -0500, Christopher Fowler wrote:
> I'm trying to get a page, www.google.com/, using this simple perl
> script.  I expected this to work on more of a line-by-line basis, but
> google is not terminating the connection after the page is spit out.
> Also I do not seem to get the whole page and am stuck in the while loop.

You're using HTTP 1.1, which uses persistent connections by default, so
the connection won't close after the first GET.  Also, the server will
typically use chunked transfer encoding, which is more fun to parse.  I
*highly* recommend using LWP, but if you really want to roll your own
you'll need to read the HTTP 1.1 RFC in depth:
http://www.w3.org/Protocols/rfc2616/rfc2616.html.
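
To give a sense of what parsing a chunked body involves, here is a
minimal sketch in Python (chosen only for illustration; the same logic
carries over to Perl).  It assumes the whole body has already been read
into memory with the response headers stripped off.  The function name
decode_chunked is made up for this example, and the code ignores trailer
headers and does no error handling for truncated input.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (RFC 2616, section 3.6.1)."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hex size line, optionally followed
        # by ";extension" parameters, and terminated by CRLF.
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailer headers after it are ignored
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

A real client would also have to read the chunks incrementally off the
socket, which is a good part of why a library is the easier route.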

Another option is to use HTTP 1.0 instead.  It is *much* easier to
parse, but I wouldn't recommend using it.  Name-based virtual hosts
depend on the Host header, which only HTTP 1.1 requires, so a plain
HTTP 1.0 request might not even reach the right site.  And even if it
does, the server you're using might suddenly stop supporting 1.0 one day
(Slashdot did this not too long ago; broke a script I had written).
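
For what it's worth, a hand-rolled HTTP 1.0 fetch might look roughly
like the sketch below (Python again, purely as an illustration).  The
helper names build_request and fetch are hypothetical.  Sending a Host
header with a 1.0 request is optional in the protocol; the sketch
assumes the server accepts it, and as noted above there is no guarantee
a given server will keep answering 1.0 requests at all.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a bare HTTP/1.0 GET request.

    The Host header is not part of HTTP/1.0 proper; it is included on
    the assumption that the server uses it to pick a virtual host.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80,
          timeout: float = 10.0) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        parts = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the socket
                break
            parts.append(data)
    return b"".join(parts)
```

Because a 1.0 server normally closes the socket when the response is
complete, the read loop ends on its own instead of blocking.  The
returned bytes still contain the status line and headers ahead of the
page itself.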

HTH,
Jason
-- 
Jason Day                                       jasonday at
http://jasonday.home.att.net                    worldnet dot att dot net
 
"Of course I'm paranoid, everyone is trying to kill me."
    -- Weyoun-6, Star Trek: Deep Space 9


