[ale] Extraction of address and pages

Thu Nov 4 18:15:58 EST 2004

A long time ago, (04.11.04), in a galaxy far, far away, Christopher Fowler...:

:=I'm trying to get http://addr:port/page 
:=
:=from:
:=
:=GET http://www.google.com/ HTTP/1.1
:=
:=this sucks as it is too greedy.  Anyone have a suggestion?
:=$m =~ m/http:\/\/(.+)\/\s+/;

docx> cat foo.pl
#!/usr/bin/perl -w
use strict;

# Sample inputs: with and without a port, a page, and a trailing protocol.
my @urls = (
        'http://www.google.com:80/gmail HTTP/1.1',
        'http://www.google.com/gmail HTTP/1.1',
        'http://www.google.com/ HTTP/1.1',
        'http://www.google.com/',
        'http://www.google.com:80/',
);

foreach my $url (@urls) {
        # The non-greedy (.*?) stops at the first "/" after "http://".
        my ($host_port, $page, $protocol) = $url =~ m#http://(.*?)/(\S*)\s*(.*)#
                or next;        # skip anything that does not match at all
        # Split off the port if there is one; otherwise leave it empty.
        my ($host, $port) = split /:/, $host_port, 2;
        $port = '' unless defined $port;

        print "host: $host\nport: $port\npage: $page\nprotocol: $protocol\n--\n";
}
docx> ./foo.pl
host: www.google.com
port: 80
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol:
--
host: www.google.com
port: 80
page:
protocol:
--
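
If you would rather work on the whole request line (the "GET ... HTTP/1.1"
form from the question) and take the port apart in the same regex, something
along these lines might do. This is only a sketch: the sub name
parse_request_line is made up for illustration, and it assumes plain http://
URLs with a numeric port.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pull host, port, and page out of
# a request line such as "GET http://www.google.com:80/gmail HTTP/1.1".
# Returns an empty list when the line does not contain an http:// URL.
sub parse_request_line {
    my ($line) = @_;
    my ($host, $port, $page) = $line =~ m{
        http://
        ([^/:\s]+)      # host: everything up to a colon, slash, or whitespace
        (?::(\d+))?     # optional colon followed by a numeric port
        /(\S*)          # page: any non-whitespace after the first slash
    }x or return;
    $port = '' unless defined $port;    # no port was given
    return ($host, $port, $page);
}

my ($host, $port, $page) =
    parse_request_line('GET http://www.google.com:80/gmail HTTP/1.1');
print "host: $host\nport: $port\npage: $page\n";
```

Using a negated character class for the host instead of (.+) means the match
cannot run past the first "/" or ":", so greediness is not an issue here.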

-- 
Dylan Northrup - docx at io.com - http://www.io.com/~docx/
"Harder to work, harder to strive, hard to be glad to be alive, but it's 
 really worth it if you give it a try." -- Cowboy Mouth, 'Easy'