[ale] More perl than linux related question

James CE Johnson jcej at tragus.org
Mon Jun 18 17:42:30 EDT 2001


Mel Burslan wrote:

>
>Well, I should have used "web application programming" instead of perl
>but I think this does suffice as well.
>
>My question is :
>
>where can I find a perl (or any other reasonably easy to implement web
>programming language) script which can send an http://www.blah-blah.com
>query to a site and store the incoming response page to a file for
>further processing ? Once upon a time I asked the same question on a
>perl message board and someone sent me a link to a perl library site but
>I was not able to figure out the answer.
>
The answer to your specific question is taken care of in three lines:

    local $ua = LWP::UserAgent->new;
    my $request = HTTP::Request->new('GET',"somePage");
    my $response = $ua->request($request,"localCopyOfSomePage");

Below is a quick script that you I use to suck images off of a 
web-gallery.  Given a list of URLs it will attempt to save any images by 
searching for <a href="http://something/foo.(gif|jpg)">

YMMV

#!/usr/bin/perl

use File::Basename;
use File::Path;
use LWP::UserAgent;
use HTML::Parser;
use HTTP::Request;
use HTTP::Response;

select(STDOUT); $| = 1;

local $ua = LWP::UserAgent->new;

for my $page (@ARGV) {
  (local $dir = $page) =~ s,/[^/]+$,,;
  print STDERR "$page\n";

  # Fetch $page and store into the file 'tmp'
  my $request = HTTP::Request->new('GET',$page);
  my $response = $ua->request($request,'tmp');
 
  my $parser = HTML::Parser->new(
                                 api_version => 3,
                                 start_h => [\&start, "tagname, attr"],
                                );
  $parser->parse_file('tmp');
 
  print "\n";
}
unlink('tmp');

sub start
  {
    my $tag = shift;
    my $attr = shift;
   
    return if( $tag ne 'a' );
   
    my $href = $attr->{href};
   
   #  print ".";
   
    return if( $href !~ m/jpg$/i && $href !~ m/gif$/i );
   
    return if( -s $href );
   
    my $p = $href =~ m,^http://, ? $href : "$dir/$href";
    my $f = basename($p);
    return if( -f $f );
   
    print STDERR "  $p\n";
    my $request = HTTP::Request->new('GET',$p);
    my $response = $ua->request($request,"$f");
    sleep(1);
  }


--
To unsubscribe: mail majordomo at ale.org with "unsubscribe ale" in message body.





More information about the Ale mailing list