[ale] (no subject) SPAM talk...

Fletch fletch at phydeaux.org
Sat Apr 17 11:36:49 EDT 2004


>>>>> "ChangingLINKS" == ChangingLINKS com <ChangingLINKS.com> writes:

[...]


    ChangingLINKS> Can spammers figure out how to harvest the email
    ChangingLINKS> from that? Sure.

    ChangingLINKS> Will they harvest it? I strongly doubt it. In
    ChangingLINKS> short: Spammers are programmers, have access to
    ChangingLINKS> programmers (or buy ready made spamming
    ChangingLINKS> software). And writing such code would take *time*
    ChangingLINKS> which increases cost. Aside from the virus/

It took me all of about 5 minutes (start ~1109, working ~1116) to get
this, which will work for one page; most of that time was spent
remembering how to use HTML::TokeParser (and a quick trip to the
facilities :).


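#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple qw( get );
use HTML::TokeParser ();

## URL of the page to scan, from the command line
my $url = shift
  or die "usage: $0 URL\n";

my $content = get( $url )
  or die "Can't fetch $url\n";

my $stream = HTML::TokeParser->new( \$content )
  or die "Can't create TokeParser: $!\n";

## Walk the token stream looking for <a href="mailto:...">
while( my $t = $stream->get_token ) {
  if( $t->[0] eq 'S' and $t->[1] eq 'a'
      and defined $t->[2]->{href}
      and $t->[2]->{href} =~ /^mailto:/ ) {
    ## Link text is the obfuscated address; undo the " at "
    my $addr = $stream->get_trimmed_text( "/a" );
    $addr =~ s/\s+at\s+/\@/;
    print $addr, "\n";
    last;   ## stop at the first mailto: link
  }
}

exit 0;

__END__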


In under an hour I could have this spidering in parallel (in about
four to five hours I could have something which would spread requests
out across any number of endpoints so it wouldn't look like
spidering).  And all with just off-the-shelf components.  I could
probably do the same in Ruby, again with pretty much off-the-shelf
components, in about the same time (well, probably about 2 hours, since
I'm still working on my Ruby-fu).  Serious harvesters will have
someone on retainer who could do the same in the same timeframes (not
to mention probably having much of the spidering infrastructure
already in place).
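
To give a rough idea of how small the per-page piece is, here is a
sketch of the same loop pulled into a function that returns every
address on a page instead of stopping at the first.  Assumptions, none
of them from the script above: extract_addresses is a made-up helper
name, the sample HTML is invented, and get_tag is swapped in for the
get_token loop.  It only works on a string already in memory; nothing
is fetched.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser ();

## Hypothetical helper (not part of the original script): return the
## de-obfuscated text of every mailto: link in one HTML document.
sub extract_addresses {
  my( $content ) = @_;

  my $stream = HTML::TokeParser->new( \$content )
    or die "Can't create TokeParser: $!\n";

  my @addrs;

  ## get_tag( "a" ) skips ahead to the next <a ...> start tag and
  ## returns [ $tag, \%attr, \@attrseq, $text ]
  while( my $tag = $stream->get_tag( "a" ) ) {
    my $href = $tag->[1]{href};
    next unless defined $href and $href =~ /^mailto:/;

    ## Same de-obfuscation as before: first " at " becomes "@"
    my $addr = $stream->get_trimmed_text( "/a" );
    $addr =~ s/\s+at\s+/\@/;
    push @addrs, $addr;
  }

  return @addrs;
}

## Invented sample page, for illustration only
my $page = <<'HTML';
<p><a href="mailto:someone%20at%20example.org">someone at example.org</a></p>
<p><a href="http://example.org/">not an address</a></p>
<p><a href="mailto:other%20at%20example.com">other at example.com</a></p>
HTML

print "$_\n" for extract_addresses( $page );
```

Run as-is, that should print someone@example.org and other@example.com,
one per line, and skip the ordinary link.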

-- 
Fletch                | "If you find my answers frightening,       __`'/|
fletch at phydeaux.org   |  Vincent, you should cease askin'          \ o.O'
                      |  scary questions." -- Jules                =(___)=
                      |                                               U


