[ale] (no subject) SPAM talk...
Fletch
fletch at phydeaux.org
Sat Apr 17 11:36:49 EDT 2004
>>>>> "ChangingLINKS" == ChangingLINKS com <ChangingLINKS.com> writes:
[...]
ChangingLINKS> Can spammers figure out how to harvest the email
ChangingLINKS> from that? Sure.
ChangingLINKS> Will they harvest it? I strongly doubt it. In
ChangingLINKS> short: Spammers are programmers, have access to
ChangingLINKS> programmers (or buy ready made spamming
ChangingLINKS> software). And writing such code would take *time*
ChangingLINKS> which increases cost. Aside from the virus/
It took me all of about 5 minutes (start ~1109, working ~1116) to get
the script below, which works for one page; most of that time was spent
remembering how to use HTML::TokeParser (and a quick trip to the
facilities :).
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw( get );
use HTML::TokeParser ();

my $url = shift
  or die "usage: $0 URL\n";
my $content = get( $url )
  or die "Can't fetch $url\n";
my $stream = HTML::TokeParser->new( \$content )
  or die "Can't create TokeParser: $!\n";

# Walk the token stream looking for the first <a href="mailto:...">.
while( my $t = $stream->get_token ) {
  if( $t->[0] eq 'S' and $t->[1] eq 'a'
      and ( $t->[2]->{href} || '' ) =~ /^mailto:/ ) {
    # The link text holds the obfuscated address ("user at host").
    my $addr = $stream->get_trimmed_text( "/a" );
    $addr =~ s/\s+at\s+/\@/;
    print $addr, "\n";
    last;    # stop after the first match
  }
}

exit 0;

__END__
In under an hour I could have this spidering in parallel (in about
four to five hours I could have something which would spread requests
out to come from any number of endpoints to make it not look like
spidering). And all with just off the shelf components. I probably
could do the same in Ruby, again with pretty much off the shelf
components, in about the same time (well, probably about 2 hours since
I'm still working on my Ruby-fu). Serious harvesters will have
someone on retainer who could do the same in the same timeframes (not
to mention probably having much of the spidering infrastructure
already in place).
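
To show how little work the " at " obfuscation actually adds, here is a
minimal, network-free sketch of just the substitution step. The helper
name deobfuscate and the sample addresses are made up for illustration
and are not part of the original script; it uses core Perl only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): undo the
# "user at host" obfuscation for a single piece of link text.
sub deobfuscate {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    $text =~ s/\s+at\s+/\@/;    # first " at " becomes "@"
    return $text;
}

# Made-up sample inputs on reserved example domains.
print deobfuscate($_), "\n"
    for 'someone at example.org', '  list-owner   at   example.com ';
```

A single regex substitution is the whole "decoder", which is the point:
the obfuscation costs a harvester essentially nothing.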
--
Fletch | "If you find my answers frightening, __`'/|
fletch at phydeaux.org | Vincent, you should cease askin' \ o.O'
| scary questions." -- Jules =(___)=
| U