[ale] re: raking

Cade Thacker linux at cade.org
Sun Aug 31 02:35:08 EDT 2003


Thus spake Geoffrey
> You know that wget obeys robots.txt, thus if the site doesn't want you
> picking those files up, wget will not pull them.

This is true, but you can also add "-e robots=off" and it will ignore the
robots file...

Looking at and grepping the HTML, those are the only PDFs on the page, so
a quick perl script that wgets
http://www.georgiaoutdoors.com/hunting/WMAmaps.asp, greps for pdf, parses
out the links, and runs another wget loop would get you all your files at
the press of a key. If you wanted to do something really fancy, you could
probably figure out how to do just a HEAD on each file and see if the
size (maybe even the last modified time) of the files has been
updated. If you are not perl savvy, let me know and I can probably put
something together quickly if you want...
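
For anyone who wants to see the shape of it, here is a rough sketch of the
steps described above (fetch the page, grep out the PDF links, HEAD each
one, download only what changed). It is written in Python with the standard
library rather than perl plus wget, so treat it as a stand-in and not the
script offered above. The function names, the regex, the latin-1 decoding,
and the "compare Content-Length to the local file size" rule are all
assumptions made for illustration; the real page may need a different
pattern.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin

PAGE_URL = "http://www.georgiaoutdoors.com/hunting/WMAmaps.asp"

# Rough grep-style pattern (an assumption about the page's markup):
# matches href="...pdf" or href='...pdf', case-insensitively.
PDF_HREF = re.compile(r'href\s*=\s*["\']([^"\']+\.pdf)["\']', re.IGNORECASE)


def extract_pdf_links(html, base_url):
    """Return absolute URLs of the PDF links in html, in page order, no duplicates."""
    links = []
    for href in PDF_HREF.findall(html):
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if url not in links:
            links.append(url)
    return links


def remote_size(url):
    """Send a HEAD request and return Content-Length as an int, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(local_path, size_on_server):
    """True when there is no local copy, or its size differs from the server's."""
    if not os.path.exists(local_path):
        return True
    if size_on_server is None:
        return True  # server did not report a size, so fetch to be safe
    return os.path.getsize(local_path) != size_on_server


def mirror_pdfs(page_url=PAGE_URL, dest_dir="."):
    """Fetch the page, then download each PDF that is missing or changed in size."""
    with urllib.request.urlopen(page_url) as resp:
        html = resp.read().decode("latin-1")  # assumed encoding; never fails to decode
    for url in extract_pdf_links(html, page_url):
        local_path = os.path.join(dest_dir, os.path.basename(url))
        if needs_download(local_path, remote_size(url)):
            urllib.request.urlretrieve(url, local_path)
```

Calling mirror_pdfs() from the directory where you want the maps does the
whole run. Note that urllib never looks at robots.txt on its own, so unlike
plain wget this sketch behaves as if "-e robots=off" were always set.
Checking the Last-Modified header instead of the size would follow the same
pattern as remote_size().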

--cade

On Linux vs Windows
==================
Remember, amateurs built the Ark, Professionals built the Titanic!
==================




_______________________________________________
Ale mailing list
Ale at ale.org
http://www.ale.org/mailman/listinfo/ale




