[ale] Grabbing a dynamic website automatically?
Geoffrey
esoteric at 3times25.net
Fri Aug 23 07:18:36 EDT 2002
johncole at mindspring.com wrote:
> Howdy!
>
> Yes, but the problem is that the website changes every day, as I have to log
> into a HTTPS site. Then I have to go through a couple of licks/menus in
Man I hate licks/menus, messes up my monitor screen. Serious
suggestions below...
> order to get the page I need.
> Otherwise, this would work.
>
> I did look over what someone else did for cookie-based wget/curl with
> HTTPS, but I don't see anywhere that it says anything about timed access,
> logging in, or going through a few pages before I get to the content I
> need.
Here's what I've done in the past. When you get to the page just before
the one you want, check the URL that calls that page; it may be all you
need to fetch the page directly. Save that full URL somewhere, exit your
browser, and then try opening it in a fresh browser session. If it fails
because you're missing a cookie, then the issue is more complex.
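For what it's worth, you can run the same test from the command line with
curl instead of a second browser (the URL below is a made-up placeholder
for whatever you capture from your own browser):

```shell
# Hypothetical URL standing in for the one captured from the browser.
URL="https://www.foo.bar/page.pl"

# Fetching it in a cookie-free session tells you whether a cookie is
# required: if the server hands back a login page instead of the content,
# you're in the "more complex" case. Commented out here; run it by hand
# against your real URL.
#   curl -s -k "$URL" -o daily.html
# (-s silent, -k accept the self-signed certs common on HTTPS sites)

echo "would fetch: $URL"
```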
You can manipulate cookies with both perl and javascript. The next step
would be to keep the cookie the site places in your cookie file, update
its time/date stamp, and put it back into the cookie file before
attempting to load the page as noted in the previous paragraph.
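A minimal sketch of that date-bumping trick, assuming a Netscape-format
cookie file (tab-separated, Unix expiry time in the fifth column) — the
file name and the cookie entry below are invented for illustration:

```shell
JAR=cookies.txt

# Fabricated example entry in Netscape cookie-file format:
# domain, include-subdomains, path, secure, expiry, name, value
printf '.foo.bar\tTRUE\t/\tTRUE\t1030000000\tsession\tdeadbeef\n' > "$JAR"

# Push every cookie's expiry (field 5) one day past now, so the saved
# session cookie still looks fresh when the nightly cron job runs.
NOW=$(date +%s)
awk -v t=$((NOW + 86400)) 'BEGIN { FS = OFS = "\t" } { $5 = t; print }' \
    "$JAR" > "$JAR.new" && mv "$JAR.new" "$JAR"
```

Recent wget builds can then pick the jar up with --load-cookies
cookies.txt (curl uses -b cookies.txt), though whether the server honors
the recycled cookie depends entirely on the site.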
>
> Thanks for the ideas though everyone!
>
> Thanks,
> John
>
>
>
>>At 08:50 AM 08/22/2002 -0400, you wrote:
>>
>>>Run a cronjob with Links outputting the page to a text file?
>>>
>>>Something like: "links -dump https://www.foo.bar/page.pl > ~/daily" done
>>>at 0200, perhaps?
>>>
>>>--
>>>Christopher R. Curzio | Quantum materiae materietur marmota monax
>>>http://www.accipiter.org | si marmota monax materiam possit materiari?
>>>:wq!
>>>
>>>Thus Spake <johncole at mindspring.com>:
>>>Thu, 22 Aug 2002 08:31:36 -0400
>>>
>>>
>>>
>>>>Howdy all!
>>>>
>>>>What would be the best way to grab the data off of a website that is
>>>>dynamic, HTTPS, and has cookies enabled? I'm trying to capture a
>>>>single page every day from a particular website automatically.
>>>>
>>>>(in particular I'm using Redhat 7.2)
>>>>
>>>>I need the page back in text format preferably (or I can convert it to
>>>>text later as needed for insertion into a database.)
>>>>
>>>>Thanks,
>>>>John
>>>
> Paypal membership: free
>
> Donation to Freenet: $20
>
> Never having to answer the question "Daddy, where were you when they took
> freedom of the press away from the Internet?": Priceless.
>
> http://www.freenetproject.org/index.php?page=donations
>
> ---
> This message has been sent through the ALE general discussion list.
> See http://www.ale.org/mailing-lists.shtml for more info. Problems should be
> sent to listmaster at ale dot org.
>
>
--
Until later: Geoffrey esoteric at 3times25.net
I didn't have to buy my radio from a specific company to listen
to FM, why doesn't that apply to the Internet (anymore...)?