Screen scraping - is this possible? Please help!

Posted: Wed May 30, 2007 9:52 am
by arkdm
I've been trying (unsuccessfully) to scrape a site called Managerzone.com to make a help tool for the site. Many other people have done this but simply don't care to help me out :(. I'm not a very experienced PHP programmer so here goes...

The problem with scraping the site is that whenever you attempt to scrape any of the internal pages (i.e., the pages you can access after you've logged in), it redirects you to the main index page. Does this make sense? What I'd like to do is somehow log in to the website via PHP using my login information and then screen scrape the pages inside. Is this even possible or am I blowing smoke?

Thanks for your help!

Posted: Wed May 30, 2007 10:06 am
by blackbeard
I think you'll need to look at using the cURL functions. I've not used them, so I can't help you out other than pointing you in that direction.

Posted: Wed May 30, 2007 12:57 pm
by John Cartwright
I hope you have permission :!:

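Regardless, using cURL you can send multiple requests (curl_exec()) in a single page load, for instance: one to log in, and one to fetch the content, as long as you use the same cURL handle and have cookie handling turned on for it, so the session cookie from the login carries over to the second request.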
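
Roughly, the two-request idea could look like the sketch below. The function name fetchBehindLogin(), the example.com URLs and the form field names are all made up for illustration; the real ones would have to come from the site's login form. It also assumes the site keeps the session in a cookie, which is why the cookie engine is switched on for the handle.

```php
<?php
// Hypothetical helper: logs in, then fetches a members-only page,
// reusing one cURL handle so the session cookie is sent on request 2.
function fetchBehindLogin(string $loginUrl, array $loginFields, string $pageUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow any redirect after the login
        CURLOPT_COOKIEFILE     => '',   // empty string enables in-memory cookie handling
    ]);

    // Request 1: POST the login form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($loginFields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Request 2: plain GET on the same handle; cookies from request 1 are reused.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    return $html;
}

// Example call (assumed URLs and field names, not the real site's):
// $html = fetchBehindLogin(
//     'https://example.com/login.php',
//     ['username' => 'me', 'password' => 'secret'],
//     'https://example.com/members/team.php'
// );
```

If the second request still comes back as the index page, the login most likely did not stick: the field names may be wrong, or the form may carry a hidden field that has to be sent along too.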

Posted: Wed May 30, 2007 2:36 pm
by RobertGonzalez
If the content you are scraping is behind a login screen, doesn't it seem kinda shady to make that content available to users that are not logged in?

Posted: Wed May 30, 2007 2:46 pm
by superdezign
Everah wrote:If the content you are scraping is behind a login screen, doesn't it seem kinda shady to make that content available to users that are not logged in?
I didn't want to say it.. Been holding it in for hours... But yeah, if they put it behind a login, they probably don't want it to be publicly accessed. Then again, for all I know, this is your account for an online game or something.

Posted: Wed May 30, 2007 3:42 pm
by arkdm
This is my account for the game. I'm simply using my login info to get in and retrieving the info (for my own use only) from there. It's also a free site.

Thanks for pointing me at cURL. Hopefully I'll get it to work. A quick question, when I POST variables do they need to be urlencoded?

Posted: Wed May 30, 2007 3:50 pm
by superdezign
arkdm wrote:This is my account for the game. I'm simply using my login info to get in and retrieving the info (for my own use only) from there. It's also a free site.
I wrote:Then again, for all I know, this is your account for an online game or something.
I must be psychic :-p
arkdm wrote:Thanks for pointing me at cURL. Hopefully I'll get it to work. A quick question, when I POST variables do they need to be urlencoded?
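Not when you read them: PHP decodes both $_GET and $_POST for you. When you send them with cURL, it depends on how you pass them. Give CURLOPT_POSTFIELDS an array and cURL encodes the values itself (as multipart/form-data); give it a string and you have to URL-encode each value yourself, just like a query string.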
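
For the cURL case specifically, here is a small sketch of building a POST body as a string. The helper name encodePostBody() and the field values are invented for the example; http_build_query() is the standard PHP function that does the URL-encoding.

```php
<?php
// Hypothetical helper: turns an array of form fields into a
// URL-encoded string suitable for CURLOPT_POSTFIELDS.
function encodePostBody(array $fields): string
{
    // Encodes each key and value and joins the pairs with "&".
    return http_build_query($fields, '', '&');
}

$body = encodePostBody(['username' => 'my user', 'password' => 'p&ss=word']);
echo $body, "\n"; // username=my+user&password=p%26ss%3Dword

// With a cURL handle $ch, either of these would send the fields:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);   // string: sent as application/x-www-form-urlencoded
// curl_setopt($ch, CURLOPT_POSTFIELDS, $fields); // array: cURL builds multipart/form-data itself
```

Without the encoding, the "&" and "=" inside the password would be read as field separators and the server would see the wrong value. Most login forms expect the urlencoded string form rather than multipart.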