Screen scraping - is this possible? Please help!


arkdm
Forum Newbie
Posts: 8
Joined: Mon Aug 07, 2006 10:15 am

Post by arkdm »

I've been trying (unsuccessfully) to scrape a site called Managerzone.com to make a help tool for the site. Many other people have done this but simply don't care to help me out :(. I'm not a very experienced PHP programmer, so here goes...

The problem with scraping the site is that whenever you attempt to scrape any of the internal pages (i.e., the pages you can access after you've logged in), it redirects you to the main index page. Does this make sense? What I'd like to do is somehow log in to the website via PHP using my login information and then screen scrape the pages inside. Is this even possible or am I blowing smoke?

Thanks for your help!
blackbeard
Forum Contributor
Posts: 123
Joined: Thu Aug 03, 2006 6:20 pm

Post by blackbeard »

I think you'll need to look at using the cURL functions. I've not used them, so I can't help you out other than pointing you in that direction.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

I hope you have permission!

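Regardless, using cURL you can send multiple requests (curl_exec()) in a single page load: for instance, one to log in and one to fetch the content. This works as long as you use the same curl handle for both and turn on cookie handling (CURLOPT_COOKIEFILE / CURLOPT_COOKIEJAR), so the session cookie from the login is sent with the second request.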
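
A minimal sketch of that two-request flow, in case it helps. The function name fetch_after_login() and its parameters are made up for illustration; the real login URL and form field names have to be read from the site's own login form, and nothing here is specific to Managerzone.

```php
<?php
// Hypothetical helper: log in, then fetch a members-only page on the same handle.
// $loginUrl, $loginFields and $pageUrl are assumptions supplied by the caller.
function fetch_after_login(string $loginUrl, array $loginFields, string $pageUrl): string
{
    $ch = curl_init();

    // Settings shared by both requests.
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow the redirect a login form usually issues
        CURLOPT_COOKIEFILE     => '',   // empty string enables in-memory cookie handling
    ]);

    // Request 1: submit the login form as an urlencoded POST body.
    curl_setopt_array($ch, [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($loginFields, '', '&'),
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Request 2: same handle, so the cookies set during login are sent along.
    curl_setopt_array($ch, [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true, // switch the handle back from POST to GET
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    return $html;
}
```

You would call it with something like fetch_after_login($loginUrl, ['username' => '...', 'password' => '...'], $pageUrl), where the array keys match the name attributes of the inputs in the login form. If the second request still lands on the index page, the login probably failed or the site expects extra fields such as a hidden token.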
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

If the content you are scraping is behind a login screen, doesn't it seem kinda shady to make that content available to users that are not logged in?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Everah wrote:If the content you are scraping is behind a login screen, doesn't it seem kinda shady to make that content available to users that are not logged in?
I didn't want to say it.. Been holding it in for hours... But yeah, if they put it behind a login, they probably don't want it to be publicly accessed. Then again, for all I know, this is your account for an online game or something.
arkdm
Forum Newbie
Posts: 8
Joined: Mon Aug 07, 2006 10:15 am

Post by arkdm »

This is my account for the game. I'm simply using my login info to get in and retrieving the info (for my own use only) from there. It's also a free site.

Thanks for pointing me at cURL. Hopefully I'll get it to work. A quick question, when I POST variables do they need to be urlencoded?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

arkdm wrote:This is my account for the game. I'm simply using my login info to get in and retrieving the info (for my own use only) from there. It's also a free site.
I wrote:Then again, for all I know, this is your account for an online game or something.
I must be psychic :-p
arkdm wrote:Thanks for pointing me at cURL. Hopefully I'll get it to work. A quick question, when I POST variables do they need to be urlencoded?
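Not if you let cURL do it: pass the fields to CURLOPT_POSTFIELDS as an array and cURL takes care of the encoding. If you build the POST body yourself as a string (name=value&name2=value2), the values do need to be urlencoded, the same as $_GET variables in a URL.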
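
With cURL specifically, it comes down to what you give CURLOPT_POSTFIELDS: a string has to be urlencoded already, while an array is sent as multipart/form-data and needs no encoding from you. A minimal sketch of the string case, assuming a hypothetical helper build_post_body() and made-up field names and values:

```php
<?php
// Hypothetical helper: turn form fields into an urlencoded POST body string.
function build_post_body(array $fields): string
{
    // http_build_query() urlencodes each name and value and joins the pairs with '&'.
    return http_build_query($fields, '', '&');
}

// Made-up field names and values, for illustration only.
$body = build_post_body(['username' => 'my name', 'password' => 'p&ss=word']);
echo $body, "\n"; // username=my+name&password=p%26ss%3Dword
```

The resulting string can go straight into curl_setopt($ch, CURLOPT_POSTFIELDS, $body). Without the encoding, the & and = inside the password would be read as field separators.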