
How to pull information off of a webpage [solved] thanks

Posted: Wed Jan 11, 2006 6:33 pm
by aybra
I'm currently working on a project (well, starting one) that will index all of the players of a certain MMORPG. Essentially it will give stats such as how many characters are active/inactive, how many are adventurers, how many are artisans (and a bunch of other things that non-gamers don't have much interest in).

The only way I've found to actually pull this information is from their website, which allows you to do a search by server and so forth.

Is it possible to write a PHP script that will run a specified search (or two or three), collect the information, and upload it into a MySQL database so that I can parse the information?

I've spent about two hours trawling these forums, as I'm sure I've seen this done before (about six months ago), but cannot find it to save my life.
If someone could even tell me what function is involved, or what this technique is commonly referred to as, so I have somewhere to start searching? Any help on this would be greatly appreciated.
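For the "upload it into a MySQL database" part of the question, here is a rough sketch of the store step using PDO prepared statements. The `players` table and its columns are made up for illustration, and an in-memory SQLite database stands in for MySQL so the sketch is self-contained; for real MySQL you would use a DSN along the lines of `mysql:host=localhost;dbname=...`.

```php
<?php
// Sketch of the store step: insert parsed player rows into a database.
// The `players` table and its columns are hypothetical. An in-memory
// SQLite database stands in for MySQL here so the example runs anywhere.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE players (
    name   TEXT,
    server TEXT,
    class  TEXT,
    active INTEGER
)');

// Rows as they might come back from parsing a search-results page.
$rows = [
    ['Aybra', 'SomeServer', 'artisan',    1],
    ['Feyd',  'SomeServer', 'adventurer', 0],
];

// A prepared statement handles quoting/escaping for you.
$stmt = $db->prepare(
    'INSERT INTO players (name, server, class, active) VALUES (?, ?, ?, ?)'
);
foreach ($rows as $row) {
    $stmt->execute($row);
}

$active = $db->query('SELECT COUNT(*) FROM players WHERE active = 1')
             ->fetchColumn();
echo "active players: $active\n"; // prints "active players: 1"
```

Once the rows are in the database, the stats the project needs (active vs. inactive, counts per class) are plain `SELECT ... COUNT(*)` queries.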

Posted: Wed Jan 11, 2006 6:43 pm
by John Cartwright
You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website yourself.

It is definitely possible, but it is usually a bit of work depending on the site, especially if they have borked HTML/CSS.
To grab an entire page, it is as easy as

Code:

$pageContents = file_get_contents('http://somedomain.com/somePage/');
Other options include cURL, fopen(), and fsockopen().

Then you're going to need to devise some regex to capture the data you want. Unfortunately, my regex abilities suck, so that's as far as I can help. :)
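To make the regex step concrete, here is a small example. The HTML below is invented for illustration; the real site's markup will differ, so the pattern has to be adapted to whatever the search-results page actually looks like.

```php
<?php
// Example of the regex step: pull player name/class pairs out of HTML.
// This markup is invented -- adapt the pattern to the real page.
$pageContents = <<<HTML
<table>
  <tr class="player"><td>Aybra</td><td>artisan</td></tr>
  <tr class="player"><td>Feyd</td><td>adventurer</td></tr>
</table>
HTML;

// Capture each player row: name in group 1, class in group 2.
preg_match_all(
    '#<tr class="player"><td>([^<]+)</td><td>([^<]+)</td></tr>#',
    $pageContents,
    $matches
);

// $players is ['Aybra' => 'artisan', 'Feyd' => 'adventurer']
$players = array_combine($matches[1], $matches[2]);
```

Keeping the capture groups narrow (`[^<]+` rather than `.*`) avoids the greedy-match surprises that bite most first attempts at HTML scraping.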

Posted: Wed Jan 11, 2006 6:46 pm
by feyd
I couldn't find the old posts I was thinking of, so I'll just rehash a bit:

file_get_contents(), cURL, and Snoopy, among other things, can be used in conjunction with regex to pull out the information you seek. Further refinement of the data can be done with tools like Tidy.
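For the cURL option mentioned above, a fetch helper might look like the sketch below. cURL gives you more control than file_get_contents() (timeouts, a user agent string, following redirects); `fetchPage` is just a name made up for this sketch, and the options shown are a reasonable baseline rather than a complete list.

```php
<?php
// A cURL version of the fetch step. The function name is made up
// for this sketch; the curl_setopt() options are a sensible baseline.
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10 seconds
    curl_setopt($ch, CURLOPT_USERAGENT, 'PlayerIndexBot/0.1'); // identify yourself
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? null : $body;
}
```

Usage would be along the lines of `$pageContents = fetchPage('http://somedomain.com/somePage/');`, after which the regex/Tidy processing proceeds the same way as with file_get_contents().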

Posted: Wed Jan 11, 2006 6:49 pm
by aybra
Jcart wrote:You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website yourself.

It is definitely possible, but it is usually a bit of work depending on the site, especially if they have borked HTML/CSS.
To grab an entire page, it is as easy as

Code:

$pageContents = file_get_contents('http://somedomain.com/somePage/');
Other options include cURL, fopen(), and fsockopen().

Then you're going to need to devise some regex to capture the data you want. Unfortunately, my regex abilities suck, so that's as far as I can help. :)
So even though I'm pulling open data that is accessible by using the site, it might still be considered illegal? I can't say that I want to get into a legal debate, and I might scrub the mission if that is the case...

Thank you both for your replies... I suppose I'll look into the legalities a bit more before I actually head out to do the deed.

Posted: Wed Jan 11, 2006 9:56 pm
by feyd
The best route: ask them for permission. If they say yes, then hoorah. If no, then you have to figure out a different route. Read through their terms of use / terms of service / privacy policy / legal junk to see what they currently say as blanket policy. If they say nothing, then you should definitely ask.