How to pull information off of a webpage [solved] thanks

aybra
Forum Commoner
Posts: 56
Joined: Sun Nov 24, 2002 12:52 am

Post by aybra »

I'm starting a project that will index all of the players of a certain MMORPG. Essentially it will give stats such as how many characters are active or inactive, how many are adventurers, how many are artisans (and a bunch of other things that non-gamers have no interest in).

The only way I've found to actually pull this information is from their website, which lets you search by server and so forth.

Is it possible to write a PHP script that will run a specified search (or two or three), collect the results, and upload them into a MySQL database so that I can parse the information?

I've spent about 2 hours trawling these forums, as I'm sure I've seen this done before (about 6 months ago), but cannot find it to save my life.
Could someone at least tell me which function to use, or what this is commonly referred to as, so I have something to search on? Any help on this would be greatly appreciated.
Last edited by aybra on Wed Jan 11, 2006 6:58 pm, edited 1 time in total.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Post by John Cartwright »

You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website.

It is definitely possible, but it is usually a bit of work depending on the site, especially if they have borked HTML/CSS.
To grab the full HTML of a page, it is as easy as:

Code: Select all

$pageContents = file_get_contents('http://somedomain.com/somePage/');
Other options include cURL, fopen(), and fsockopen().

Then you're going to need to devise some regex to capture the data you want. Unfortunately, my regex abilities suck, so that's as far as I can help. :)
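
For anyone wondering what the regex step might look like, here is a minimal sketch. It assumes the search results come back as a table with one row per character; the markup, the `extractCharacters()` helper, and the class names are all hypothetical, so the pattern would have to be adapted to whatever HTML the real site produces.

```php
<?php
// Sketch only: the markup below is a hypothetical search-results table,
// not the real site's HTML.

/**
 * Pull name/class/status triples out of rows shaped like
 * <tr><td class="name">..</td><td class="class">..</td><td class="status">..</td></tr>
 */
function extractCharacters(string $html): array
{
    $pattern = '~<tr>\s*'
        . '<td class="name">(.*?)</td>\s*'
        . '<td class="class">(.*?)</td>\s*'
        . '<td class="status">(.*?)</td>\s*'
        . '</tr>~is';

    // PREG_SET_ORDER gives one array per matched row.
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [
            'name'   => trim($m[1]),
            'class'  => trim($m[2]),
            'status' => trim($m[3]),
        ];
    }
    return $rows;
}

// In real use the HTML would come from the fetch shown in the post, e.g.
// $html = file_get_contents('http://somedomain.com/somePage/');
$html = <<<HTML
<table>
  <tr><td class="name">Aria</td><td class="class">Artisan</td><td class="status">active</td></tr>
  <tr><td class="name">Borin</td><td class="class">Adventurer</td><td class="status">inactive</td></tr>
</table>
HTML;

foreach (extractCharacters($html) as $row) {
    echo "{$row['name']}: {$row['class']}, {$row['status']}\n";
}
```

Each returned row is a plain array, so it could then be written to MySQL with a prepared statement. A pattern like this tends to break as soon as the site changes its markup, so treat it as a starting point.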
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I couldn't find the old posts I was thinking of, so I'll just rehash a bit:

file_get_contents(), cURL, and Snoopy, among other things, can be used in conjunction with regex to pull out the information you seek. Further refinement of the data can be done with tools like Tidy.
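
A rough sketch of the cURL route, plus a small cleanup step for captured values. The `fetchPage()` and `cleanCell()` names, the timeout, and the user-agent string are assumptions made for illustration. Tidy itself is not used here: plain string functions stand in for the refinement step, which is enough for simple cells but will not repair broken markup the way Tidy can.

```php
<?php
// Fetch a page body with cURL; returns null if the request fails.
// Requires the cURL extension. Options here are illustrative defaults.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => 10,    // seconds
        CURLOPT_USERAGENT      => 'stats-indexer (hypothetical)',
    ]);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

// Refine one captured value: drop tags, decode entities, collapse whitespace.
function cleanCell(string $raw): string
{
    $text = strip_tags($raw);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/\s+/', ' ', $text) ?? '';

    return trim($text);
}

// Example of the cleanup step on a messy captured cell.
echo cleanCell("  <b>Tom &amp;\n Jerry</b> "), "\n";

// Usage with a real page (placeholder URL from the earlier post):
// $html = fetchPage('http://somedomain.com/somePage/');
```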

Post by aybra »

Jcart wrote: You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website.
So even though I'm pulling open data that anyone can access by using the site, it might still be considered illegal? I can't say that I want to get into a legal debate, and I might scrub the mission if that is the case...

Thank you both for your replies. I suppose I'll look into the legalities a bit more before I actually head out to do the deed.

Post by feyd »

Best route: ask them for permission. If they say yes, then hoorah. If no, then you have to figure out a different route. Read through their terms of use/terms of service/privacy policy/legal junk to see what they say as a blanket policy right now. If they say nothing, then you should definitely ask.