I'm currently working on a project (well, starting one) that will index all of the players of a certain MMORPG. Essentially it will give stats on how many characters are active or inactive, how many are adventurers, how many are artisans (and a bunch of other things that non-gamers have no interest in).
The only way I've found to actually pull this information is from their website, which allows you to search by server and so forth.
Is it possible to write a PHP script that will run a specified search (or two or three), collect the information, and upload it into a MySQL database so that I can parse it?
I've spent about 2 hours trawling these forums, as I'm sure I've seen this done before (about 6 months ago), but I cannot find it to save my life.
Could someone at least tell me which function to use, or what this is commonly referred to as, so I have something to search on? Any help on this would be greatly appreciated.
How to pull information off of a webpage [solved] thanks
Last edited by aybra on Wed Jan 11, 2006 6:58 pm, edited 1 time in total.
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website.
It is definitely possible, but it is usually a bit of work depending on the site, especially if they have borked HTML/CSS.
To grab an entire page, it is as easy as
Code: Select all
$pageContents = file_get_contents('http://somedomain.com/somePage/');
Other options include using cURL, fopen(), or fsockopen().
Then you're going to need to devise some regex to capture the data you want. Unfortunately, my regex abilities suck, so that's as far as I can help.
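To give a rough idea of the regex step, here is a minimal sketch. It assumes the search results come back as a plain HTML table with one row per character (name, class, status). That layout, the sample names, and the extractCharacters() helper are all made up for illustration, so the pattern would need adjusting to whatever the real page looks like.

```php
<?php
// Hypothetical helper: pulls (name, class, status) out of rows shaped like
// <tr><td>Name</td><td>Class</td><td>Status</td></tr>.
// The three-column layout is an assumption, not the real site's markup.
function extractCharacters(string $html): array
{
    // "s" lets . match newlines, "i" ignores tag case, .*? keeps each cell match short.
    $pattern = '~<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $characters = [];
    foreach ($matches as $m) {
        $characters[] = [
            'name'   => trim(strip_tags($m[1])),
            'class'  => trim(strip_tags($m[2])),
            'status' => trim(strip_tags($m[3])),
        ];
    }
    return $characters;
}

// In real use this string would come from
// file_get_contents('http://somedomain.com/somePage/').
// A made-up sample is used here so the sketch runs without network access.
$pageContents = '<table>
<tr><td>Thorin</td><td>Adventurer</td><td>Active</td></tr>
<tr><td>Mira</td><td>Artisan</td><td>Inactive</td></tr>
</table>';

$characters = extractCharacters($pageContents);
print_r($characters);
```

Each entry in $characters is then a small array that could be inserted into a MySQL table row by row. Regex like this tends to break when the site changes its markup, so expect to revisit the pattern.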
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
I couldn't find the old posts I was thinking of, so I'll just rehash a bit:
file_get_contents(), curl, Snoopy, among other stuff can be used in conjunction with regex to pull out the information you seek. Further refinement of the data can be done with things like Tidy.
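For the cURL route, something along these lines might be a starting point. It is only a sketch: fetchPage() is a hypothetical helper name, and the timeout and user-agent string are arbitrary values picked for illustration. It falls back to file_get_contents() if the cURL extension isn't loaded.

```php
<?php
// Hypothetical helper (not part of PHP or cURL): returns the page body as a
// string, or null if the request failed.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    if ($url === '') {
        return null; // nothing to fetch
    }

    if (!function_exists('curl_init')) {
        // cURL extension not available: fall back to file_get_contents().
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up after this many seconds
        CURLOPT_USERAGENT      => 'StatsIndexer/0.1 (hypothetical)',
    ]);

    $body = curl_exec($ch);

    return $body === false ? null : $body;
}
```

Calling fetchPage('http://somedomain.com/somePage/') would then give you the raw HTML to run your regex against, or null if the request failed.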
Jcart wrote:You aren't technically allowed to take content from another site unless you have explicit permission from the site owner or own the website.
So even though I'm pulling open data that is accessible by using the site, it might still be considered illegal? I can't say that I want to get into a legal debate, and I might scrub the mission if that is the case...
Thank you both for your replies. I suppose I'll look into the legalities a bit more before I actually head out to do the deed.