Parse content from other pages
Posted: Sun Feb 08, 2009 9:40 am
by cybernike
I am totally new to PHP. I need a server-side programming language because I need the ability to write files.
Could someone tell me how to parse pages that are not on the server, but rather external?
My situation is that I need to parse an external webpage in order to extract some information and write that information out to a file. I could parse the information with JavaScript, but I can't write it to a file without ActiveX.
Re: Parse content from other pages
Posted: Sun Feb 08, 2009 9:52 am
by John Cartwright
file_get_contents() or cURL + preg_match_all()
Probably a bit too difficult for someone who doesn't know PHP. However, post some sample HTML along with the site you want to scrape (assuming you have their permission to do so), and then we can help you along further.
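A minimal sketch of the file_get_contents() + preg_match_all() route mentioned above. No sample HTML was posted, so the markup (`<span class="stat">HP: 120</span>`), the function names, and the output format are all assumptions for illustration. Regular expressions are also fragile against real-world HTML, so treat this as a starting point only.

```php
<?php
// Hypothetical helper: pull "label: number" pairs out of markup such as
// <span class="stat">HP: 120</span>. The tag and class name are assumptions.
function extract_stats($html)
{
    $stats = [];
    $pattern = '/<span class="stat">\s*([^:<]+):\s*(\d+)\s*<\/span>/i';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $stats[trim($m[1])] = (int) $m[2];
        }
    }
    return $stats;
}

// Hypothetical helper: fetch a remote page, extract the stats, and write
// them to a local file as "label=value" lines. Returns false on failure.
function scrape_to_file($url, $outFile)
{
    // file_get_contents() on a URL needs allow_url_fopen enabled in php.ini.
    $html = @file_get_contents($url);
    if ($html === false) {
        return false;
    }
    $lines = [];
    foreach (extract_stats($html) as $label => $value) {
        $lines[] = $label . '=' . $value;
    }
    return file_put_contents($outFile, implode("\n", $lines) . "\n") !== false;
}
```

Calling `scrape_to_file('http://example.com/page.html', 'stats.txt')` (a placeholder URL) would fetch the page and write one line per matched stat. This only works for pages that are reachable without logging in.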
Re: Parse content from other pages
Posted: Sun Feb 08, 2009 10:02 am
by cybernike
The page is actually from a Facebook game (so I can't let you see it, because it requires FB login information). I am not sure if you are familiar with Greasemonkey, which can insert JavaScript into pages in your own browser (client-side). I wonder if I could do the same thing with PHP, that is, display an external page in my own browser (client-side) with my desired JavaScript inserted. That way, with PHP, I could write the information I need from the page to a file (whereas Greasemonkey cannot do that).
Re: Parse content from other pages
Posted: Sun Feb 08, 2009 11:27 am
by John Cartwright
Indeed, it would be possible using cURL and some regular expressions.
P.S. I have Facebook too, so it still doesn't hurt to post a link.
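A rough sketch of the cURL idea: fetch the remote page server-side, then add a script tag before sending it on to the browser. The function names, the cookie file, and the script URL are hypothetical. This does not handle the Facebook login itself, and it uses a plain string search rather than a regular expression to find the closing body tag.

```php
<?php
// Hypothetical helper: fetch a page with cURL, keeping cookies in a local
// file so a session can persist between requests. Returns false on failure.
function fetch_page($url, $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here
        CURLOPT_TIMEOUT        => 15,
    ]);
    return curl_exec($ch);
}

// Hypothetical helper: insert a <script> tag just before the first </body>,
// or append it if the page has no closing body tag.
function inject_script($html, $scriptUrl)
{
    $tag = '<script src="' . htmlspecialchars($scriptUrl, ENT_QUOTES) . '"></script>';
    $pos = stripos($html, '</body>');
    if ($pos === false) {
        return $html . $tag;
    }
    // Length 0 means nothing is replaced; the tag is inserted at $pos.
    return substr_replace($html, $tag, $pos, 0);
}
```

A page script could then do something like `echo inject_script(fetch_page($url, 'cookies.txt'), 'my.js');` after checking that the fetch did not return false. Relative links, forms, and scripts in the fetched page will still point at the original site, so a proxied page may not behave the way it does when loaded directly.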
Re: Parse content from other pages
Posted: Sun Feb 08, 2009 2:25 pm
by cybernike
The name is Dragon Wars:
http://apps.facebook.com/dragonwars/
I would like to have two of my own characters in different FB accounts communicate with each other (for example, to exchange information about their HP, the number of attacks available, etc.). I was planning to output the information to a simple HTML file on my webserver so that it can be parsed by JavaScript with Greasemonkey. Do you have a better suggestion as to how to do this?
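One possible shape for the shared file, sketched under assumptions: instead of an HTML file, each character's status is stored in a single JSON file on the webserver, which a script can parse without any HTML handling. The function names, the file layout, and the status fields (`hp`, `attacks`) are hypothetical.

```php
<?php
// Hypothetical helper: store one character's status in a shared JSON file,
// keeping whatever the other characters have already written.
function save_status($file, $character, array $status)
{
    $all = [];
    if (is_file($file)) {
        $decoded = json_decode((string) file_get_contents($file), true);
        if (is_array($decoded)) {
            $all = $decoded;
        }
    }
    $all[$character] = $status;
    // LOCK_EX takes an exclusive lock while writing. The read above is not
    // locked, so two simultaneous updates could still overwrite each other.
    file_put_contents($file, json_encode($all), LOCK_EX);
    return $all;
}

// Hypothetical helper: read one character's status, or null if unknown.
function load_status($file, $character)
{
    if (!is_file($file)) {
        return null;
    }
    $all = json_decode((string) file_get_contents($file), true);
    return (is_array($all) && isset($all[$character])) ? $all[$character] : null;
}
```

Each account would call `save_status()` with its own character name and read the other one back with `load_status()`.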