Screen Scraping (getting content from another site)

discostu
Forum Newbie
Posts: 4
Joined: Tue Aug 05, 2003 10:50 am

Post by discostu »

I've been looking around for how to do this in PHP. There is a site with a CGI script that prints out an HTML page containing data from a database, based on a query string. I want to be able to call something like get_content_from($url) and get back a string (or similar) that I can parse to find the specific data "scraped" from that site. I know this can be done in ASP.NET, but I'm going to run this on an Apache Linux server, and I'd much rather code in PHP anyway :D .

Thanks.
discostu
Forum Newbie
Posts: 4
Joined: Tue Aug 05, 2003 10:50 am

Post by discostu »

It seems that all the file functions work for URLs as well. Because this is a dynamically generated page it has no filesize, but I can do

Code:

$contents = file_get_contents($filename);
and it works fine.
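
For reference, here is a minimal sketch of the get_content_from($url) idea from the first post, built on file_get_contents(). The function name comes from that post and is not a built-in, and the sketch assumes allow_url_fopen is enabled in php.ini, which file_get_contents() needs before it will open http:// URLs. The null return on failure is just one possible convention.

```php
<?php
// Hypothetical helper, named after the get_content_from($url) idea above.
// Assumption: allow_url_fopen is enabled, so URL wrappers such as http:// work.
function get_content_from(string $url): ?string
{
    // file_get_contents() returns false and emits a warning on failure.
    // The @ operator suppresses the warning; the false check handles the error.
    $contents = @file_get_contents($url);

    // Return null on failure so callers can tell "no data" from an empty page.
    return $contents === false ? null : $contents;
}

// Example call (the URL is a placeholder, not the real site):
// $html = get_content_from('http://example.com/cgi-bin/report.cgi?id=42');
```

Since the whole response is read into a string, the missing filesize of a dynamically generated page is not a problem here.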

Now I have to figure out how to get rid of the parts of the HTML I don't want. I just want everything between the <pre></pre> tags. Is there a function like

Code:

get_part_of_string($contents,"<pre>*</pre>")
Thanks.
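
One way this could be done is with preg_match() and a capturing group. The sketch below is only an illustration: get_pre_contents() is a made-up name, and it assumes the page contains a simple, non-nested <pre> block and that only the first one is wanted.

```php
<?php
// Hypothetical helper: returns the text inside the first <pre>...</pre> block,
// or null if the page has none.
function get_pre_contents(string $html): ?string
{
    // i  : case-insensitive, so <PRE> matches too
    // s  : lets . match newlines, since <pre> content usually spans lines
    // [^>]* allows attributes on the opening tag
    // .*? is non-greedy, so the match stops at the first closing </pre>
    if (preg_match('#<pre\b[^>]*>(.*?)</pre>#is', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Example call, assuming $contents holds the fetched page:
// $data = get_pre_contents($contents);
```

A regular expression is probably fine for a single well-formed <pre> block like this; for messier HTML, a real parser such as DOMDocument may be a safer choice.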