Importing a web site's content into your web site

Posted: Tue Sep 11, 2007 4:16 pm
by PHPeter
Hi, I am building a web site and I just ran into the following problem: in this site I would like to have a "news" section in which I take the headlines from another web site (each headline is a link to the complete story) and post them into mine, preserving the link to the original web site for the complete story.

I have thought for a good while about how this could be implemented in any language (PHP, JavaScript, etc.), but three main problems keep showing up: how to format the site's content so that it fits in a smaller window without ugly horizontal scrollbars, how to "call" the remote web site so that its contents appear in my news section, and how to "clean" the web site's code so that only the links to the news appear, and not the images, text that does not belong to the headlines, or links to the site's sections. For the last problem, I was thinking of a function that reads the whole code and filters out everything that does not begin with "<a href" and end with "</a>", but this would still let other links show up. For the second, I have not seen a function that does this. For the first, I have no idea. Oh, and it is important that each time the remote site's headlines are updated, the update is reflected on my site too.

Are there any functions/solutions in php for these problems?

Thanks in advance.

Posted: Tue Sep 11, 2007 4:18 pm
by feyd
I hope you are getting these source sites' permission to use their content in this manner.

Comment

Posted: Tue Sep 11, 2007 5:00 pm
by PHPeter
I am. It is like this: my web site will serve the businesses whose web sites I want to import the news from. Those businesses are in fact supposed to use part of the content of my web site for their own purposes. The news will only be shown to users who identify themselves as belonging to the corresponding business. So if user A identifies himself as belonging to business W, A will see W's headlines on my site, but if he wants to read a whole story, he has to go to W's site anyway (through the link in the headline, which I provide). These headlines are not shown to everyone, only to those who can identify themselves.

Posted: Tue Sep 11, 2007 5:08 pm
by feyd
file_get_contents(), cURL, fsockopen(), or even Snoopy are often used for gathering content from remote sources. preg_match() is used to extract specific content. strip_tags(), or a library similar to it, will let you strip out whichever tags you do not want.
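
As a rough illustration of how those pieces could fit together, here is a minimal sketch. It assumes the remote site marks its headline links with class="headline", which is purely hypothetical, so the pattern would have to be adapted to the real markup. The function names extract_headlines(), fetch_headlines() and render_headlines() are made up for this example, and preg_match_all() is used in place of preg_match() so that every headline is collected, not just the first.

```php
<?php
// Sketch only: the function names and the "headline" class are assumptions,
// not something the remote site is known to provide.

// Collect url/title pairs from anchors whose class attribute is exactly $class.
function extract_headlines(string $html, string $class = 'headline'): array
{
    $pattern = '#<a\s[^>]*class="' . preg_quote($class, '#') . '"[^>]*>.*?</a>#is';
    preg_match_all($pattern, $html, $anchors);

    $headlines = [];
    foreach ($anchors[0] as $anchor) {
        // Keep the original link so the full story is read on the source site.
        if (preg_match('#href="([^"]*)"#i', $anchor, $m)) {
            $headlines[] = [
                'url'   => html_entity_decode($m[1]),
                // strip_tags() drops images and inline markup inside the link.
                'title' => trim(html_entity_decode(strip_tags($anchor))),
            ];
        }
    }
    return $headlines;
}

// Fetch the remote page on every request, so updates there show up here too.
function fetch_headlines(string $url): array
{
    $html = @file_get_contents($url); // needs allow_url_fopen enabled
    return $html === false ? [] : extract_headlines($html);
}

// Render a plain list, which wraps inside a narrow column without scrollbars.
function render_headlines(array $headlines): string
{
    $items = '';
    foreach ($headlines as $h) {
        $items .= '<li><a href="' . htmlspecialchars($h['url']) . '">'
                . htmlspecialchars($h['title']) . "</a></li>\n";
    }
    return "<ul>\n" . $items . "</ul>\n";
}
```

In the news section this could be called as echo render_headlines(fetch_headlines($url)); with $url pointing at the business's news page. The sketch does not resolve relative links against the remote site's address and does no caching, so both would need to be added for real use.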