Importing a web site's content into your web site

PHPeter
Forum Newbie
Posts: 7
Joined: Tue Dec 12, 2006 6:21 pm

Importing a web site's content into your web site

Post by PHPeter »

Hi, I am building a web site and I just ran into the following problem: I would like the site to have a "news" section that takes the headlines from another web site (each headline is a link to the complete story) and posts them on mine, preserving the link to the original web site for the complete story.

I have thought for a good while about how this could be implemented in any language (PHP, JavaScript, etc.), but three main problems keep showing up: (1) how to format the remote site's content so that it fits in a smaller window without ugly side scrollbars, (2) how to "call" the remote web site so that its contents appear in my news section, and (3) how to "clean" the web site's code so that only the links to the news appear, and not the images, the text that does not belong to the headlines, or the links to the site's sections. For (3), I was thinking of a function that reads the whole code and filters out everything that does not begin with "<a href" and end with "</a>", but that would still let other links through. For (2), I have not seen a function that does this. For (1), I have no idea. Oh, and it is important that each time the remote site's headlines are updated, the update is reflected on my site too.

Are there any functions or solutions in PHP for these problems?

Thanks in advance.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I hope you are getting these source sites' permission to use their content in this manner.

Comment

Post by PHPeter »

I am. It works like this: my web site will serve the businesses whose web sites I want to import the news from. Those businesses are in fact supposed to use part of the content of my web site for their own purposes. The headlines will only be shown to users who identify themselves as belonging to the corresponding business. So if user A identifies himself as belonging to business W, A will see W's headlines on my site, but if he wants to read a whole story, he has to go to W's site anyway (through the link in the headline, which I provide). The headlines are not shown to everyone, only to those who can identify themselves.

Post by feyd »

file_get_contents(), cURL, fsockopen(), and even Snoopy are often used for gathering content from remote sources. preg_match() can be used to extract specific content. strip_tags(), or libraries similar to it, will let you strip out whichever tags you need removed.
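
To make that a bit more concrete, here is a rough sketch of how those pieces might fit together. It is only an illustration under some assumptions: extract_headlines() and render_headlines() are made-up helper names, the '/news/' URL filter and the example.com address are placeholders, and it uses preg_match_all() (the match-everything sibling of preg_match()) because there are several links to collect. A regular expression is also fragile against messy HTML, so expect to adapt the pattern to the remote site's actual markup.

```php
<?php
// Hypothetical helper: collect (url, title) pairs from the <a> tags in $html.
// $urlMustContain is an assumed, optional filter (e.g. '/news/') that keeps
// navigation and section links out of the result.
function extract_headlines(string $html, string $urlMustContain = ''): array
{
    $headlines = [];
    // <a ... href="...">...</a>; 'i' ignores case, 's' lets '.' span newlines.
    $pattern = '~<a\s[^>]*href=(["\'])(.*?)\1[^>]*>(.*?)</a>~is';
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $url = html_entity_decode($m[2], ENT_QUOTES);
            // strip_tags() drops nested markup such as <img> or <b>.
            $title = trim(html_entity_decode(strip_tags($m[3]), ENT_QUOTES));
            if ($title === '') {
                continue; // image-only links have no headline text
            }
            if ($urlMustContain !== '' && strpos($url, $urlMustContain) === false) {
                continue; // not a news link according to the assumed filter
            }
            $headlines[] = ['url' => $url, 'title' => $title];
        }
    }
    return $headlines;
}

// Hypothetical helper: print the headlines as a plain list, so the result can
// be styled to fit a narrow column instead of embedding the remote page.
function render_headlines(array $headlines): string
{
    $out = "<ul>\n";
    foreach ($headlines as $h) {
        $out .= sprintf(
            "  <li><a href=\"%s\">%s</a></li>\n",
            htmlspecialchars($h['url'], ENT_QUOTES),
            htmlspecialchars($h['title'], ENT_QUOTES)
        );
    }
    return $out . "</ul>\n";
}

// Usage (placeholder URL; file_get_contents() on a URL needs allow_url_fopen):
// $html = @file_get_contents('http://www.example.com/');
// if ($html !== false) {
//     echo render_headlines(extract_headlines($html, '/news/'));
// }
```

Because the page is fetched whenever this code runs, the list follows the remote site's updates. Relative links such as /news/1.html would still need the remote host prepended before they work from your own site.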