News harvesting

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
Skittlewidth
Forum Contributor
Posts: 389
Joined: Wed Nov 06, 2002 9:18 am
Location: Kent, UK

News harvesting

Post by Skittlewidth »

I've just subscribed to the Guardian Headline Newsfeed service http://www.guardian.co.uk/headlineservi ... 35,00.html but am now having problems actually setting it up on the webpage it needs to go on. They recommend you use VBScript or Perl to harvest the news from the personalised URL they send you, but I have no experience with either of those. How would I go about doing it in PHP, preserving the URLs so that users can still jump to the full article?

Anyone else used this service? :?:
User avatar
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Hmm...I am no expert on mail-servers etc. :(

However, if it's news you're after, have a look at

http://www.moreover.com

They used to have a free section; I don't know if it still exists. I think they even offered scripts to pull the links from their website (but it wasn't emailed, hence it's different from your problem).

The mother of all news-indexes is:

http://searchenginewatch.com/links/news.html
User avatar
Skittlewidth
Forum Contributor
Posts: 389
Joined: Wed Nov 06, 2002 9:18 am
Location: Kent, UK

news harvesting

Post by Skittlewidth »

Ah, yes, I wasn't clear about that, was I? The site doesn't e-mail you every time. Guardian Online just sent me a confirmation email giving me a personal URL from which to harvest my selected topics. This was just a plain page with the headlines and the first paragraph of each article displayed.

In any case, I've done a simple script now with file().

Code: Select all

<?php

// file() reads the whole page into an array of lines -- it takes no mode argument
$newsarray = file("http://www.guardian.co.uk/syndication/service/0,11065,331-0-5,00.html?U1271588");

foreach ($newsarray as $headline)
{
    echo $headline;
}

?>
Sorry for making it out to be more complicated than it was! By the way, it seems so much simpler to do this in three lines of PHP than the page of Perl offered in their sample script (though they were trying to do some extra stuff).
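(If you also wanted to pull just the story links out of that page, rather than echoing the whole thing, a regular expression could do it. A rough sketch -- the anchor-tag markup below is made up, so check it against the real page source before relying on it:)

Code: Select all

```php
<?php

// Hypothetical sample of what the feed's markup might look like
$html = '<a href="http://www.guardian.co.uk/story1">First headline</a>'
      . '<a href="http://www.guardian.co.uk/story2">Second headline</a>';

// Capture each href and its link text
preg_match_all('/<a href="([^"]+)">([^<]+)<\/a>/', $html, $matches);

for ($i = 0; $i < count($matches[1]); $i++)
{
    echo $matches[2][$i] . " -> " . $matches[1][$i] . "\n";
}
```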
User avatar
Skittlewidth
Forum Contributor
Posts: 389
Joined: Wed Nov 06, 2002 9:18 am
Location: Kent, UK

Advanced news harvesting

Post by Skittlewidth »

OK, so that was the simplest way of doing things. However, now I would like to grab the contents of that page and write it to a file on my server, so that the Guardian page doesn't get called every time my page is loaded or refreshed.

I've tried using copy() to just grab the file and write it to my server, but it fails to find the file (because it's a URL?).
I'm guessing I might need to do something with fopen() and fwrite(), but I'm not sure. :?

Anyone got any pointers?
User avatar
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Spot on :)

fopen() is your friend!

http://www.zend.com/manual/function.fopen.php
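Something like this, perhaps -- just a sketch of the fopen()/fread()/fwrite() approach (the function name and filenames are made up; note fopen() reads a local path and a URL the same way):

Code: Select all

```php
<?php

// Copy a readable source (local path or URL) to a local file.
// fetch_to_file() is just an illustrative name.
function fetch_to_file($source, $dest)
{
    $in  = fopen($source, "r") or die("could not open source");
    $out = fopen($dest, "w") or die("could not open destination");

    // Read in chunks so we don't need to know the size up front
    while (!feof($in))
    {
        fwrite($out, fread($in, 4096));
    }

    fclose($in);
    fclose($out);
}
```

Then it's just fetch_to_file("http://www.example.com/feed.html", "news.txt"); in your page.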
User avatar
Skittlewidth
Forum Contributor
Posts: 389
Joined: Wed Nov 06, 2002 9:18 am
Location: Kent, UK

Post by Skittlewidth »

Right, so what's wrong with this?

Code: Select all

<?php
$filename = "http://www.initialized.co.uk/enter.html";
$fp = fopen($filename, "r+") or die ("could not open file");
$contents = fread ($fp, filesize($filename));
$myfile = "test.txt";
fwrite($myfile, $contents);
fclose($fp);
 ?>
I get the following error, which I have narrowed down to the $contents = fread(...) line:
Warning: stat failed for http://www.initialized.co.uk/enter.html (errno=2 - No such file or directory) in /home/httpd/html/newstaffintranet/news/index.php on line 5

I'm using a Linux server, so a "b" in the mode string shouldn't be necessary. :?:
User avatar
Skittlewidth
Forum Contributor
Posts: 389
Joined: Wed Nov 06, 2002 9:18 am
Location: Kent, UK

Got it!

Post by Skittlewidth »

Got it now:

Code: Select all

<?php

$filename = "http://www.guardian.co.uk/syndication/service/0,11065,334-0-5,00.html?U1271588";
// "r" is enough here -- the HTTP wrapper is read-only anyway
$fp = fopen($filename, "r") or die ("could not open file");
// filesize() doesn't work on URLs, so read a fixed maximum length
$contents = fread($fp, 4000);

// open the local file for writing ("w" creates it if it doesn't exist)
$myfile = "test.txt";
$destfile = fopen($myfile, "w") or die ("could not open local file");
fwrite($destfile, $contents);

fclose($fp);
fclose($destfile);

?>
It turns out you can't use filesize() on a URL, so you have to specify a length instead.
Also, I forgot to open the file I was trying to write to!! :oops:
Now to work on the next bit.... :)
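(For the next bit, one approach is to check how old the local copy is before re-fetching, so the Guardian only gets hit once the cache has gone stale. A sketch -- the function name, filenames, and the 15-minute lifetime are all made up:)

Code: Select all

```php
<?php

// Refresh the local cache from $source (URL or local path) if it is
// missing or older than $lifetime seconds. Returns true if it fetched.
// refresh_cache() is just an illustrative name.
function refresh_cache($source, $cachefile, $lifetime)
{
    if (file_exists($cachefile) && time() - filemtime($cachefile) <= $lifetime)
    {
        return false;   // cache still fresh, don't bother the Guardian
    }

    $in  = fopen($source, "r") or die("could not open source");
    $out = fopen($cachefile, "w") or die("could not open cache file");
    while (!feof($in))
    {
        fwrite($out, fread($in, 4096));
    }
    fclose($in);
    fclose($out);
    return true;
}
```

The page itself would then just do refresh_cache($url, "test.txt", 900); readfile("test.txt"); -- only the first visitor in each 15-minute window actually triggers a fetch.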
Post Reply