News harvesting
Posted: Wed Mar 05, 2003 5:40 am
by Skittlewidth
I've just subscribed to the Guardian Headline Newsfeed service
http://www.guardian.co.uk/headlineservi ... 35,00.html but am now having problems actually setting it up on the webpage it needs to go on. They recommend you use VBScript or Perl to harvest the news from the personalised URL they send you, but I have no experience with either of these. How would I go about doing it in PHP, preserving the URLs so that users can still jump to the full article?
Anyone else used this service?

Posted: Wed Mar 05, 2003 5:59 am
by patrikG
Hmm... I'm no expert on mail servers and the like.
However, if it's news you're after, have a look at
http://www.moreover.com
They used to have a free section; I don't know if it still exists. I think they even offered scripts to pull the links from their website (but it wasn't emailed, hence it's different from your problem).
The mother of all news indexes is:
http://searchenginewatch.com/links/news.html
news harvesting
Posted: Wed Mar 05, 2003 6:14 am
by Skittlewidth
Ah, yes, I wasn't clear about that, was I? The site doesn't e-mail you every time. Guardian Online just sent me a confirmation email giving me a personal URL from which to harvest my selected topics. That is just a plain page with the headlines and the first paragraph displayed.
In any case, I've done a simple script now with file().
Code:
<?php
// file() fetches the URL and returns its contents as an array of lines
// (note: file() takes no mode argument, unlike fopen())
$newsarray = file("http://www.guardian.co.uk/syndication/service/0,11065,331-0-5,00.html?U1271588");
foreach ($newsarray as $headline)
{
    echo $headline;
}
?>
Sorry for making it out to be more complicated than it was! By the way, it seems so much simpler to do it in three lines of PHP than the page of Perl offered in their sample script (though they were trying to do some extra stuff).
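Since the original question was about preserving the URLs so readers can still jump to the full article, here's a rough sketch of how that might look. This is purely illustrative: it assumes the harvested page is plain HTML with ordinary <a href="...">Headline</a> links, and the function name is made up.

```php
<?php
// Hypothetical sketch: pull the anchor tags out of the harvested HTML
// so each headline keeps its link to the full article.
// Assumes the feed is plain HTML with <a href="...">Headline</a> links.
function extract_headlines($html)
{
    $headlines = array();
    if (preg_match_all('/<a\s+href="([^"]+)"[^>]*>(.*?)<\/a>/i',
                       $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the href, $m[2] is the link text (tags stripped)
            $headlines[] = array('url' => $m[1], 'title' => strip_tags($m[2]));
        }
    }
    return $headlines;
}
?>
```

You could then loop over the returned array and echo each headline wrapped in its own anchor tag, rather than dumping the raw lines.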
Advanced news harvesting
Posted: Thu Mar 06, 2003 10:19 am
by Skittlewidth
OK, so that was the simplest way of doing things. However, now I would like to grab the contents of that page and write it to a file on my server, so that the Guardian page doesn't get called every time my page is loaded or refreshed.
I've tried using copy() to just grab the file and write it to my server, but this fails to find the file (because it's a URL?).
I'm guessing I might need to do something with fopen() and fwrite(), but I'm not sure.
Anyone got any pointers?
Posted: Thu Mar 06, 2003 10:24 am
by patrikG
Posted: Fri Mar 07, 2003 3:32 am
by Skittlewidth
Right, so what's wrong with this?
Code:
<?php
$filename = "http://www.initialized.co.uk/enter.html";
$fp = fopen($filename, "r+") or die ("could not open file");
$contents = fread ($fp, filesize($filename));
$myfile = "test.txt";
fwrite($myfile, $contents);
fclose($fp);
?>
I get the following error, which I've narrowed down to the $contents = fread().... line:
Warning: stat failed for
http://www.initialized.co.uk/enter.html (errno=2 - No such file or directory) in /home/httpd/html/newstaffintranet/news/index.php on line 5
I'm using a Linux server, so a "b" in the fopen() mode shouldn't be necessary.

Got it!
Posted: Fri Mar 07, 2003 5:15 am
by Skittlewidth
Got it now:
Code:
<?php
$filename = "http://www.guardian.co.uk/syndication/service/0,11065,334-0-5,00.html?U1271588";
$fp = fopen($filename, "r") or die("could not open URL");
$contents = fread($fp, 4000);   // filesize() doesn't work on a URL, so give an explicit length
$myfile = "test.txt";
$destfile = fopen($myfile, "w") or die("could not open local file");
fwrite($destfile, $contents);   // write to the open handle, not the filename
fclose($fp);
fclose($destfile);
?>
Turns out you can't use filesize() on a URL, so you have to specify a length.
Also, I forgot to open the file I was trying to write to!
Now to work on the next bit....
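If the "next bit" is only refetching the Guardian page when the local copy has gone stale, a timestamp check on the cache file is one way to sketch it. Everything here is an assumption for illustration: the function name, the cache filename, and the timeout are all made up, and it assumes allow_url_fopen is on so file() can read the remote URL.

```php
<?php
// Hypothetical caching sketch: refetch the remote page only when the
// local cache file is older than $max_age seconds. The cache path and
// timeout are illustrative, not from the thread.
function cached_fetch($url, $cachefile, $max_age)
{
    if (!file_exists($cachefile)
        || time() - filemtime($cachefile) > $max_age) {
        $contents = implode('', file($url));   // pull the whole remote page
        $out = fopen($cachefile, 'w');         // overwrite the local cache
        fwrite($out, $contents);
        fclose($out);
    }
    // serve the cached copy either way
    return implode('', file($cachefile));
}
?>
```

Your page would then call something like cached_fetch($guardian_url, "test.txt", 600) instead of hitting the Guardian URL on every load.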
