Google News RSS Feeds Junk Characters ereg_replace FIX?!

Posted: Tue Nov 28, 2006 11:04 pm
by jared0x90
Hey guys,

I have been using lastRSS for quite some time, and for most sites it works beautifully. However, for whatever reason, Google News's RSS feeds show up pretty funky sometimes. Lots of random characters just mysteriously appear. They are consistently in the same place in their feeds, so I'm not sure what's going on. Up to this point I've just kinda shrugged it off and read past them. However, it's gotten to be annoying, so I figured I'd write a function to filter the string lastRSS returns before it echoes it out. This works beautifully, but it is quite slow and loads the machine down for a few moments while it runs - obviously this is not the best method by far! I've tried switching to regex but keep messing something up. Every time I think it is working properly, it winds up letting all the junk through, so I must have had some bad syntax in my filter somewhere. Basically I need the ereg_replace to give the same result as the loop below. Here is what I am using presently:

Code: Select all

$validChars =' ';
$validChars.='?<>.,@\'"/:&#;-_+=';
$validChars.='0123456789';	
$validChars.='ABCDEFGHIJKLMNOPQRSTUVWXYZ';	
$validChars.='abcdefghijklmnopqrstuvwxyz';

function checkString($theString){
	global $validChars;
	$retStr='';
	for ($z=0;$z<strlen($theString);$z++)
		for ($i=0;$i<strlen($validChars);$i++)
			if ($validChars[$i]==$theString[$z])
				$retStr.= $theString[$z];
	return $retStr;
	
/*
	$filtered=ereg_replace("[^a-zA-Z0-9=:[]><\":/.?&#;]", "",  $theString);
	return  $filtered;
	*/
}
Any help/advice would be appreciated - this is my first time trying something that complicated w/ regular expressions (though I'm sure it is comparatively simple to some here!) Thanks!

Posted: Wed Nov 29, 2006 1:15 pm
by GeertDD
At first sight, the problem with your regex is that you didn't escape the closing square bracket, which should be matched literally inside the character class.

Try this:

Code: Select all

preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $string);
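
To see why the original pattern let all the junk through, here is a minimal sketch. It assumes preg_replace (PCRE) rather than ereg_replace, which was removed in PHP 7, so the original pattern is ported with `~` as the delimiter. The function names `brokenFilter` and `fixedFilter` and the sample string are made up for illustration. In the original pattern the first unescaped `]` closes the character class early, so everything after it is matched outside the class and has to follow a junk character, which practically never happens.

```php
<?php
// Hypothetical helper: the pattern from the first post, ported to PCRE.
// The class ends at the first "]", so it is only [^a-zA-Z0-9=:[] and the
// remainder (><":/.?&#;]) is a separate sequence that must come right after.
function brokenFilter($s) {
    return preg_replace('~[^a-zA-Z0-9=:[]><":/.?&#;]~', '', $s);
}

// Hypothetical helper: the suggested pattern, with "-" placed first so it
// is taken literally and no "]" inside the class at all.
function fixedFilter($s) {
    return preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $s);
}

// Made-up sample: a UTF-8 non-breaking space (bytes C2 A0) as the "junk".
$sample = "Top\xC2\xA0Stories: <b>news</b>";

$broken = brokenFilter($sample); // junk bytes are still there
$fixed  = fixedFilter($sample);  // junk bytes are removed

echo $fixed, "\n";
```

Since the patterns run without the `u` modifier, they work byte by byte, so both bytes of a multi-byte junk character are dropped.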

Posted: Wed Nov 29, 2006 1:23 pm
by jared0x90
Well, that seems to work a lot better. However, that brings me to another problem I ran into while playing with RegEx. I get the following link when I filter that way:

Code: Select all

<a href="http://news.google.com/news/url?sa=T&ct=us/7-0&fd=R&url=http://www.sportal.com.au/motorsport.asp3Fi3Dnews26id3D91704&cid=1111552237&ei=v91tRf3hEpSWaJLLzfQM">
Edit: Which doesn't work :)

Posted: Wed Nov 29, 2006 1:32 pm
by GeertDD
We're stripping out too many characters then. What did the original string/link look like?

Posted: Wed Nov 29, 2006 1:51 pm
by jared0x90
Here is the link w/o Filtering:

Code: Select all

<a href="http://news.google.com/news/url?sa=T&ct=us/2-0&fd=R&url=http://www.sportal.com.au/motorsport.asp%3Fi%3Dnews%26id%3D91704&cid=1111552237&ei=WORtRdfhIIz8oQL1m7j_DA">

Posted: Wed Nov 29, 2006 2:35 pm
by GeertDD
Allow % as well.

Code: Select all

preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z%]/', '', $string);
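
As a quick check of the difference, the sketch below runs both patterns over the unfiltered link posted above. The variable names are only for illustration. Without `%` in the class, the URL-encoded sequences `%3F`, `%3D` and `%26` lose their percent signs and the link breaks; with `%` allowed, the link should come through unchanged.

```php
<?php
// The unfiltered link from the earlier post.
$link = '<a href="http://news.google.com/news/url?sa=T&ct=us/2-0&fd=R&url=http://www.sportal.com.au/motorsport.asp%3Fi%3Dnews%26id%3D91704&cid=1111552237&ei=WORtRdfhIIz8oQL1m7j_DA">';

// First pattern: "%" is not allowed, so the percent-encoding is damaged.
$withoutPercent = preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $link);

// Second pattern: "%" is allowed, so the encoded query string survives.
$withPercent = preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z%]/', '', $link);

echo $withoutPercent, "\n";
echo $withPercent, "\n";
```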

Posted: Wed Nov 29, 2006 2:39 pm
by jared0x90
SUCCESS! You are a genius man - thank you!