Google News RSS Feeds Junk Characters ereg_replace FIX?!

jared0x90
Forum Newbie
Posts: 4
Joined: Tue Nov 28, 2006 10:52 pm

Post by jared0x90 »

Hey guys,

I have been using lastRSS for quite some time, and for most sites it works beautifully. However, for whatever reason, Google News's RSS feeds sometimes show up pretty funky: lots of random characters just mysteriously appear. They are consistently in the same place in their feeds, so I'm not sure what's going on. Up to this point I've just kinda shrugged it off and read past them, but it's gotten annoying, so I wrote a function to filter the string lastRSS returns before echoing it out.

This works, but it is quite slow and loads the machine down for a few moments while it runs, so it's obviously not the best method. I've tried switching to regex but keep messing something up: every time I think it's working, it winds up letting all the junk through, so I must have some bad syntax in my filter somewhere. Basically I need the ereg_replace to give the same result as the loop below. Here is what I am using presently:

Code:

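// Whitelist of characters allowed through the filter.
$validChars =' ';
$validChars.='?<>.,@\'"/:&#;-_+=';
$validChars.='0123456789';
$validChars.='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$validChars.='abcdefghijklmnopqrstuvwxyz';

// Returns $theString with every character not in $validChars removed.
function checkString($theString){
	global $validChars;
	$retStr='';
	// Compare each input character against each whitelisted character (slow).
	for ($z=0;$z<strlen($theString);$z++)
		for ($i=0;$i<strlen($validChars);$i++)
			if ($validChars[$i]==$theString[$z])
				$retStr.= $theString[$z];
	return $retStr;

/*
	// Attempted regex replacement (lets the junk through):
	$filtered=ereg_replace("[^a-zA-Z0-9=:[]><\":/.?&#;]", "",  $theString);
	return  $filtered;
	*/
}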
Any help/advice would be appreciated - this is my first time trying something this complicated w/ regular expressions (though I'm sure it is comparatively simple to some here!) Thanks!
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

At first sight, the problem with your regex is that you didn't escape the closing square bracket, which should be matched literally inside the character class. Unescaped, it ends the class early.

Try this:

Code:

preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $string);
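For anyone comparing the two approaches, here is a minimal sketch (not the original poster's final code) that checks the regex above against a loop-style whitelist filter on one sample string. The function names `filterWithLoop` and `filterWithRegex` and the sample input are hypothetical, and the sketch assumes PHP 7 or later for the type declarations.

```php
<?php
// Whitelist taken from the $validChars string in the original post.
const VALID_CHARS = ' ?<>.,@\'"/:&#;-_+='
    . '0123456789'
    . 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    . 'abcdefghijklmnopqrstuvwxyz';

// Loop-based filter (hypothetical helper): keeps a byte only if it
// appears in the whitelist.
function filterWithLoop(string $text): string
{
    $kept = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        if (strpos(VALID_CHARS, $text[$i]) !== false) {
            $kept .= $text[$i];
        }
    }
    return $kept;
}

// Regex-based filter (hypothetical helper): the negated class removes
// every byte that is not listed. The hyphen comes first so it is read
// as a literal character rather than as a range operator.
function filterWithRegex(string $text): string
{
    return preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $text);
}

// Hypothetical input: three stray bytes plus square brackets,
// none of which are in the whitelist.
$sample = "Hello\xE2\x80\x99World [test]";

var_dump(filterWithLoop($sample) === filterWithRegex($sample)); // bool(true)
echo filterWithRegex($sample), "\n"; // HelloWorld test
```

The regex version makes a single pass over the string inside the PCRE engine, which is why it avoids the slowdown of the nested PHP loops.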

Post by jared0x90 »

Well, that seems to work a lot better. However that brings me to another problem I ran into while playing with RegEx. I get the following link when we filter that way:

Code:

<a href="http://news.google.com/news/url?sa=T&ct=us/7-0&fd=R&url=http://www.sportal.com.au/motorsport.asp3Fi3Dnews26id3D91704&cid=1111552237&ei=v91tRf3hEpSWaJLLzfQM">
Edit: Which doesn't work :)

Post by GeertDD »

We're stripping out too many characters then. What did the original string/link look like?

Post by jared0x90 »

Here is the link w/o Filtering:

Code:

<a href="http://news.google.com/news/url?sa=T&ct=us/2-0&fd=R&url=http://www.sportal.com.au/motorsport.asp%3Fi%3Dnews%26id%3D91704&cid=1111552237&ei=WORtRdfhIIz8oQL1m7j_DA">

Post by GeertDD »

Allow % as well.

Code:

preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z%]/', '', $string);
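A small sketch of why the percent sign matters, using the URL fragment from the unfiltered link above. Without `%` in the class, the URL-encoded sequences lose their marker and the link no longer decodes; with it, they pass through intact. The variable names are only illustrative.

```php
<?php
// URL fragment copied from the unfiltered link in this thread.
$fragment = 'motorsport.asp%3Fi%3Dnews%26id%3D91704';

// First pattern: "%" is not whitelisted, so it is stripped.
$withoutPercent = preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z]/', '', $fragment);

// Second pattern: "%" is whitelisted, so the encoded sequences survive.
$withPercent = preg_replace('/[^- ?<>.,@\'"\/:&#;_+=0-9A-Za-z%]/', '', $fragment);

echo $withoutPercent, "\n";         // motorsport.asp3Fi3Dnews26id3D91704
echo $withPercent, "\n";            // motorsport.asp%3Fi%3Dnews%26id%3D91704
echo urldecode($withPercent), "\n"; // motorsport.asp?i=news&id=91704
```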

Post by jared0x90 »

SUCCESS! You are a genius man - thank you!