RSS parsing creating invalid XML

Post by jwalsh

Hi,

I'm parsing an XML feed and converting it to our generic formatting codes so that syndicated articles fit our current web layout. The content comes from our direct partners, but occasionally my code produces invalid XML.

Here's the XML error I'm getting:

Code: Select all

XML Parsing Error: undefined entity

to the crowd, �I'm celebrating
--------------^
I thought that since I was using htmlentities(), the output would be valid XML. Here's the important part of my code:

Code: Select all

function reverse_htmlentities($mixed) {
    $htmltable = get_html_translation_table(HTML_ENTITIES);
    foreach($htmltable as $key => $value)
    {
        $mixed = ereg_replace(addslashes($value),$key,$mixed);
    }
    return $mixed;
}

function FormatCode($document) {
    // UNDO HTMLENTITIES SO THAT WE CAN PROPERLY PARSE IMG's AND LINKS
    $document = reverse_htmlentities($document);

    // REPLACE CERTAIN HTML TAGS WITH OUR FORMATTING SCHEMA
    $search = array ('/\<strong\>(.*?)\<\/strong\>/is',
        '/\<i\>(.*?)\<\/i\>/is',
        '/\<u\>(.*?)\<\/u\>/is',
        '/\<a href=(.*?) target=_blank\>(.*?)\<\/a\>/is',
        '/\<img (.*?) src=\"(.*?)\" (.*?)\>/e');

    $replace = array('{b}$1{/b}',
        '{i}$1{/i}',
        '{u}$1{/u}',
        '{link src=$1}$2{/link}',
        "addtoimage('\\2')");

    $text = preg_replace($search, $replace, $document);

    // TURN <BR> INTO NEW LINES
    $text = str_replace("<br>", "\n", $text);
    $text = str_replace("<br />", "\n", $text);

    // REMOVE EXCESS HTML TAGS
    $text = strip_tags($text);

    // REDO HTML ENTITIES FOR VALID XML
    $text = htmlentities($text);

    return $text;
}

// COMING FROM AN XML PARSING LOOP
echo FormatCode($article->description);
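
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: htmlentities() turns characters such as curly quotes into named HTML entities like &ldquo;, but XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so a parser with no DTD reports any other name as an undefined entity. The example below assumes the article text is UTF-8 and that the unreadable character in the error excerpt is a left curly quote; xml_escape() is a hypothetical helper name, not part of the code above, and the snippet needs PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original post): escape text for XML output.
// Assumes $text is UTF-8.
function xml_escape(string $text): string
{
    // htmlspecialchars() only touches &, <, >, " and ' and leaves every other
    // character as-is; ENT_XML1 makes it emit &apos; for the single quote.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Assumed sample: the excerpt from the error, with a left curly quote (U+201C).
$sample = "to the crowd, \u{201C}I'm celebrating";

// htmlentities() produces a named entity that XML does not define.
$viaHtmlentities = htmlentities($sample, ENT_QUOTES, 'UTF-8');

// The helper keeps the curly quote as a literal UTF-8 character.
$viaXmlEscape = xml_escape($sample);

echo $viaHtmlentities, "\n";
echo $viaXmlEscape, "\n";
```

If this is the cause, swapping the final htmlentities($text) call in FormatCode() for something like xml_escape($text) should keep the output well-formed, provided the feed declares the same encoding as the text it carries.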