Probs with character encodings when creating XML feed



allyhazell
Forum Newbie
Posts: 9
Joined: Mon Sep 13, 2004 12:43 pm


Post by allyhazell »

Hi,

I've been struggling to find an answer to my problem; I've tried scouring both the official PHP site and other sites. I have created an RSS/XML feed for a client's web site, which is automatically generated from the news in a MySQL database each time a new article is added. The problem is that the feed keeps on invalidating because of dodgy characters such as (for example) é, ®, ò etc. The only way I've managed to stop this from happening all the time is by doing a str_replace that swaps the characters that come up often for an encoded equivalent (entities for è, ® etc.). But that only stops it from invalidating for so long before another one is entered that I haven't yet listed.

So my question is, is there a PHP function or a PHP script somewhere that will change these characters into ones that will work with XML? Or am I doing something wrong within the RSS/XML setup itself?

The address of the feed in question is
http://www.medicalnewstoday.com/medicalnews.xml

Thanks in advance for your help

A frustrated Alastair
xisle
Forum Contributor
Posts: 249
Joined: Wed Jun 25, 2003 1:53 pm

Post by xisle »

There is a finite set of special characters and their HTML entities,
so write one function to take care of it throughout your scripts.
Here is an example that replaces some specific nasty MS Word characters with their entities.


<?php

function superhtmlentities($text) {
 	$entities = array(
 	128 => 'euro', 
 	130 => 'sbquo', 
 	131 => 'fnof', 
 	132 => 'bdquo', 
 	133 => 'hellip', 
 	134 => 'dagger', 
 	135 => 'Dagger', 
 	136 => 'circ', 
 	137 => 'permil', 
 	138 => 'Scaron', 
 	139 => 'lsaquo', 
 	140 => 'OElig', 
 	145 => 'lsquo', 
 	146 => 'rsquo', 
 	147 => 'ldquo', 
 	148 => 'rdquo', 
 	149 => 'bull', 
 	150 => 'ndash', 
 	151 => 'mdash', 
 	152 => 'tilde', 
 	153 => 'trade', 
 	154 => 'scaron', 
 	155 => 'rsaquo', 
 	156 => 'oelig', 
 	159 => 'Yuml');
 	
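 	$new_text = '';
 	// Walk the string byte by byte; this assumes single-byte Windows-1252 input.
 	for ($i = 0; $i < strlen($text); $i++) {
 		$num = ord($text[$i]);
 		if (isset($entities[$num])) {
 			// MS Word "smart" character: emit its named entity.
 			$new_text .= '&'.$entities[$num].';';
 		}
 		elseif ($num < 127 || $num > 159) {
 			// Ordinary character: keep it. Unmapped bytes in 127-159 are dropped.
 			$new_text .= $text[$i];
 		}
 	}

 	// Encode what is left as Latin-1. The encoding is passed explicitly because newer
 	// PHP versions default to UTF-8, and double_encode = false keeps the entities
 	// added above from turning into &amp;...;
 	return htmlentities($new_text, ENT_COMPAT, 'ISO-8859-1', false);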
}


?>
Here is a page of character sets and their entities:

http://www.w3schools.com/html/html_entitiesref.asp
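
One caveat for an RSS feed specifically: XML itself predefines only five named entities (&amp;, &lt;, &gt;, &quot; and &apos;), so names like &eacute; or &rsquo; will still fail validation unless the feed declares them. Numeric character references work in any XML document. Below is a minimal sketch of that variation, assuming the text is single-byte Windows-1252 as in the function above. The name xml_numeric_escape is made up for illustration; it is not a built-in PHP function.

```php
<?php
// Hypothetical helper (not from the thread): escape Windows-1252 text for XML
// using numeric character references instead of named HTML entities.
function xml_numeric_escape($text) {
    // Unicode code points for the Windows-1252 bytes in the 128-159 range.
    $cp1252 = array(
        128 => 8364, 130 => 8218, 131 => 402,  132 => 8222, 133 => 8230,
        134 => 8224, 135 => 8225, 136 => 710,  137 => 8240, 138 => 352,
        139 => 8249, 140 => 338,  142 => 381,  145 => 8216, 146 => 8217,
        147 => 8220, 148 => 8221, 149 => 8226, 150 => 8211, 151 => 8212,
        152 => 732,  153 => 8482, 154 => 353,  155 => 8250, 156 => 339,
        158 => 382,  159 => 376,
    );

    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $num = ord($text[$i]);
        if (isset($cp1252[$num])) {
            // "Smart" punctuation and similar: look up the real code point.
            $out .= '&#' . $cp1252[$num] . ';';
        } elseif ($num >= 160) {
            // Bytes 160-255 have the same code points in Latin-1 and Unicode.
            $out .= '&#' . $num . ';';
        } elseif ($num < 127) {
            // Plain ASCII: escape only the XML markup characters.
            $out .= htmlspecialchars($text[$i], ENT_QUOTES);
        }
        // Byte 127 and the unassigned bytes in 128-159 are dropped.
    }
    return $out;
}
```

For example, xml_numeric_escape("caf\xE9") gives caf&#233;. The other route is to leave the characters alone and make sure the feed's declared encoding matches the bytes actually being sent, but the sketch above only covers the escaping approach.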

Post by allyhazell »

That's great, thanks. I shall give it a try. Bloody Word eh!