
Question on PHP, XML, and unicode

Posted: Tue Feb 03, 2009 2:50 pm
by csgroce
Okay, this should be easy for a PHP expert with a little free time. This is my first time working with PHP. I've built a website that uses an XML product catalog in which almost all of the product information is plain ASCII. However, a few records contain troublesome accented characters, such as the French é.

Below I've simplified my code:


define('STORE_XML_FILE', '../catalog.xml');   // line 1
// ...
// ...
/* Reads the XML file */
function get_xml_catalog(){                   // line 22
    return new SimpleXMLElement(file_get_contents(STORE_XML_FILE));   // line 23
}                                             // line 24
// ...
// ...


If I change the encoding of my catalog.xml file to UTF-16 (by re-saving it in the gedit text editor), this function produces the following warnings and then a fatal error:

Warning: SimpleXMLElement::__construct() [function.SimpleXMLElement---construct]: Entity: line 1855: parser error : Extra content at the end of the document in /var/www/COM/functions/functions.php on line 23

Warning: SimpleXMLElement::__construct() [function.SimpleXMLElement---construct]: </items> in /var/www/COM/functions/functions.php on line 23

Warning: SimpleXMLElement::__construct() [function.SimpleXMLElement---construct]: ^ in /var/www/COM/functions/functions.php on line 23

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /var/www/COM/functions/functions.php:23 Stack trace: #0 /var/www/COM/functions/functions.php(23): SimpleXMLElement->__construct('??<???x?m?l? ?v...') #1 /var/www/COM/html/search.php(107): get_xml_catalog() #2 {main} thrown in /var/www/COM/functions/functions.php on line 23
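For what it's worth, the same parse can report libxml's complaints as data instead of as warnings followed by an uncaught exception. This is only a sketch: `parse_catalog_string` is a hypothetical helper that is not part of my code, and it swaps in `simplexml_load_string`, which returns false on failure rather than throwing like `new SimpleXMLElement(...)` does. It doesn't fix the UTF-16 problem; it just makes the parser errors easier to inspect.

```php
<?php
// Hypothetical helper (not from the original code): parse an XML string and
// return the parsed document (or false) plus any libxml error messages.
function parse_catalog_string($xml)
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // Returns false on failure instead of throwing an Exception.
    $doc = simplexml_load_string($xml);

    $errors = array();
    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    // Reset libxml's error buffer and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($doc, $errors);
}

// A second element after the root closes triggers the same kind of error as above.
list($doc, $errors) = parse_catalog_string('<items><item/></items><extra/>');
foreach ($errors as $message) {
    echo $message, "\n";
}
```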


The start of my catalog.xml file is:

<?xml version="1.0" encoding="UTF-16"?>
...
...


I get no errors when I set the encoding back to UTF-8 in gedit.

My guess is that SimpleXMLElement doesn't treat UTF-16 documents as "simple" XML, but I'm not sure. Any ideas would be appreciated. I'd like to keep a function as simple as the one above, so that the catalog in STORE_XML_FILE can be iterated as easily as any other SimpleXMLElement.
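For reference, this is roughly the kind of iteration I'm after. It's a minimal sketch that parses an inline UTF-8 string instead of the real file. The `<items>` root matches the error output above, but the `<item>` and `<name>` child elements are assumptions for illustration, not my actual catalog structure.

```php
<?php
// Hypothetical catalog snippet: <items> matches the real root element, while
// <item> and <name> are assumed names. UTF-8 can represent é directly.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<items>
  <item><name>Plain widget</name></item>
  <item><name>Crème brûlée torch</name></item>
</items>
XML;

$catalog = new SimpleXMLElement($xml);

$names = array();
foreach ($catalog->item as $item) {
    // SimpleXML returns text content as UTF-8 strings, so accents survive.
    $names[] = (string) $item->name;
}

foreach ($names as $name) {
    echo $name, "\n";
}
```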

Thanks!