I have a problem with storing special characters in XML.
Until PHP gets native support for UTF-8 (so there is no need to use the mb_* functions), I will be using ISO-8859-1. Unfortunately, today I found out that incompatible characters (e.g. the Euro sign (€)) corrupt my XML files. Example:
Code:
$xml = new DOMDocument();
$xml->loadXML("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root></root>");
$element = $xml->createElement("data");
$newAttribute = $xml->createAttribute("name");
$newTextNode = $xml->createTextNode(htmlentities("€"));
$newAttribute->appendChild($newTextNode);
$element->appendChild($newAttribute);
$xml->documentElement->appendChild($element);
echo $xml->saveXML(); // XML is corrupted, caused by the Euro sign
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root><data name="€" /></root>
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root><data name="â
Strangely enough, I still get corrupted XML. I have to change the XML file to UTF-8 for it to work, and I'm not interested in that.
Even if I succeed in getting the euro sign to work, I will still have problems with a lot of other characters from the UTF-8 charset, so I figured that detecting the encoding was the proper solution: simply throw an exception if invalid data is "submitted" to the XML.
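A minimal sketch of the kind of check described here, under the assumption that the submitted data arrives as UTF-8. The function name assertLatin1Compatible is hypothetical, and the sketch relies on the fact that ISO-8859-1 covers exactly the code points U+0000 to U+00FF:

```php
<?php
// Hypothetical helper: rejects any input that cannot be stored as ISO-8859-1.
// Assumes $text is supposed to be UTF-8 encoded.
function assertLatin1Compatible(string $text): void
{
    // First make sure the bytes are well-formed UTF-8 at all.
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Any code point above U+00FF has no ISO-8859-1 equivalent.
    if (preg_match('/[^\x{00}-\x{FF}]/u', $text, $match) === 1) {
        throw new InvalidArgumentException(
            "Character '{$match[0]}' cannot be stored as ISO-8859-1."
        );
    }
}

// "é" (U+00E9) exists in ISO-8859-1, so this call passes silently.
assertLatin1Compatible("caf\u{00E9}");

// "€" (U+20AC) does not exist in ISO-8859-1, so this call throws.
try {
    assertLatin1Compatible("Here is a \u{20AC} (euro) sign");
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

If the data actually arrives in some other encoding, the check would need to convert it to UTF-8 first; that step is left out of the sketch.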
I tried mb_detect_encoding(..) to determine the encoding of the data, but it seems to be very buggy.
Code:
echo mb_detect_encoding("Here is a € (euro) sign", "ISO-8859-1, ISO-8859-15, UTF-8");
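Part of what may make detection look unreliable here is that every possible byte is a valid ISO-8859-1 character, so validity alone can never rule ISO-8859-1 out. A stricter question, "is this well-formed UTF-8?", can be asked with mb_check_encoding. The sketch below uses illustrative byte strings; "\x80" is the euro sign in Windows-1252, which is an assumption about where such a byte might come from:

```php
<?php
// The euro sign as UTF-8 bytes, and as a single Windows-1252 byte.
$euroUtf8   = "\xE2\x82\xAC";
$euroCp1252 = "\x80";

// Well-formed UTF-8 passes the UTF-8 check.
$utf8IsUtf8 = mb_check_encoding($euroUtf8, 'UTF-8');

// A lone 0x80 byte is not well-formed UTF-8.
$cp1252IsUtf8 = mb_check_encoding($euroCp1252, 'UTF-8');

// Both strings are "valid" ISO-8859-1, because every byte value is.
$utf8IsLatin1   = mb_check_encoding($euroUtf8, 'ISO-8859-1');
$cp1252IsLatin1 = mb_check_encoding($euroCp1252, 'ISO-8859-1');

var_dump($utf8IsUtf8, $cp1252IsUtf8, $utf8IsLatin1, $cp1252IsLatin1);
```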
At this point I would very much appreciate any input that can help me solve this problem. Again, I only want to save data as ISO-8859-1. htmlentities should take care of encoding characters such as the euro sign into entities that are compatible with ISO-8859-1, and I would like to be able to simply throw an exception if invalid data is submitted to the XML file.
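For comparison, here is a sketch of the same document built without htmlentities, assuming the string handed to DOM is UTF-8 (which is what the DOM extension expects for its input). With the document declared as ISO-8859-1, libxml is expected to write characters outside that charset as numeric character references on output. The exact serialization may vary between libxml versions:

```php
<?php
$xml = new DOMDocument();
$xml->loadXML('<?xml version="1.0" encoding="ISO-8859-1"?><root></root>');

$element = $xml->createElement('data');

// Pass the raw UTF-8 euro sign; DOM escapes attribute values itself.
$element->setAttribute('name', "\u{20AC}");
$xml->documentElement->appendChild($element);

// The euro sign should appear as a numeric character reference
// (for example &#8364;), since ISO-8859-1 cannot represent it directly.
$output = $xml->saveXML();
echo $output;
```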
Thanks in advance! :-)