decoding XML problems

dhobo
Forum Newbie
Posts: 2
Joined: Thu Sep 15, 2011 8:56 am

Post by dhobo

Hi all,

I'm having problems decoding XML. I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<advastamedia xmlns="http://www.advasta.com/XMLSchema/5.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.advasta.com/XMLSchema/5.02 http://www.advasta.com/XMLSchema/5.02.30/advastamedia502.xsd" created="2011-04-27T11:48:50.000+01:00" version="5.02" language="de-DE" decimalseparator="," groupseparator=".">
<!-- skipped a part here -->
<textvalue pos="1">
  <data>Exklusiv bei Mein – „ES“ Griffdesign für komfortables Aufhängen des Werkzeugs</data>
</textvalue>
<textvalue pos="7">
  <data>Große Gras- und Strauchscherenklingen</data>
</textvalue>

As you can see, there are some special characters in both <data> elements, and when I run my PHP code:

// start importing
$xml_import  = simplexml_load_file($sFileName);
$namespaces = array_merge(array('' => ''), $xml_import->getDocNamespaces(true));
$myArray = array();
$myArray = xml2phpArray($xml_import, $namespaces, $myArray);
echo "<pre>";
var_dump($myArray);
echo "</pre>";

// xml2phpArray function
// converts a SimpleXML element tree into a nested PHP array
function xml2phpArray($xml, $namespaces, $arr) {
    $iter = 0;

    foreach ($namespaces as $namespace => $namespaceUrl) {
        foreach ($xml->children($namespaceUrl) as $b) {
            $a = $b->getName();

            if ($b->children($namespaceUrl)) {
                // element has child elements: recurse into them
                $arr[$a][$iter] = array();
                $arr[$a][$iter] = xml2phpArray($b, $namespaces, $arr[$a][$iter]);
            }
            else {
                // leaf element: store its trimmed text content
                $arr[$a] = trim($b[0]);
            }

            $iter++;
        }
    }

    return $arr;
}

Then the output for both data elements looks like this:

Exklusiv bei Mein – „ES“ Griffdesign für komfortables Aufhängen des Werkzeugs
Große Gras- und Strauchscherenklingen


As you can see, all the special characters look garbled.

If I change this line of code in PHP from:

$arr[$a] = trim($b[0]);

to:

$arr[$a] = trim(utf8_decode($b[0]));

Then some characters are decoded correctly, but others turn into a '?':

Exklusiv bei Mein ? ?ES? Griffdesign für komfortables Aufhängen des Werkzeugs
Große Gras- und Strauchscherenklingen


How can I fix this so that ALL special characters are shown properly?

Thanks for your help!

greip
Forum Commoner
Posts: 39
Joined: Tue Aug 23, 2011 8:23 am
Location: Oslo, Norway

Re: decoding XML problems

Post by greip

The character encoding used in your XML document is UTF-8, as the encoding attribute of the XML declaration shows. UTF-8 represents the characters of the traditional ASCII character set with the same single-byte values as ASCII, and every other character (including the special characters in your text) as a sequence of two to four bytes.
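To see the multibyte representation, here is a small sketch that compares strlen(), which counts bytes, with a character count. utf8CharCount() is a hypothetical helper written for illustration; it counts characters with preg_match_all() and the /u modifier so the sketch runs even without the mbstring extension (with mbstring installed, mb_strlen($s, 'UTF-8') gives the same count for valid UTF-8).

```php
<?php
// Hypothetical helper: counts UTF-8 characters (not bytes).
// The /u modifier makes PCRE match whole UTF-8 characters; /s lets '.' match newlines.
function utf8CharCount(string $s): int
{
    return preg_match_all('/./su', $s);
}

// Strings from the sample XML, written with \u{...} escapes so the bytes are
// UTF-8 regardless of how this source file is saved.
$samples = ["f\u{00FC}r", "\u{201E}ES\u{201C}", "Gro\u{00DF}e"];
foreach ($samples as $s) {
    // strlen() counts bytes; bin2hex() shows the raw UTF-8 byte sequence
    printf("%s: %d bytes, %d characters, hex %s\n", $s, strlen($s), utf8CharCount($s), bin2hex($s));
}
```

For "für" this reports 4 bytes but 3 characters, because ü is stored as the two bytes C3 BC; each of the quotation marks „ and “ takes three bytes.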

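PHP's standard string functions treat a string as a sequence of single bytes. utf8_decode() converts UTF-8 text to ISO-8859-1 (Latin-1), a single-byte encoding, and replaces every character that has no ISO-8859-1 equivalent, such as –, „ and “, with a question mark. That's why ü and ß come through but the dash and quotation marks turn into question marks in your text.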
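To make the question marks concrete, here is a sketch that mimics the conversion utf8_decode() performs: UTF-8 in, ISO-8859-1 out, with '?' for every character above U+00FF. toLatin1WithQuestionMarks() is a hypothetical helper for illustration, not PHP's own implementation (utf8_decode() itself is deprecated as of PHP 8.2).

```php
<?php
// Hypothetical helper that mimics utf8_decode(): converts UTF-8 to ISO-8859-1,
// replacing every character above U+00FF with '?'.
function toLatin1WithQuestionMarks(string $utf8): string
{
    // Split into UTF-8 characters; the /u modifier makes PCRE decode UTF-8.
    preg_match_all('/./su', $utf8, $m);
    $out = '';
    foreach ($m[0] as $ch) {
        $len = strlen($ch);
        if ($len === 1) {
            $out .= $ch; // ASCII: identical byte in UTF-8 and ISO-8859-1
            continue;
        }
        if ($len === 2) {
            // Two-byte sequence 110xxxxx 10xxxxxx: rebuild the code point
            $cp = ((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F);
            if ($cp <= 0xFF) {
                $out .= chr($cp); // fits into a single ISO-8859-1 byte
                continue;
            }
        }
        $out .= '?'; // no single-byte ISO-8859-1 equivalent (e.g. U+2013, U+201E)
    }
    return $out;
}

$latin1 = toLatin1WithQuestionMarks("Mein \u{2013} \u{201E}ES\u{201C} f\u{00FC}r");
echo bin2hex($latin1), "\n";
```

Applied to the sample text, ü comes through as the single byte FC, while –, „ and “ each become 3F ('?'), which matches the output in the question.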

To handle the full range of characters that UTF-8 can encode, you need to use PHP's multibyte string functions. Have a look at the manual here: http://php.net/manual/en/book.mbstring.php. There is no mb_trim(), but you'll find various user-contributed variants of that function.
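A hedged aside on trim(): with its default character list, trim() only strips the ASCII bytes space, \t, \n, \r, \0 and \x0B, and those byte values never occur inside a UTF-8 multibyte sequence, so plain trim() does not corrupt text like the <data> values above. What it cannot do is remove non-ASCII whitespace such as the no-break space U+00A0. A minimal user-level variant could use a Unicode-aware regular expression; utf8Trim() is a hypothetical name, not a PHP built-in (PHP 8.4 later added a built-in mb_trim()).

```php
<?php
// Hypothetical helper, not a PHP built-in: trims whitespace and Unicode
// separator characters (e.g. U+00A0 no-break space) from both ends of a UTF-8 string.
function utf8Trim(string $s): string
{
    // /u makes PCRE work on UTF-8 characters; \p{Z} covers Unicode separators.
    // preg_replace() returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $s) ?? $s;
}

$text = "\u{00A0} Gro\u{00DF}e Gras- und Strauchscherenklingen \n";
var_dump(trim($text));     // leading no-break space (and the space after it) survive
var_dump(utf8Trim($text)); // both ends are trimmed
```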

When you generate your web page, make sure you declare the character encoding as UTF-8, both in the Content-Type HTTP header and in the Content-Type HTML META tag. Without that declaration, browsers typically fall back to ISO-8859-1 (or the closely related windows-1252), and the UTF-8 bytes show up as garbled characters.

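A minimal sketch of both declarations, assuming the page is built as a string in PHP; renderPage() is a hypothetical helper. htmlspecialchars() is given the charset explicitly so it escapes markup characters without touching the UTF-8 bytes.

```php
<?php
// Hypothetical helper: wraps already-decoded UTF-8 texts in a page that
// declares its encoding in a META tag.
function renderPage(array $texts): string
{
    $html = "<!DOCTYPE html>\n<html>\n<head>\n"
          . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
          . "</head>\n<body>\n";
    foreach ($texts as $t) {
        // Escape <, >, &, quotes; UTF-8 characters such as ß pass through unchanged.
        $html .= '<p>' . htmlspecialchars($t, ENT_QUOTES | ENT_HTML5, 'UTF-8') . "</p>\n";
    }
    return $html . "</body>\n</html>\n";
}

// The HTTP header must be sent before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo renderPage(["Gro\u{00DF}e Gras- und Strauchscherenklingen"]);
```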
dhobo
Forum Newbie
Posts: 2
Joined: Thu Sep 15, 2011 8:56 am

Re: decoding XML problems

Post by dhobo

Thank you very much! I enabled mbstring.func_overload in my php.ini and declared UTF-8 in both the Content-Type header and the META tag, and now it works perfectly!
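A note on this fix: mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so it is not available on current PHP. Since SimpleXML already returns element text as UTF-8 and plain trim() leaves multibyte characters intact, the UTF-8 Content-Type declaration was most likely the part that mattered. A minimal sketch of the import without func_overload and without utf8_decode(), assuming a shortened, hypothetical stand-in for the real XML file (importTexts() is a hypothetical name):

```php
<?php
// Hypothetical helper: reads the <data> texts from the sample structure.
// The namespace URI is the default namespace declared in the question's XML.
function importTexts(string $xml): array
{
    $ns = 'http://www.advasta.com/XMLSchema/5.02';
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // not well-formed XML
    }
    $texts = [];
    foreach ($doc->children($ns)->textvalue as $tv) {
        // SimpleXML returns text as UTF-8; no utf8_decode() needed.
        $texts[] = trim((string) $tv->children($ns)->data);
    }
    return $texts;
}

// Shortened stand-in for the real file; \u{...} escapes keep the bytes UTF-8.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<advastamedia xmlns="http://www.advasta.com/XMLSchema/5.02">
  <textvalue pos="1"><data>Exklusiv bei Mein \u{2013} \u{201E}ES\u{201C} Griffdesign f\u{00FC}r komfortables Aufh\u{00E4}ngen des Werkzeugs</data></textvalue>
  <textvalue pos="7"><data> Gro\u{00DF}e Gras- und Strauchscherenklingen </data></textvalue>
</advastamedia>
XML;

foreach (importTexts($sample) as $text) {
    echo $text, "\n";
}
```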