simplexml weird characters

SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY

simplexml weird characters

Post by SidewinderX »

I am trying to parse an XML document. I am using the following code (I only want the first entry).

<?php
$xml = new SimpleXMLElement('http://www.example.com/feed/', null, true);
echo $xml->channel->item->title[0];
?>


On the feed itself it looks fine, but the above code outputs (notice the odd characters) :

LOL Monday – Selective Luddite

I thought I had run into this issue before, and the solution then was to use UTF-8 character encoding, so I tried specifying it with both an HTML doctype and header(). Neither worked. When I view the source of the XML document, it has:

<title>LOL Monday &#8211; Selective Luddite</title>

I'm not sure what to do.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: simplexml weird characters

Post by requinix »

If the RSS uses entities (like &#8211;) instead of the characters they represent (like –), then the feed's own encoding doesn't matter.

The HTML page's encoding does, though. SimpleXML returns strings as UTF-8, so it converted the numeric entity to the actual character in UTF-8. When you then displayed it in a page that wasn't declared as UTF-8, you saw mojibake.

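<?php
// Declare the page as UTF-8 before any output is sent.
header("Content-Type: text/html; charset=utf-8");
// Third argument true: the first argument is a URL, not a string of XML.
$xml = new SimpleXMLElement('http://www.example.com/feed/', 0, true);
// Title of the first item in the feed.
echo $xml->channel->item[0]->title;
?>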
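
If you want to see the conversion for yourself, here is a rough sketch. It assumes a small inline feed (the $feed string is made up so it runs without the network) and dumps the bytes that the entity turns into. The htmlspecialchars() call is my own addition for safe output, not something the fix depends on.

```php
<?php
// Hypothetical inline feed standing in for the real one, so this runs offline.
$feed = '<rss><channel><item>'
      . '<title>LOL Monday &#8211; Selective Luddite</title>'
      . '</item></channel></rss>';

$xml = new SimpleXMLElement($feed);

// Cast to string: SimpleXML hands back the text with the entity already decoded.
$title = (string) $xml->channel->item[0]->title;

// "LOL Monday " is 11 bytes, so the next 3 bytes are the decoded en dash (U+2013).
$dashBytes = bin2hex(substr($title, 11, 3));

// Escape for HTML, telling htmlspecialchars() that the input is UTF-8.
$safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// The charset header has to go out before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $dashBytes, "\n";
echo $safeTitle, "\n";
```

The three bytes are the UTF-8 form of the en dash. A browser that assumes a single-byte encoding shows them as three separate characters, which is the garbage you were seeing.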
SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY

Re: simplexml weird characters

Post by SidewinderX »

That was the first thing I tried. Interestingly enough, it works now. I must have typed something wrong. Thanks. :D