xml_parse eats < and > from cdata - PHP/libxml bug
Posted: Sun Sep 13, 2009 11:14 am
I am trying to debug a problem with SimplePie (RSS/Atom feed parser) as used in Joomla! 1.5.14 (latest). With identical (out of the box) installations on different hosting providers, I notice a strange problem with the XML parser (as used in SimplePie). I have absolutely no experience with the XML parser used in PHP, by the way.
On some installations xml_parse removes '&lt;' and '&gt;' found in cdata (which is not good when sending the news feed description to the browser). On other installations '&lt;' and '&gt;' are translated to '<' and '>' as expected.
As far as I can see the installations are identical except for the libXML and PHP version numbers (libXML version 2.6.27 (PHP Version 5.2.10) on installations working OK and libXML version 2.7.3 (PHP Version 5.2.8) on installations having problems).
xml_parser_create_ns is used to create the parser (encoding=UTF-8, separator=' '). XML_OPTION_SKIP_WHITE=1, XML_OPTION_CASE_FOLDING=0.
Here is a detailed example. The input to xml_parse is always the same (extract):
Code:
<description>&lt;p&gt;&lt;a href="http://www.packtpub.com/nominate-best-open-source-php-cms"&gt; ....
On systems that are working OK, the "character data handler" function (as configured by xml_set_character_data_handler) receives the following cdata fragments (in its second parameter "string $data"):
Code:
(SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
SimplePie_Parser::cdata: '<'
SimplePie_Parser::cdata: 'p'
SimplePie_Parser::cdata: '>'
SimplePie_Parser::cdata: '<'
SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'
SimplePie_Parser::cdata: '>'
This yields valid HTML: <p><a href="http://www.packtpub.com/nominate-best-o ... ce-php-cms">
On installations having problems it looks like this:
Code:
(SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
SimplePie_Parser::cdata: 'p'
SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'
As can be seen, there are fewer calls and the '<' and '>' are just gone!
Everything else (Joomla! etc.) works OK by the way...
Any idea why this happens?
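In case it helps anyone compare hosts, here is a minimal standalone sketch that should show the behaviour outside of Joomla! and SimplePie. It sets up the parser the same way as described above and prints every chunk the character data handler receives. The handler names (on_open, on_close, on_cdata) are made up for this test, and the closing </description> tag is my own addition so that the extract is a well-formed document on its own.

```php
<?php
// Collects every chunk passed to the character data handler.
$chunks = array();

// Element handlers do nothing; only the character data is of interest here.
function on_open($parser, $name, $attrs) {}
function on_close($parser, $name) {}

function on_cdata($parser, $data) {
    global $chunks;
    $chunks[] = $data;
}

// Extract from the feed, closed with </description> (an assumption) to make it well-formed.
$xml = '<description>&lt;p&gt;&lt;a href="http://www.packtpub.com/nominate-best-open-source-php-cms"&gt;</description>';

// Same parser setup as described in the post.
$parser = xml_parser_create_ns('UTF-8', ' ');
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'on_open', 'on_close');
xml_set_character_data_handler($parser, 'on_cdata');

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)) . "\n");
}

// Print the versions so that results from different hosts can be compared.
echo 'PHP ', PHP_VERSION, ', libxml ', LIBXML_DOTTED_VERSION, "\n";
foreach ($chunks as $chunk) {
    echo "cdata: '", $chunk, "'\n";
}

// The chunks joined together are what ends up in the feed description.
$joined = implode('', $chunks);
echo 'joined: ', $joined, "\n";
```

On a host that behaves correctly, the joined string should be the original markup with '<' and '>' in place. On an affected host I would expect the brackets to be missing from it, as in the second log above.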