XHTML parsing and special entities in attributes





Post by horst.horstmann

Hi,

I'm using the standard PHP 4.4.1 SAX parser to process a valid XHTML document.
First, my example document:

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head></head>
<body>
<img src="./product.gif" alt="3,- &euro;" />
</body>
</html>

My parser just renders the document as is, but it behaves strangely when it encounters a special entity in an attribute value (e.g. &euro;).
It seems to me that when the parser reaches the opening <img> element, it calls the start-element handler. To do this, it must first parse all attributes and their values so that it can pass the attribute collection to the start-element handler.
But while processing the value of the alt attribute, the parser calls the default handler as soon as it reaches the entity.

This leads to the following result document:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head></head>
<body>
&euro;<img src="./product.gif" alt="3,- " />
</body>
</html>
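
For reference, the kind of pass-through renderer described above might look roughly like the sketch below. It is only an illustration under assumptions: the function names (on_start, on_end, on_text, on_default, render) and the global $out buffer are hypothetical and not taken from the original code. Anything that reaches the default handler is appended to the output at the moment it is reported, which is how a fragment reported before the start-element handler would end up in front of the tag.

```php
<?php
// Minimal pass-through SAX renderer (sketch; all names here are hypothetical).
$out = '';

// Called once per opening tag, after all attributes have been collected.
function on_start($parser, $name, $attrs) {
    global $out;
    $out .= '<' . $name;
    foreach ($attrs as $key => $value) {
        // Attribute values arrive already decoded, so escape them again.
        $out .= ' ' . $key . '="' . htmlspecialchars($value, ENT_COMPAT) . '"';
    }
    $out .= '>';
}

// Called once per closing tag (also for empty elements such as <c/>).
function on_end($parser, $name) {
    global $out;
    $out .= '</' . $name . '>';
}

// Called for character data between tags.
function on_text($parser, $data) {
    global $out;
    $out .= htmlspecialchars($data, ENT_NOQUOTES);
}

// Called for everything no other handler claims; copied through unchanged.
function on_default($parser, $data) {
    global $out;
    $out .= $data;
}

// Parse $xml and return the re-serialized document, or false on a parse error.
function render($xml) {
    global $out;
    $out = '';
    $parser = xml_parser_create();
    // Keep element and attribute names in their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler($parser, 'on_start', 'on_end');
    xml_set_character_data_handler($parser, 'on_text');
    xml_set_default_handler($parser, 'on_default');
    if (!xml_parse($parser, $xml, true)) {
        return false;
    }
    return $out;
}
```

Calling render() on a small fragment without entities, for example render('<a b="1"><c/>x</a>'), is a quick way to check the handler wiring before feeding it the XHTML document above.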

IMHO the doctype definition can't be the problem, because the xhtml-special.ent declaration is part of xhtml1-strict.dtd.
Maybe the PHP parser can be given a special handler for this type of entity, but I could not find an answer in the PHP XML parser API documentation.
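
One possible workaround, which is not part of the original setup and only a sketch under assumptions, is to rewrite named XHTML entities as numeric character references before the text is handed to the parser, so the parser never has to look up an entity name in the DTD. The helper name named_to_numeric and the small entity map are hypothetical; the five predefined XML entities (&amp;, &lt;, &gt;, &quot;, &apos;) are deliberately left alone.

```php
<?php
// Hypothetical helper: replace named XHTML entities with numeric character
// references so that no DTD lookup is needed while parsing.
function named_to_numeric($xml) {
    // Small illustrative map; extend it with the entities your documents use.
    $map = array(
        '&euro;' => '&#8364;',
        '&nbsp;' => '&#160;',
        '&copy;' => '&#169;',
    );
    // strtr() replaces every occurrence of each key in a single pass.
    return strtr($xml, $map);
}
```

The result of named_to_numeric($xml) would then be passed to xml_parse() instead of the raw document. Two caveats: the replacement is purely textual, so it also touches CDATA sections, and the euro sign has no code point in ISO-8859-1, so with that target encoding the parser may hand it to the handlers as a question mark unless the target encoding is switched to UTF-8.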

If anyone has some experience with this kind of problem, please leave a note.

TIA,
horst.horstmann