
DOM XML adding extra characters when processing xslt?

Posted: Sat Apr 23, 2005 4:27 pm
by Sinemacula
I'm very new to this, and trying to get a package developed by someone else to work the way I want it to... so please bear with me. :)

My understanding is that DOM XML processes an XML file and presents the output according to an XSL template. The XML file includes the result of a sql query, and has some items from the database that have ™ included in them. When that file is processed, the output has extra characters.

For example:

In the database, I have a field whose contents are "This New Thing™". I have verified that the XML file produced by the query shows that data as "This New Thing™". However, after processing the XML file according to the XSL template, the result that ends up on my webpage is "This New ThingÂ™".

Is there an easy way to fix this? Can I just add some code to the XSL template? Or is it a lot more complicated than that?

Thanks,
Scott

Posted: Sun Apr 24, 2005 1:57 am
by Sinemacula
I've discovered some more info... but still don't have a fix or workaround...

It seems that part of the problem is that since ™ is not ASCII, it is converted into two bytes when it is output, and Â is how the extra byte is displayed.

Here's a quote regarding a similar issue that was posted on the php bugs list a couple of years ago:
Remember that echo's output is ascii. The £ symbol is not ascii. Since
it's not it is expanded to two bytes. Â is that second byte. A true
test would be to parse it and output the string that is retrieved. For
a list of what is ascii, and therefore is unchanged when output to UTF-8
see http://www.mindspring.com/~jc1/serial/R ... ASCII.html
So, now I know what's going on... anyone know how to "fix" it or create a workaround?
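The two-byte effect can be reproduced outside PHP. This is a minimal Python sketch, not the DOM XML code path itself. It assumes the ™ is stored as the single windows-1252 byte 0x99, gets read as Latin-1 (turning it into U+0099), is written out as UTF-8, and is then displayed by a browser that assumes windows-1252. None of those steps is confirmed in this thread; they are just one chain that produces the same symptom.

```python
# Assumption: the database holds the trademark sign as the single
# windows-1252 byte 0x99.
stored = "™".encode("cp1252")

# Assumption: that byte is misread as Latin-1, which yields U+0099.
misread = stored.decode("latin-1")

# Serializing U+0099 as UTF-8 takes two bytes: 0xC2 0x99.
utf8_bytes = misread.encode("utf-8")

# A browser that assumes windows-1252 shows each byte as its own character.
displayed = utf8_bytes.decode("cp1252")

print(stored.hex(), utf8_bytes.hex(), displayed)
```

In this sketch the lead byte 0xC2 is what shows up as Â, and the trailing 0x99 is still rendered as ™.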

Posted: Sun Apr 24, 2005 11:19 am
by Sinemacula
Some more information...

If I put "& # 1 5 3 ;" (without spaces) in the body of the xsl file, it is output as Â™ - just as when ™ is in the original XML file. However, if I change the database entry from which the XML file gets its data from ™ to "& # 1 5 3 ;" - then the output is the proper ™.

So, now I'm back to being more confused, and still without a solution (since changing the database entry messes up a different part of my site).
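One possible reason the same reference behaves differently in the two places is that XML and HTML parsers do not agree on what character 153 means. The Python sketch below only illustrates that difference with the standard library; whether the database value reaches the browser as a literal, unparsed reference is an assumption and is not something the posts above confirm.

```python
import html
import xml.etree.ElementTree as ET

# An XML parser resolves the reference to code point 153 (U+0099),
# which is a control character, not the trademark sign.
xml_text = ET.fromstring("<item>&#153;</item>").text

# HTML parsing rules remap 153 to the windows-1252 character at 0x99,
# which is the trademark sign (U+2122).
html_text = html.unescape("&#153;")

print(repr(xml_text), repr(html_text))
```

So a reference resolved by the XML/XSL parser ends up as U+0099, while a reference that arrives in the browser still in its textual form is shown as ™.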

Posted: Sun Apr 24, 2005 5:34 pm
by Sinemacula
In case anyone reading this is having a similar problem and is hoping to find a solution here, I thought I'd finish up with the fix that worked...

The fix was to change the output tag in the xsl template so that it specifies an encoding that includes ™ (and other non-ASCII characters). So, now, instead of

Code:

<xsl:output method="html" />
I've got

Code:

<xsl:output method="html" encoding="windows-1252" />
And it works just fine.
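A plausible explanation for why this works, sketched in Python rather than with the XSLT processor itself: when the output encoding is windows-1252, a real ™ fits in one byte, and a character the encoding cannot represent falls back to a numeric reference that browsers display as ™. The fallback shown here is Python's `xmlcharrefreplace` handler, used as a stand-in; what the XSLT serializer actually emitted is an assumption and was not checked in this thread.

```python
# U+2122 (the real trademark sign) exists in windows-1252,
# so it is written as the single byte 0x99.
real_tm = "\u2122".encode("cp1252", errors="xmlcharrefreplace")

# U+0099 does not exist in windows-1252, so the handler falls back
# to a decimal character reference.
c1_char = "\u0099".encode("cp1252", errors="xmlcharrefreplace")

print(real_tm, c1_char)
```

Either result is a single ™ on a page served as windows-1252, with no stray Â in front of it.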