Unusual "smart quote" problem
Posted: Sat Apr 10, 2004 7:31 am
Hi,
I am developing software for a new website and have come across an unusual problem that I cannot explain/fix. I am running two servers with Apache 2.0.46-32 and php 4.3.2-8. Apache configurations are somewhat different between them but php is configured identically.
The software has a feature that works like this: users fill out a form, it puts the data into an e-mail, and sends it to me. Theoretically simple. Unfortunately, some of the users are writing text in Microsoft Word and then cutting and pasting it into the form fields. The MS Word smart quotes (both single and double) play havoc with the feature, and I get several characters of garbage where each smart quote should be.
Now here's the kicker: what happens depends on which of my servers is being used. One of the servers returns the message beautifully with no smart quotes. The other returns garbage as explained above. Both servers send e-mail to the same account, and the messages are read on the same computer.
In an attempt to solve this problem, I poked around online and found a simple script to switch the smart quotes back to regular quotes.
The function is in the code listing below.
The result? The "good" server replaces the smart quotes with regular quotes and outputs them correctly. The "bad" server fails to make the character replacements and continues to send garbage.
Code:
// Transform smart quotes into regular quotes
function fixmstext($text)
{
    // Bytes 145-148: left/right single quotes, then left/right double quotes
    $badwordchars = array(
        chr(145),
        chr(146),
        chr(147),
        chr(148)
    );
    $fixedwordchars = array(
        "'",
        "'",
        '"',
        '"'
    );
    return str_replace($badwordchars, $fixedwordchars, $text);
}

// Clean up the input, then call the transform function
$message = fixmstext(strip_tags(stripslashes($message)));

I spent much of last night looking for reasons why this is happening. My research has turned up a number of findings:
1 - The php.ini files are configured identically, so that ain't it. Neither assigns a character set, so the character set falls back to the default, ISO-8859-1.
2 - Strangely, php.net's online documentation says that chr() returns the character for a given ASCII value. ASCII stops at 127, though, so 145-148 cannot be ASCII smart quotes. ISO-8859-1 has no printable characters from 128 to 159 (that range is reserved for control codes), and HTML does not allow these values to be defined. The Windows-1252 code page, however, does assign 145-148 to the four types of smart quotes.
3 - If I print chr(148), chr(149), etc. to the screen on the "good" server, it shows the smart quotes. If I print them to the screen on the "bad" server, it prints question marks.
4 - Since the find/replace function is working fine on the "good" server, it is likely failing on the bad server because it is looking for the incorrect characters returned by the chr() commands.
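One way to test finding 4 would be to look at the raw bytes rather than at what the screen shows. The sketch below is only an illustration: `hexdump_bytes` is a hypothetical helper name, and the idea that the "bad" server might be receiving the quotes as multi-byte UTF-8 sequences (for example because the form page is served with a different charset) is an assumption, not something confirmed for these servers.

```php
<?php
// Hypothetical helper: show each byte of a string as two hex digits.
function hexdump_bytes($text)
{
    // bin2hex() gives one long hex string; chunk_split() puts a space after each byte.
    return trim(chunk_split(bin2hex($text), 2, ' '));
}

// A right double smart quote as Windows-1252 stores it: the single byte 148.
$cp1252_quote = chr(148);

// The same character (U+201D) as UTF-8 stores it: three bytes.
$utf8_quote = "\xE2\x80\x9D";

echo hexdump_bytes($cp1252_quote), "\n"; // 94
echo hexdump_bytes($utf8_quote), "\n";   // e2 80 9d

// str_replace() compares bytes, so a search for chr(148) never matches the UTF-8 form.
$after = str_replace(chr(148), '"', $utf8_quote);
echo ($after === $utf8_quote) ? "unchanged\n" : "replaced\n"; // unchanged
```

Running something like `hexdump_bytes($message)` on the submitted text on both servers would show whether they actually receive the same bytes.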
With these facts in mind, my conclusion is that php is using its regular default character set, ISO-8859-1. I have no idea how it is assigning the values for 128 to 150-whatever though. If I can figure out how it is assigning the values, I may be able to rectify my problem.
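A possible workaround, sketched under the assumption that the text can arrive either as Windows-1252 single bytes or as UTF-8 sequences, is to pick the replacement table based on whether the input is valid UTF-8. The name `fixmstext_any` is hypothetical, and the check relies on `preg_match()` with the `/u` modifier, which only succeeds on valid UTF-8 and needs a PCRE build with UTF-8 support.

```php
<?php
// Hypothetical variant of fixmstext() that handles both byte forms.
function fixmstext_any($text)
{
    // Smart quotes as UTF-8 sequences (U+2018, U+2019, U+201C, U+201D).
    $utf8 = array(
        "\xE2\x80\x98" => "'",
        "\xE2\x80\x99" => "'",
        "\xE2\x80\x9C" => '"',
        "\xE2\x80\x9D" => '"',
    );
    // Smart quotes as Windows-1252 single bytes.
    $cp1252 = array(
        chr(145) => "'",
        chr(146) => "'",
        chr(147) => '"',
        chr(148) => '"',
    );
    // An empty pattern with /u matches only if $text is valid UTF-8.
    $is_utf8 = (preg_match('//u', $text) === 1);
    // Use one table only: in UTF-8 text, bytes 145-148 can be part of other characters.
    return strtr($text, $is_utf8 ? $utf8 : $cp1252);
}

echo fixmstext_any("It\x92s"), "\n";         // Windows-1252 input
echo fixmstext_any("It\xE2\x80\x99s"), "\n"; // UTF-8 input
```

Choosing a single table per message avoids damaging other UTF-8 characters, at the cost of not handling text that mixes both encodings.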
Any suggestions on how to solve this from here? Or is there another explanation I have overlooked?
Thanks,
--Aaron