Unwanted characters in output
Posted: Tue May 22, 2007 3:03 pm
by shaneshack
I am new here, so if I have posted this in the wrong category, I apologize. I am running PHP 5.2.0 and MySQL 5.0.41 on a Windows 2003 server running IIS 6.0. I just installed sphider search for a site which I am building. It works great, but at the top of all my search result pages I get these characters: 
I would have blamed sphider, but I also get those same characters on another site I am building when it returns MySQL queries, at the top, right above the results area. Anyone have a clue how to get rid of them? In both cases, the PHP code contains include() or require(). Thanks. I will gladly provide any extra detail you may require.
Posted: Tue May 22, 2007 3:43 pm
by RobertGonzalez
I think this has more to do with your character set than anything. What charset are you using?
Posted: Tue May 22, 2007 4:32 pm
by Chris Corbyn
Looks like BOM from UTF-8. Make sure that if the files are saved as UTF-8, they do not have the BOM on them.
I believe Ambush_Commander on this forum also posted a function to remove the BOM if it's there.
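One way to confirm the diagnosis is to check whether a file starts with the three UTF-8 BOM bytes (EF BB BF). This is only a minimal sketch: the function name has_utf8_bom and the path you pass to it are made up for illustration and are not part of Sphider or of the function mentioned above.

```php
<?php
// Hypothetical helper (not from Sphider): reports whether the file at $path
// begins with the UTF-8 byte order mark, i.e. the bytes EF BB BF.
function has_utf8_bom($path)
{
    $handle = fopen($path, 'rb'); // binary mode so Windows does not translate bytes
    if ($handle === false) {
        return false; // unreadable file: treated as "no BOM" in this sketch
    }
    $head = fread($handle, 3); // only the first three bytes matter
    fclose($handle);
    return $head === "\xEF\xBB\xBF";
}
```

Running it against each template and included file should show which ones were saved with a BOM.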
character encoding
Posted: Tue May 22, 2007 5:36 pm
by shaneshack
I changed my encoding to UTF-8, which got rid of the characters, but now an empty white space remains. It isn't a huge deal, but it would be nice to get rid of that white space.
Posted: Tue May 22, 2007 5:38 pm
by RobertGonzalez
I think it is still related to the BOM and that character that UTF-8 adds to the document (I think, but I could be wrong).
character encoding
Posted: Tue May 22, 2007 5:41 pm
by shaneshack
I forgot to mention that I followed Ambush_Commander's instructions for removing the BOM. No success, the white space remained.
Posted: Tue May 22, 2007 6:17 pm
by Ambush Commander
Theoretically speaking, the BOM (byte order mark) is a zero-width no-break space. Of course, leave it to the browsers to render these things incorrectly.
As far as I can tell, Sphider is encoded in ISO-8859-1 (not UTF-8!) so you really shouldn't change the encoding or international characters will get mangled. Three questions:
1. What text editor did you use to edit the templates?
2. What changes to the Sphider code did you make?
3. What exactly did you do when you "removed the BOM"?
Posted: Wed May 23, 2007 9:33 am
by shaneshack
1. What text editor did you use to edit the templates?
I am currently using Microsoft Expression Web as my editor ( . . . please no lectures . . .). To my knowledge using this editor has caused zero problems on any of my projects, albeit, I'm still a PHP novice.
2. What changes to the Sphider code did you make?
I embedded the code from "search_results.html" in a new "search_results.html" which had header and footer sections that matched the rest of the site. I also commented out in search.php the code for including header.html and footer.html in the search.php file. Here is the doctype and head code from the new search_results.html:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en-us" />
<title>Untitled</title>
<link rel="stylesheet" type="text/css" href="../../../css/sterling.css" />
</head>
3. What exactly did you do when you "removed the BOM"?
The original code was charset=windows-1252. I changed that to utf-8. This removed the unwanted characters but, as I stated earlier, left an empty white line at the very top of search_results.html (where the unwanted characters once were). More accurately, there is probably a single blank character/space at the top left corner of my HTML document that causes the white line at the top.
This is probably where I have misunderstood: I attempted to insert <?php $text = str_replace("\xEF\xBB\xBF", '', $text); ?> in the head of my document, but no matter where or how I inserted that code, the blank space remains. I hope this has been detailed enough for you.
Posted: Wed May 23, 2007 5:15 pm
by Ambush Commander
I am currently using Microsoft Expression Web as my editor
It looks like MS Expression automatically adds the BOM. This will only cause problems if the files are encoded in UTF-8, which might be why you're having problems just for this project. Try following the instructions in that link, specifically on 'search_results.html'.
The original code was charset=windows-1252. I changed that to utf-8.
Hopefully that won't cause problems. If special characters are getting mangled, switch it back to windows-1252.
This is probably where I have misunderstood: I attempted to insert <?php $text = str_replace("\xEF\xBB\xBF", '', $text); ?> in the head of my document, but no matter where or how I inserted that code, the blank space remains. I hope this has been detailed enough for you.
Alright. First, I strongly recommend you turn on E_ALL when developing in PHP. My code operates by removing the BOM from the contents of the $text variable. In your case, it's the page itself that has the BOM: $text doesn't exist!
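To make that concrete, here is a rough sketch of how the replacement is meant to be used: the file's bytes have to be in a variable before anything can be stripped from them. The function name strip_utf8_bom and the commented-out file path are assumptions for illustration. Unlike the str_replace() one-liner, this version only removes a BOM at the very start of the string.

```php
<?php
// Report everything, including notices such as "Undefined variable: text".
error_reporting(E_ALL);

// Hypothetical helper: returns $text without a leading UTF-8 BOM (EF BB BF).
function strip_utf8_bom($text)
{
    if (substr($text, 0, 3) === "\xEF\xBB\xBF") {
        // The cast guards against older PHP versions, where substr() can
        // return false when nothing follows the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// Assumed usage: load the template into a variable first, then strip it.
// $text = strip_utf8_bom(file_get_contents('search_results.html'));
```

With E_ALL enabled, the original attempt would have raised a notice about the undefined $text variable, which points straight at the problem.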
Posted: Wed May 23, 2007 6:03 pm
by shaneshack
Alright. First, I strongly recommend you turn on E_ALL when developing in PHP. My code operates by removing the BOM from the contents of the $text variable. In your case, it's the page itself that has the BOM: $text doesn't exist!
Errr... Sorry about that. Dumb mistake.
I have fixed the white space. I opened up search_results.html in notepad (changed nothing in the code) and re-saved the file in ANSI encoding. I'll just have to remember to edit that particular page in notepad. Thank you all for your help, it's very doubtful I would have figured it out alone.
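For anyone who would rather not re-save by hand, something along these lines could be run once after editing. It is a sketch under assumptions: remove_bom_from_file is a made-up name, and it only drops a leading BOM; it does not convert the rest of the file to ANSI the way Notepad's re-save does.

```php
<?php
// Hypothetical helper: rewrites the file at $path without a leading UTF-8 BOM.
// Returns true if a BOM was found and removed, false otherwise.
function remove_bom_from_file($path)
{
    $contents = file_get_contents($path);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or no BOM at the start: leave the file alone
    }
    // Write back everything after the first three bytes.
    return file_put_contents($path, (string) substr($contents, 3)) !== false;
}
```

Calling it on a file that has no BOM leaves the file untouched, so running it again later is harmless.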
Posted: Wed May 23, 2007 6:09 pm
by RobertGonzalez
You could always use an editor that doesn't do that to your code.

Posted: Thu May 24, 2007 9:23 am
by shaneshack
You could always use an editor that doesn't do that to your code.
The only reason I use EW is that my sites are on an IIS 6 server and the FrontPage extensions make it easy to log in and make changes. The color coding of the code also makes it easier on my eyes. But you are most correct in principle!
