Unwanted characters in output

shaneshack
Forum Newbie
Posts: 7
Joined: Tue May 22, 2007 2:50 pm
Location: Sterling, CO

Unwanted characters in output

Post by shaneshack »

I am new here, so if I have posted this in the wrong category, I apologize. I am running PHP 5.2.0 and MySQL 5.0.41 on a Windows 2003 server running IIS 6.0. I just installed sphider search for a site which I am building. It works great, but at the top of all my search result pages I get these characters: 

I would have blamed sphider, but I also get those same characters on another site I am building when it returns MySQL queries, at the top, right above the results area. Anyone have a clue how to get rid of them? In both cases, the PHP code contains include() or require(). Thanks. I will gladly provide any extra detail you may require.
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

I think this has more to do with your character set than anything. What charset are you using?
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

Looks like BOM from UTF-8. Make sure that if the files are saved as UTF-8, they do not have the BOM on them.

I believe Ambush_Commander on this forum also posted a function to remove the BOM if it's there.
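
One way to check a file for the BOM is to read its first three bytes and compare them with EF BB BF. The sketch below is only an illustration: file_has_utf8_bom() is a hypothetical helper name, not a function posted in this thread or built into PHP.

```php
<?php
// Hypothetical helper (not from the thread): reports whether a file
// begins with the three-byte UTF-8 BOM, EF BB BF.
function file_has_utf8_bom($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable file: treated as "no BOM found"
    }

    // Only the first three bytes matter for this check.
    $firstBytes = fread($handle, 3);
    fclose($handle);

    return $firstBytes === "\xEF\xBB\xBF";
}
```

Since the pages in question use include() or require(), it may help to run this check over every path returned by get_included_files(), because a BOM in any included file is sent to the browser as output.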

character encoding

Post by shaneshack »

I changed my encoding to UTF-8, which got rid of the characters, but now an empty white space remains. It isn't a huge deal, but it would be nice to get rid of that white space.

Post by RobertGonzalez »

I think it is still related to the BOM, the character that saving as UTF-8 adds to the start of the document (but I could be wrong).

character encoding

Post by shaneshack »

I forgot to mention that I followed Ambush_Commander's instructions for removing the BOM. No success, the white space remained.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Theoretically speaking, the BOM (byte order mark) is a zero-width no-break space. Of course, leave it to the browsers to render these things incorrectly. :roll:
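
To see why the same three bytes can show up as visible junk under one charset and as an invisible character under another, here is a small sketch. It assumes the iconv extension is available, and $bom and $seenAs1252 are illustrative variable names only.

```php
<?php
// The UTF-8 BOM is the three bytes EF BB BF.
$bom = "\xEF\xBB\xBF";

// Assumption: the page is declared as windows-1252. A browser then reads
// each byte as its own character. Converting those three windows-1252
// characters to UTF-8 shows what would end up on screen.
$seenAs1252 = iconv('Windows-1252', 'UTF-8', $bom);

echo $seenAs1252, "\n";          // prints: ï»¿
echo bin2hex($seenAs1252), "\n"; // prints: c3afc2bbc2bf
```

Read as UTF-8, the same bytes decode to the single character U+FEFF, which is meant to be invisible, so changing the declared charset can hide the junk without removing the bytes from the file.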

As far as I can tell, Sphider is encoded in ISO-8859-1 (not UTF-8!) so you really shouldn't change the encoding or international characters will get mangled. Three questions:

1. What text editor did you use to edit the templates?
2. What changes to the Sphider code did you make?
3. What exactly did you do when you "removed the BOM"?

Post by shaneshack »

1. What text editor did you use to edit the templates?
I am currently using Microsoft Expression Web as my editor ( . . . please no lectures . . .). To my knowledge using this editor has caused zero problems on any of my projects, albeit, I'm still a PHP novice.
2. What changes to the Sphider code did you make?
I embedded the code from "search_results.html" in a new "search_results.html" which had header and footer sections that matched the rest of the site. I also commented out in search.php the code for including header.html and footer.html in the search.php file. Here is the doctype and head code from the new search_results.html:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en-us" />
<title>Untitled</title>

<link rel="stylesheet" type="text/css" href="../../../css/sterling.css" />

</head>
3. What exactly did you do when you "removed the BOM"?
The original code was charset=windows-1252. I changed that to utf-8. This removed the unwanted characters but, as I stated earlier, left an empty white line at the very top of search_results.html (where the unwanted characters once were). More accurately, there is probably a single blank character/space at the top left corner of my HTML document that causes the white line at the top.

This is probably where I have misunderstood: I attempted to insert <?php $text = str_replace("\xEF\xBB\xBF", '', $text); ?> in the head of my document, but no matter where or how I inserted that code, the blank space remains. I hope this has been detailed enough for you.

Post by Ambush Commander »

I am currently using Microsoft Expression Web as my editor
It looks like MS Expression automatically adds the BOM. This will only cause problems if the files are encoded in UTF-8, which might be why you're having problems just for this project. Try following the instructions in that link, specifically on 'search_results.html'.
The original code was charset=windows-1252. I changed that to utf-8.
Hopefully that won't cause problems. If special characters are getting mangled, switch it back to windows-1252.
This is probably where I have misunderstood: I attempted to insert <?php $text = str_replace("\xEF\xBB\xBF", '', $text); ?> in the head of my document, but no matter where or how I inserted that code, the blank space remains. I hope this has been detailed enough for you.
Alright. First, I strongly recommend you turn on E_ALL error reporting when developing in PHP; it would have warned you about the undefined variable. My code operates by removing the BOM from the contents of the $text variable. In your case, it's the page itself that has the BOM: $text doesn't exist!
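
A rough sketch of both points, for illustration only. strip_leading_bom() is a hypothetical helper, a variation on the str_replace() line quoted above that removes the BOM only when it is the first thing in the string, and $template is a stand-in for content that would normally be loaded with something like file_get_contents().

```php
<?php
// Report everything during development, including the notice or warning
// PHP raises when an undefined variable such as $text is read.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper: drops a UTF-8 BOM only if it starts the string.
function strip_leading_bom($text)
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        // The cast covers older PHP versions, where substr() can return
        // false when nothing is left after the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// The content has to be in a variable before it can be cleaned.
// Stand-in value; in practice this would be read from the template file.
$template = "\xEF\xBB\xBF<!DOCTYPE html>";
$clean = strip_leading_bom($template);
```

This only helps when the script reads the template into a string and echoes it. A BOM in a file pulled in with include() or require() is sent as output before any such cleanup can run, so the file itself has to be fixed.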

Post by shaneshack »

Alright. First, I strongly recommend you turn on E_ALL when developing in PHP. My code operates by removing the BOM from the contents of the $text variable. In your case, it's the page itself that has the BOM: $text doesn't exist!
Errr... Sorry about that. Dumb mistake.

I have fixed the white space. I opened up search_results.html in notepad (changed nothing in the code) and re-saved the file in ANSI encoding. I'll just have to remember to edit that particular page in notepad. Thank you all for your help, it's very doubtful I would have figured it out alone.
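
For anyone who would rather not re-save by hand, a sketch of a script that rewrites a file without its leading BOM follows. remove_bom_from_file() is a hypothetical name, and unlike saving as ANSI in Notepad it only drops the three BOM bytes; it does not convert the rest of the file to another encoding.

```php
<?php
// Hypothetical helper: rewrites the file at $path without its leading
// UTF-8 BOM. Returns true only if a BOM was found and the file was
// rewritten successfully.
function remove_bom_from_file($path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false; // file could not be read
    }

    if (strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // no BOM at the start, file left untouched
    }

    // The cast covers older PHP versions, where substr() can return
    // false when the file holds nothing but the BOM.
    $withoutBom = (string) substr($contents, 3);

    return file_put_contents($path, $withoutBom) !== false;
}
```

It would be called as remove_bom_from_file('search_results.html'), ideally on a backup copy first, and the editor would still need to be set not to add the BOM back on the next save.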

Post by RobertGonzalez »

You could always use an editor that doesn't do that to your code. :wink:

Post by shaneshack »

You could always use an editor that doesn't do that to your code.
The only reason I use EW is because my sites are on an IIS 6 server and the frontpage extensions make it easy to log in and make changes... The color coding of the code makes it easier on my eyes also.... but you are most correct in principle! :)