problem with unknown characters in rss feed

Posted: Thu Jun 09, 2011 3:58 pm
by mathruD
i'm having a problem with unknown characters showing up in my rss feed. the problem appears to be the quote symbol ("), as well as a black diamond with a ? inside it. for instance, one title might show up like this:

?Cure for the Common Font? ? A Web Designer?s Introduction to Typeface�Selection (this is the way it is displaying)
Cure for the Common Font? A Web Designer's Introduction to Typeface Selection (this is how it should display)

for some reason, the (") symbol is showing up as a (?). i can use the str_replace function to replace the (?) with a ("), however, if there is a legit (?) in the title, as in the example above, it also gets changed to a ("). as far as the black diamond goes, i have no clue what is doing that so i don't know where to begin trying to replace it.

my code is currently as follows:

<?php
// Replacement pairs applied to each feed title. The "?" => '"' pair is the
// workaround described above; it also rewrites legitimate question marks.
// "\r\n" is listed before "\n" so a Windows line ending is removed as a unit.
$search  = array("?", "\r\n", "\n", "&#10", "&#09", "%09", "%20", "\0");
$replace = array('"', "", "", "", "&nbsp; &nbsp; ", ",", " ", "");

// Load the feed library once, outside the loop.
require_once('RSS/rss_fetch.inc');
?>

<?php do { ?>

<div class="resourcesFeedCntr">
    <div class="resourcesFeedTitle"><a href="<?php echo $row_resourceFeed_rs['resource_link']; ?>"><?php echo $row_resourceFeed_rs['resource_title']; ?></a></div>

    <div class="resourcesFeedContent">
      <ul>
      <?php
      $rss = fetch_rss($row_resourceFeed_rs['resource_rssLink']);
      // Keep only the first 8 listings
      $items = array_slice($rss->items, 0, 8);
      // Cycle through and display the listings
      foreach ($items as $item) { ?>
        <li><a href="<?php echo $item['link']; ?>"><?php echo str_replace($search, $replace, $item['title']); ?></a></li>
      <?php } // end foreach ?>
      </ul>
    </div>
</div>

<?php } while ($row_resourceFeed_rs = mysql_fetch_assoc($resourceFeed_rs)); ?>
Can someone please give me an idea as to how to correct this problem? also, charset is set to UTF-8.

Re: problem with unknown characters in rss feed

Posted: Thu Jun 09, 2011 4:04 pm
by pickle
Is the charset of the RSS feed the same as the charset of the page you're viewing it in?
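
A mismatch of this kind would explain both symptoms. The sketch below is a minimal reproduction, assuming the title passes through an ISO-8859-1 conversion somewhere between the feed and the page; the sample title and the variable names are made up for illustration. Characters with no Latin-1 equivalent (curly quotes, dashes) come out as "?", and the Latin-1 non-breaking space byte is not valid UTF-8, which a browser typically renders as the black diamond.

```php
<?php
// Hypothetical sample title as UTF-8 bytes: curly quotes around "Cure",
// an em dash, a curly apostrophe, and a non-breaking space before "Guide".
$utf8Title = "\xE2\x80\x9CCure\xE2\x80\x9D \xE2\x80\x94 A Designer\xE2\x80\x99s\xC2\xA0Guide";

// mbstring replaces characters the target encoding cannot represent with its
// substitute character. "?" (0x3F) is the default; set here to be explicit.
mb_substitute_character(0x3F);

// Down-convert to ISO-8859-1, as a library configured for Latin-1 output might.
$latin1Title = mb_convert_encoding($utf8Title, 'ISO-8859-1', 'UTF-8');

// Prints: ?Cure? ? A Designer?s<0xA0>Guide (the 0xA0 byte is a Latin-1 NBSP)
echo $latin1Title, "\n";

// The lone 0xA0 byte is not valid UTF-8, so a UTF-8 page cannot display it.
var_dump(mb_check_encoding($latin1Title, 'UTF-8')); // bool(false)
```

Once the quotes have been turned into "?", no later str_replace can tell them apart from real question marks, so the conversion has to be avoided rather than undone.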

Re: problem with unknown characters in rss feed

Posted: Thu Jun 09, 2011 5:03 pm
by mathruD
i just took a look at the source code for the rss feed that is causing the problem and the first line says:

<?xml version="1.0" encoding="UTF-8"?>

my php page is coded using xhtml 1.0 transitional. would that have anything to do with it?
also, here is a link to the actual rss feed that i am pulling from. it is the one that is causing the problem:
http://feeds2.feedburner.com/typographica

i've also tried using code along these lines that i found online when i was searching for a solution to the problem, but none of them worked (i tried them all separately).

// if your input encoding is ISO 8859-1
htmlspecialchars(utf8_encode($string), ENT_QUOTES);

// if your input encoding is UTF-8
htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

$output = htmlentities(utf8_encode($source));
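
For reference, a short sketch of why these snippets may not help when the string is already UTF-8. utf8_encode() treats its input as ISO-8859-1, so applying it to UTF-8 bytes re-encodes every byte of a multi-byte character. The example uses mb_convert_encoding() with the same source and target encodings as a stand-in for utf8_encode(); the sample string is hypothetical.

```php
<?php
// "Designer's" with a curly apostrophe (U+2019), already valid UTF-8.
$utf8 = "Designer\xE2\x80\x99s";

// Same conversion utf8_encode() performs: treat input as ISO-8859-1, emit UTF-8.
// Each of the three apostrophe bytes becomes its own two-byte character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo strlen($utf8), " bytes before, ", strlen($double), " bytes after\n";

// htmlspecialchars() only escapes & < > " ' and leaves other characters alone,
// so it neither damages nor repairs the curly apostrophe.
$escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
var_dump($escaped === $utf8); // bool(true)
```

Neither call can bring back a character that was already replaced by "?" before the string reached this point.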

Re: problem with unknown characters in rss feed

Posted: Thu Jun 09, 2011 6:24 pm
by flying_circus
mathruD wrote: here is a link to the actual rss feed that i am pulling from. it is the one that is causing the problem:
http://feeds2.feedburner.com/typographica

<META http-equiv="Content-Type" content="text/html; charset=UTF-16">

Re: problem with unknown characters in rss feed

Posted: Thu Jun 09, 2011 10:32 pm
by mathruD
if you don't mind, can you explain where you are seeing the line that establishes the charset as UTF-16? when i view the source code for that link, it shows up as <?xml version="1.0" encoding="UTF-8"?>

also, what would i have to do to convert it to utf-8 to display properly?

Re: problem with unknown characters in rss feed

Posted: Fri Jun 10, 2011 12:48 am
by flying_circus
mathruD wrote: if you don't mind, can you explain where you are seeing the line that establishes the charset as UTF-16? when i view the source code for that link, it shows up as <?xml version="1.0" encoding="UTF-8"?>

also, what would i have to do to convert it to utf-8 to display properly?

Interesting. I posted the above from work, and I believe I have IE9 installed there. At home, I am running IE8 and also see UTF-8. There is something weird though, because when you right click the page and go to encoding, it's all greyed out and "unicode" is selected, not "Unicode (UTF-8)".

You can look into PHP's mbstring extension, specifically mb_check_encoding() and mb_convert_encoding().
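
A minimal sketch of how those two functions could be combined. The helper name to_utf8() and the ISO-8859-1 fallback are assumptions for illustration, not something confirmed about this feed or the library that fetches it.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * $assumedSource is a guess at the real encoding when $text is not valid UTF-8.
 */
function to_utf8($text, $assumedSource = 'ISO-8859-1')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// A Latin-1 non-breaking space (0xA0) is invalid as UTF-8 on its own.
$fromFeed = "Typeface\xA0Selection";
$fixed    = to_utf8($fromFeed); // 0xA0 becomes the UTF-8 pair 0xC2 0xA0

echo htmlspecialchars($fixed, ENT_QUOTES, 'UTF-8'), "\n";
```

This kind of after-the-fact conversion can repair stray bytes like the one behind the black diamond, but it cannot restore characters that were already turned into "?". If the library behind RSS/rss_fetch.inc is MagpieRSS (an assumption; the thread never names it), it is worth checking its MAGPIE_OUTPUT_ENCODING setting, which to my knowledge defaults to ISO-8859-1; defining it as 'UTF-8' before the include may keep the original characters intact.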

Re: problem with unknown characters in rss feed

Posted: Fri Jun 10, 2011 1:44 am
by mathruD
ok. i'll look into it and see if i can get it working.