Removing HTML from an RSS Feed - Problem

Posted: Sat Apr 25, 2009 2:31 am
by Dave.Arison
Hi,

I'm trying to remove HTML from an RSS feed, with no success.

The code I used is:

Code:

 
<?php
   
$rss_tags = array(
        'title',
        'link',
        'description',
        'pubDate',
         );
    
    $rss_item_tag = 'item';
    $rss_url = 'http://www.squidoo.com/xml/top_lenses/topic/books';
    
    $rssfeed = rss_to_array($rss_item_tag,$rss_tags,$rss_url);
    
    
 
    foreach ($rssfeed as $key => $item ) {
    
        $desc = (string) $item['description'];
        // Raw description exactly as returned by the feed
        echo $desc . "<br>";
        echo "<b>Stripped:</b><br>";
        $desc1 = strip_tags($desc);
 
        echo $desc1;
    
        echo "<br>=======================================<br>";
        
    }
 
    function rss_to_array($tag, $array, $url) {
        $doc = new DOMDocument();
        $doc->load($url);
        $rss_array = array();
        $items = array();
        foreach($doc->getElementsByTagName($tag) AS $node) {    
            foreach($array AS $key => $value) {
                $items[$value] = $node->getElementsByTagName($value)->item(0)->nodeValue;
            }
            array_push($rss_array, $items);
        }
        return $rss_array;
    }
?>
 
As an example I took the Squidoo RSS feed, which contains HTML.

The problem is that $desc1 and $desc are the same; no tags have been removed at all.

Here is the output for the first description:

Code:

 
<p style="border: solid 3px darkred; -moz-border-radius: 15px; -khtml-border-radius: 15px; -webkit-border-radius: 15px; border-radius: 15px; background: Ivory; padding: 10px">The Anita Blake series has become very popular in the last...
 
[b]Stripped:[/b]
 
<p style="border: solid 3px darkred; -moz-border-radius: 15px; -khtml-border-radius: 15px; -webkit-border-radius: 15px; border-radius: 15px; background: Ivory; padding: 10px">The Anita Blake series has become very popular in the last...
=======================================
 
 
I have tried using strip_tags in a simpler file like this:

Code:

 
<?php 
$mystring = '<p style="border: solid 3px darkred; -moz-border-radius: 15px; -khtml-border-radius: 15px; -webkit-border-radius: 15px; border-radius: 15px; background: Ivory; padding: 10px">The Anita Blake series has become very popular in the last..'; 
echo strip_tags($mystring); 
 
?> 
 
and it works great!

What seems to be the problem in the first code?

Thank you in advance.

Re: Removing HTML from an RSS Feed

Posted: Sat Apr 25, 2009 2:37 am
by Benjamin
Unless you have posted your code incorrectly, line 6 is halting execution of that file.

Re: Removing HTML from an RSS Feed - Problem

Posted: Sat Apr 25, 2009 2:54 am
by Dave.Arison
Sorry, I posted a mistake in the code; it's now fixed.

Do you have any idea?

Re: Removing HTML from an RSS Feed - Problem

Posted: Sat Apr 25, 2009 2:57 am
by Benjamin
Try changing line 24 to this:

Code:

 
$desc1 = strip_tags(html_entity_decode($desc));
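
A minimal sketch of why decoding first can help, assuming the description comes back from nodeValue with its markup still entity-encoded (&lt;p ...&gt; rather than a real <p ...> tag). The sample string is made up for illustration and is not taken from the actual feed:

```php
<?php
// Hypothetical sample (an assumption): markup that is entity-encoded,
// so a browser displays it as "<p ...>" but it contains no real tags.
$desc = '&lt;p style="padding: 10px"&gt;The Anita Blake series&lt;/p&gt;';

// strip_tags() only removes real <...> tags, so this string is left untouched.
$unchanged = strip_tags($desc);

// Decoding the entities first turns &lt;p&gt; into <p>,
// which strip_tags() can then remove.
$stripped = strip_tags(html_entity_decode($desc));

var_dump($unchanged === $desc); // bool(true)
echo $stripped, "\n";           // The Anita Blake series
```

This would also explain why the simpler test file worked: there the string held real tags from the start.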
 

Re: Removing HTML from an RSS Feed - Problem

Posted: Sat Apr 25, 2009 3:10 am
by Dave.Arison
Thanks, it works great. Why didn't I think of that?

Re: Removing HTML from an RSS Feed - Problem

Posted: Sat Apr 25, 2009 8:49 am
by Dave.Arison
How can I solve the cache problem? It seems that when I put in a new RSS feed, it still has the first one in memory.