Page 1 of 1

Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 2:00 pm
by csdavis0906
~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.


I am using the code below to parse a series of rss feeds into MySQL with each record storing the data for a single rss item including title, description, link... In addition, the titles and descriptions are stripped of whitespace and html tags.

The problem I'm having is that 10% or so of my INSERTS are failing and not stored to the db?? If anyone has some insights or recommendations, please let me know. Thanking you in advance...

Code: Select all

function parseRSS() {
    
    // load xml of rss feeds 
    if (!$myxml = simplexml_load_file('rss.xml')){
        echo 'Error reading the XML file';
    }
    $titles = 0;  // initialize var to count total number of articles loaded
    $failed = 0;  // initialize var to count failed inserts
    
    // process each feed
    foreach($myxml as $feed){
        
        $source   = $feed->id;                    // news source id 
        $category = $feed->cat;                 // news category 
        $feed     = 'xml/' . $feed->id . '.xml';   // name locally saved file
        
        if (!$feedxml = simplexml_load_file($feed)){ 
           echo 'Error reading the RSS feed';
        }
        // loop through each feed by article
        foreach($feedxml->xpath('//item') as $item) {
            
            // remove html from article description
            $newtitle = stripHTML($item->title);
            // strip title of non-printing char's
            $newtitle = preg_replace("/[^a-zA-Z0-9.\-' ']/", "", $newtitle);
            // trim leading whitespaces
            $newtitle = ltrim($newtitle);
            
            // parse published date/time
            $month    = substr($item->pubDate, 8, 3);
            $day      = substr($item->pubDate, 5, 2);
            $year     = substr($item->pubDate, 12, 4);
            $newdate1 = $month . $day . ',' . $year;         // published date formatted for comparison to today's date
            $newdate2 = $month . ' ' . $day . ', ' . $year;  // date as it will be stored in table
            
            // remove html from article description
            $description = stripHTML($item->description);
            // strip non-printing chars
            $description = preg_replace("/[^a-zA-Z0-9.\-' ']/", "", $description);
            
            // if published date is today then save article data to table.title
            if (date('Md,Y') == $newdate1) {
                $query = "INSERT INTO test (source, title, pubdate, cat, description, url) VALUES ('$source', '$newtitle', '$newdate2', '$category', '$description', '$item->link')";
                // run query
                $result = @mysql_query($query);
                if ($result) {
                    $titles++;
                    echo "<font face=verdana size=-2 color=green>$source  $newtitle</font><br>";
                } else {
                    $failed++;
                    echo "<font face=verdana size=-2 color=red>$source  $newtitle</font><br>";
                }
            }
        }
        $totalTitles = $titles;
        echo "<font color=green>$totalTitles loaded...</font><br>";
        echo "<font color=red>$failed failed...</font><br>"; 
    }
}

~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.

Re: Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 2:12 pm
by pickle
Remove the @ from your code & you might get helpful errors.

Also, not all elements are required to be in an RSS item, so it may just be that the query isn't failing, it just has nothing to insert.

Re: Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 2:25 pm
by csdavis0906
I tried removing the @ from my code but received the same results. I also checked the feed items that failed and they did have all the data.

Re: Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 2:45 pm
by pickle
Then output each query, check which one's have data that wasn't inserted, & try to figure out why the query broke.

Re: Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 4:00 pm
by csdavis0906
I've tried looking at the data and the records that are left out don't seem to have anything not common to the records that are going through.

I'm wondering if it has something to do with the interaction of the INSERT with the foreach loop - maybe the loop is skipping somehow the INSERTs that are failing??

Re: Parse RSS feeds to MySQL

Posted: Fri Mar 28, 2008 6:04 pm
by pickle
Is the XML file reasonably small? Could you post it? (with tags please)

Re: Parse RSS feeds to MySQL

Posted: Sat Mar 29, 2008 3:42 pm
by csdavis0906
Found the error! The rss feeds have a field called description - some containing apostrophes - which was causing the INSERT to crash. Used the addslashes() function to remedy.

Thanks for your interest and assistance.

Re: Parse RSS feeds to MySQL

Posted: Sat Mar 29, 2008 8:23 pm
by pickle
A better option is mysql_real_escape_string().

Glad to hear you found the problem.