Parse RSS feeds to MySQL

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
csdavis0906
Forum Newbie
Posts: 17
Joined: Thu Mar 27, 2008 9:36 am

Parse RSS feeds to MySQL

Post by csdavis0906 »

~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.


I am using the code below to parse a series of rss feeds into MySQL with each record storing the data for a single rss item including title, description, link... In addition, the titles and descriptions are stripped of whitespace and html tags.

The problem I'm having is that 10% or so of my INSERTS are failing and not stored to the db?? If anyone has some insights or recommendations, please let me know. Thanking you in advance...

Code: Select all

function parseRSS() {
    
    // load xml of rss feeds 
    if (!$myxml = simplexml_load_file('rss.xml')){
        echo 'Error reading the XML file';
    }
    $titles = 0;  // initialize var to count total number of articles loaded
    $failed = 0;  // initialize var to count failed inserts
    
    // process each feed
    foreach($myxml as $feed){
        
        $source   = $feed->id;                    // news source id 
        $category = $feed->cat;                 // news category 
        $feed     = 'xml/' . $feed->id . '.xml';   // name locally saved file
        
        if (!$feedxml = simplexml_load_file($feed)){ 
           echo 'Error reading the RSS feed';
        }
        // loop through each feed by article
        foreach($feedxml->xpath('//item') as $item) {
            
            // remove html from article description
            $newtitle = stripHTML($item->title);
            // strip title of non-printing char's
            $newtitle = preg_replace("/[^a-zA-Z0-9.\-' ']/", "", $newtitle);
            // trim leading whitespaces
            $newtitle = ltrim($newtitle);
            
            // parse published date/time
            $month    = substr($item->pubDate, 8, 3);
            $day      = substr($item->pubDate, 5, 2);
            $year     = substr($item->pubDate, 12, 4);
            $newdate1 = $month . $day . ',' . $year;         // published date formatted for comparison to today's date
            $newdate2 = $month . ' ' . $day . ', ' . $year;  // date as it will be stored in table
            
            // remove html from article description
            $description = stripHTML($item->description);
            // strip non-printing chars
            $description = preg_replace("/[^a-zA-Z0-9.\-' ']/", "", $description);
            
            // if published date is today then save article data to table.title
            if (date('Md,Y') == $newdate1) {
                $query = "INSERT INTO test (source, title, pubdate, cat, description, url) VALUES ('$source', '$newtitle', '$newdate2', '$category', '$description', '$item->link')";
                // run query
                $result = @mysql_query($query);
                if ($result) {
                    $titles++;
                    echo "<font face=verdana size=-2 color=green>$source  $newtitle</font><br>";
                } else {
                    $failed++;
                    echo "<font face=verdana size=-2 color=red>$source  $newtitle</font><br>";
                }
            }
        }
        $totalTitles = $titles;
        echo "<font color=green>$totalTitles loaded...</font><br>";
        echo "<font color=red>$failed failed...</font><br>"; 
    }
}

~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Parse RSS feeds to MySQL

Post by pickle »

Remove the @ from your code & you might get helpful errors.

Also, not all elements are required to be in an RSS item, so it may just be that the query isn't failing, it just has nothing to insert.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
csdavis0906
Forum Newbie
Posts: 17
Joined: Thu Mar 27, 2008 9:36 am

Re: Parse RSS feeds to MySQL

Post by csdavis0906 »

I tried removing the @ from my code but received the same results. I also checked the feed items that failed and they did have all the data.
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Parse RSS feeds to MySQL

Post by pickle »

Then output each query, check which one's have data that wasn't inserted, & try to figure out why the query broke.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
csdavis0906
Forum Newbie
Posts: 17
Joined: Thu Mar 27, 2008 9:36 am

Re: Parse RSS feeds to MySQL

Post by csdavis0906 »

I've tried looking at the data and the records that are left out don't seem to have anything not common to the records that are going through.

I'm wondering if it has something to do with the interaction of the INSERT with the foreach loop - maybe the loop is skipping somehow the INSERTs that are failing??
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Parse RSS feeds to MySQL

Post by pickle »

Is the XML file reasonably small? Could you post it? (with tags please)
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
csdavis0906
Forum Newbie
Posts: 17
Joined: Thu Mar 27, 2008 9:36 am

Re: Parse RSS feeds to MySQL

Post by csdavis0906 »

Found the error! The rss feeds have a field called description - some containing apostrophes - which was causing the INSERT to crash. Used the addslashes() function to remedy.

Thanks for your interest and assistance.
User avatar
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Re: Parse RSS feeds to MySQL

Post by pickle »

A better option is mysql_real_escape_string().

Glad to hear you found the problem.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
Post Reply