Parsing blogs (grab the Title and Date)

samsono
Forum Newbie
Posts: 1
Joined: Fri Nov 21, 2008 11:01 pm

Parsing blogs (grab the Title and Date)

Post by samsono »

I'm not familiar with XML or how feeds work. What I was trying to do was create a page scraper for blogs that gets the title and date of a post by looking at layout patterns. I found that Blogger and TypePad pages all follow the same format, which makes them easy to crawl, but WordPress blogs were completely inconsistent from blog to blog and version to version.

What I want to know is whether it is possible, and if so how, to enter the URL of a particular post and get that post's title and date, provided the blog supports RSS. And not just for the most recent posts, but for any post from any year (2004, 2005 through 2008).
koen.h
Forum Contributor
Posts: 268
Joined: Sat May 03, 2008 8:43 am

Re: Parsing blogs (grab the Title and Date)

Post by koen.h »

Normally RSS and Atom feeds follow a strict format (some required elements, some optional). If you see differences between WordPress feeds, that's because the blogs use a different feed version or different options.

http://en.wikipedia.org/wiki/Web_feed
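
To make the "strict format" point concrete, here is a minimal sketch of reading the title and date out of a feed instead of scraping the page layout. It is written in Python using only the standard library, and it assumes the feed is either RSS 2.0 (items with title, link, pubDate) or Atom 1.0 (entries with title, link, published/updated). The function names parse_feed and find_post, the sample feeds, and the example.com URLs are all made up for illustration. Note that a feed usually lists only the most recent posts, so an old post (say from 2004) may simply not be in it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, used so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item>
    <title>First post</title>
    <link>http://example.com/2008/11/first-post</link>
    <pubDate>Fri, 21 Nov 2008 23:01:00 +0000</pubDate>
  </item>
</channel></rss>"""

SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Older post</title>
    <link rel="alternate" href="http://example.com/2004/05/older-post"/>
    <published>2004-05-03T08:43:00Z</published>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return a list of {'title', 'link', 'date'} dicts from RSS 2.0 or Atom 1.0 text."""
    root = ET.fromstring(xml_text)
    posts = []
    if root.tag == "rss":
        for item in root.iter("item"):
            raw = item.findtext("pubDate")
            posts.append({
                "title": item.findtext("title", default="").strip(),
                "link": item.findtext("link", default="").strip(),
                # RSS 2.0 dates use the RFC 822 style, e.g. "Fri, 21 Nov 2008 ..."
                "date": parsedate_to_datetime(raw) if raw else None,
            })
    elif root.tag == ATOM_NS + "feed":
        for entry in root.iter(ATOM_NS + "entry"):
            raw = (entry.findtext(ATOM_NS + "published")
                   or entry.findtext(ATOM_NS + "updated"))
            link = ""
            for el in entry.findall(ATOM_NS + "link"):
                # A link without rel is treated as the post's own page.
                if el.get("rel", "alternate") == "alternate":
                    link = el.get("href", "")
                    break
            posts.append({
                "title": entry.findtext(ATOM_NS + "title", default="").strip(),
                "link": link,
                # Atom dates are RFC 3339; map a trailing "Z" to an explicit offset.
                "date": datetime.fromisoformat(raw.replace("Z", "+00:00")) if raw else None,
            })
    else:
        raise ValueError("not an RSS 2.0 or Atom 1.0 feed: " + root.tag)
    return posts


def find_post(posts, url):
    """Return the post whose link equals url, or None if the feed does not list it."""
    for post in posts:
        if post["link"] == url:
            return post
    return None
```

Calling find_post(parse_feed(SAMPLE_ATOM), "http://example.com/2004/05/older-post") gives back the dict for that entry, and a URL the feed does not list gives None. For a real blog you would download the feed text first and pass it to parse_feed; how far back the feed goes depends on the blog's settings.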