Parsing blogs (grab the Title and Date)
Posted: Fri Nov 21, 2008 11:12 pm
I'm not familiar with XML or how feeds work. What I was trying to do was create a page scraper for blogs that gets the title and date of a post by looking at the layout patterns. I found that with Blogger and TypePad all the pages follow the same format, which makes them easy to crawl, but WordPress blogs were completely inconsistent from blog to blog and version to version.
What I want to know is whether it is possible, and if so how, to enter the URL of a particular post and get the title and date of that post, if the blog supports RSS. And not just the most recent posts, but any post from any year (2004, 2005 ... 2008).
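To make the question concrete, here is a rough sketch of the feed-based lookup I have in mind, assuming the blog publishes an RSS 2.0 feed. The function name find_post, the SAMPLE_FEED text and the example.com URLs are made up for illustration, and a real script would first have to download the feed. As far as I can tell, feeds usually list only the most recent posts, so a lookup like this would probably miss older ones.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def find_post(feed_xml, post_url):
    """Return (title, date) for the RSS item whose <link> equals post_url.

    Returns None when no item in the feed matches. find_post is a
    hypothetical helper for this sketch, not part of any library.
    """
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link != post_url:
            continue
        title = (item.findtext("title") or "").strip()
        pub = item.findtext("pubDate")
        # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
        date = parsedate_to_datetime(pub.strip()) if pub else None
        return title, date
    return None


# Made-up feed content standing in for a downloaded RSS document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Hello world</title>
      <link>http://example.com/2008/11/hello.html</link>
      <pubDate>Fri, 21 Nov 2008 23:12:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2008/11/second.html</link>
      <pubDate>Sat, 22 Nov 2008 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""
```

Calling find_post(SAMPLE_FEED, "http://example.com/2008/11/hello.html") should give back the title "Hello world" with its date, and None for a URL that is not in the feed. That None case is exactly where I would still need the page scraper.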