
Writing a Bot to Find Conferences

Posted: Tue May 19, 2009 8:23 pm
by akreider
I want to write a bot to find upcoming (liberal/activist) conferences. (Note: it'd be open source, so you could run a separate version that finds right-wing conferences if those are your politics.)

I'd like to plug in some starter websites and use those to find links to other websites. Then, on each website, I want to identify its events calendar page (if there is one) and check the homepage for mentions of upcoming events.
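To make the crawl step concrete, here is a rough sketch of what I have in mind. It is in Python only for illustration (the real bot would probably be PHP), it uses just the standard library, and the CALENDAR_HINTS list and the function names are made up for this example. It takes one page's HTML, guesses which same-site links might be an events calendar, and collects links to other sites to visit later.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical hint words; a real list would need tuning against actual sites.
CALENDAR_HINTS = ("calendar", "events", "upcoming", "conference")


class LinkCollector(HTMLParser):
    """Collects (absolute_url, anchor_text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def classify_links(base_url, html):
    """Return (calendar candidates on this site, links to other sites)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    calendar, external = [], []
    for url, text in parser.links:
        if urlparse(url).netloc != base_host:
            # Skip mailto:, javascript: and similar non-web links.
            if url.startswith(("http://", "https://")):
                external.append(url)
        elif any(hint in (url + " " + text).lower() for hint in CALENDAR_HINTS):
            calendar.append(url)
    return calendar, external


sample = (
    '<a href="/events/">Upcoming Events</a> '
    '<a href="http://example.org/">Partner</a> '
    '<a href="/about">About</a>'
)
calendar_pages, other_sites = classify_links("http://example.com/", sample)
```

A real crawler would also need to fetch pages politely (robots.txt, rate limits) and remember which URLs it has already visited; this only covers the link-sorting part.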

Is there an open source PHP bot that I could modify to do this, or should I try to write my own?

What kind of algorithms would be useful to recognize conferences? I'm thinking of having two scores - one for "is it a conference" and another for "is it liberal". Then I can give each keyword a weight and add up the weights of the keywords found on a page to try to identify things.
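Roughly, the two-score idea could look like the sketch below. Again this is Python just for illustration, and the keywords and weights are invented placeholders, not a tested list. Each score is the summed weight of matching words, scaled per 100 words so that long pages don't win automatically.

```python
import re

# Hypothetical keyword weights; real values would come from trial and error.
CONFERENCE_WEIGHTS = {"conference": 3.0, "summit": 2.0, "register": 1.0, "workshop": 1.0}
LIBERAL_WEIGHTS = {"activist": 2.0, "organizing": 2.0, "justice": 1.5, "progressive": 2.0}


def keyword_score(text, weights):
    """Sum the weights of all keyword occurrences, scaled per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = sum(weights.get(word, 0.0) for word in words)
    return 100.0 * total / len(words)


def score_page(text):
    """Return both scores for one page of text."""
    return {
        "conference": keyword_score(text, CONFERENCE_WEIGHTS),
        "liberal": keyword_score(text, LIBERAL_WEIGHTS),
    }


scores = score_page("Register now for the Social Justice Summit")
```

A page would then count as a hit only if both scores pass some threshold, which I'd have to pick by checking against pages I've classified by hand.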

Is there any kind of algorithm or source of data for relating words to each other? This goes beyond stemming, as I'm looking for very loose synonyms of words like "conference" or "liberal" (the latter gets very broad, as nobody organizes "liberal" conferences - they use 101 other keywords). I can use Google's AdWords API to get some synonyms - any other sources?

I already have a database of several hundred conference descriptions, so I could use a function that compares each page I find to the existing conference descriptions and gives a similarity score. I think I'd want something more powerful than PHP's built-in string comparison functions.
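One candidate I've read about is TF-IDF weighting with cosine similarity, which compares word usage rather than raw character strings. Here is a minimal sketch in Python (for illustration only; the function names are my own and the two sample descriptions are invented). It assumes the known descriptions fit in memory, and it ignores words that never appear in them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Smoothed inverse document frequency for each word in the known descriptions."""
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in doc_freq.items()}


def tfidf_vector(text, idf):
    """Word -> weight; words absent from the known descriptions are dropped."""
    counts = Counter(tokenize(text))
    return {word: count * idf[word] for word, count in counts.items() if word in idf}


def cosine(a, b):
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_similarity(page_text, descriptions):
    """Highest cosine similarity between the page and any known description."""
    idf = build_idf(descriptions)
    page_vec = tfidf_vector(page_text, idf)
    return max((cosine(page_vec, tfidf_vector(d, idf)) for d in descriptions), default=0.0)


known = [
    "Annual conference on climate justice and community organizing",
    "Student activist summit with workshops and keynote speakers",
]
relevant = best_similarity("Join our climate justice conference this fall", known)
unrelated = best_similarity("Cheap flights and hotel deals", known)
```

With these samples the first page should score higher than the second. With several hundred descriptions it would make sense to build the IDF table and description vectors once and reuse them, rather than recomputing them per page as this sketch does.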

Books: I've got "Webbots, Spiders, and Screen Scrapers" and "Collective Intelligence". Any other suggested reading?

Thanks!

Aaron