Bayesian Filters
Posted: Fri Jul 14, 2006 12:33 pm
by R4000
Hello guys, a friend of mine asked me a question earlier about a school project of his.
It involves Bayesian filters. I have no idea what one of these is, or how to implement it in PHP. Do any of you guys know?
What he wants it for is something like this:
1) Find the URI from the company name [DONE]
2) Spider the site and identify the key pages (about us, etc.) [DONE]
3) Identify the key parts of the site (company description, each person in the company, etc.)
4) Return the data in a structured array
Any ideas how he could do 3 and 4?
Posted: Fri Jul 14, 2006 12:48 pm
by Ward
I can't think of any easy way. It sounds like lots of AI, document weights, etc. Basically an intelligent spider that can 'understand' a site.
Posted: Fri Jul 14, 2006 12:52 pm
by R4000
Yeah, I'm pretty sure it's just one of those impossible tasks you get given, that you can't actually do.
But they want to see how intelligent you are by seeing what you would do to try doing it.
So, got any theories on how to implement it?

Posted: Fri Jul 14, 2006 1:19 pm
by Chris Corbyn
Yep, Bayes is supposed to "learn". Think of things like an email client where you keep marking emails as junk... the email client may start to figure out what you classify as "junk" and help you out a little.
SpamAssassin implements a Bayes Auto-learn feature.
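To make the "learning" idea concrete, here is a minimal naive Bayes sketch. It is written in Python for brevity, but the same logic should port to PHP arrays without much trouble. The labels ("company_desc", "person"), the training sentences and the names `NaiveBayes` and `tokenize` are made up for illustration; this is not how SpamAssassin or any particular product does it, just the basic technique with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and keep alphabetic words only (digits are dropped).
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        # "Marking as junk": record which words appear under which label.
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        # Pick the label with the highest log P(label) + sum of log P(word | label).
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: in practice you would label many real snippets.
nb = NaiveBayes()
nb.train("We are a family run company founded in 1990 making widgets", "company_desc")
nb.train("Our company provides software services to clients worldwide", "company_desc")
nb.train("John Smith is our managing director and joined in 2001", "person")
nb.train("Jane Doe, head of sales, has ten years of experience", "person")

print(nb.classify("Our company makes widgets for clients"))  # company_desc
print(nb.classify("Jane Smith is head of sales"))            # person
```

With only four training sentences this is a toy. The more labelled examples you feed it, the better its guesses get, which is the same effect as repeatedly marking emails as junk.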
Posted: Fri Jul 14, 2006 1:33 pm
by R4000
So is it possible to use it in this project, do you think?
Posted: Fri Jul 14, 2006 1:57 pm
by Chris Corbyn
R4000 wrote: So is it possible to use it in this project, do you think?
There are web page analyzers that use the same technique. You can pass your newly finished web page through it and it will say "we think your site is about <description here>, your company is called <name> and .... blah blah blah". My brother has used them before to see if they came up with what you'd expect.
I'll see if I can find URLs, although it's the backend coding you need to look at. But yes, it's certainly doable. It's not an easy task, though. You'll need to parse elements of the page, I imagine, and even then you're hoping that the page will be well-formed.
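As a rough idea of the parsing step, here is a sketch that pulls headings, paragraphs and list items out of a page and returns them as a structured list, using only Python's standard `html.parser` module (the same idea would carry over to PHP). The names `BlockExtractor` and `extract_blocks` and the choice of tags are my own assumptions, not part of any existing tool. It tolerates some sloppy markup, such as unclosed `<p>` tags, but it is not a full answer to badly formed pages.

```python
from html.parser import HTMLParser


class BlockExtractor(HTMLParser):
    # Tags treated as self-contained blocks of text (an assumption for this sketch).
    BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks: {"tag": ..., "text": ...}
        self._tag = None   # block tag currently open, if any
        self._parts = []   # text fragments collected for that block

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            # A new block starts: close any block left open by sloppy markup.
            self._flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._flush()

    def handle_data(self, data):
        # Only keep text that sits inside one of the block tags.
        if self._tag is not None:
            self._parts.append(data)

    def _flush(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._parts).split())
        if self._tag is not None and text:
            self.blocks.append({"tag": self._tag, "text": text})
        self._tag, self._parts = None, []


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a block that was never closed
    return parser.blocks


# Hypothetical page fragment with an unclosed <p>, as found on many real sites.
page = ("<html><body><h1>About Us</h1>"
        "<p>Acme makes <b>widgets</b>.<p>John Smith is our director."
        "</body></html>")
for block in extract_blocks(page):
    print(block)
```

Each block could then be handed to whatever classifier you settle on, and the label stored next to the text, which gives you the structured array asked for in step 4.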
Posted: Fri Jul 14, 2006 2:01 pm
by R4000
OK.
Those links would be nice.
Thank you!