HTTP search class

Posted: Mon Jan 30, 2006 12:07 am
by alex.barylski
Need a class which returns a list of URLs, alongside relevance, etc...

The catch is...I need it to work like a "real" search engine...starting at a specified directory or file and crawling the whole site, excluding files & directories specified in a config file...

And no, Google API won't work...

Also needs to be implemented strictly in PHP...and can't rely on MySQL FULLTEXT search...

PHPDig doesn't sound like a very good option...for one...cuz it returns a formatted list of results...as opposed to a generic array which I can then use to format the results myself....and it seems to rely on MySQL heavily...I dunno what they mean by flat file support....cuz I couldn't find anything in the docs about how to use that instead of SQL...

In any case...am I dreaming??? Or is something available??? :)

Cheers :)

Posted: Mon Jan 30, 2006 12:49 am
by josh
You will need some sort of database with indexing if you don't want horrible performance. If listing documents by relevance, based on keyword occurrence, is what you want, it shouldn't be too hard.

Just set up a table that holds the id of the page, the keyword, and the number of occurrences. When you need to search, you just select the distinct pages that match, along with the number of occurrences, and order by that. You can then run another query to grab the content of the pages it returned, so you can show a one-paragraph excerpt of each page.
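
As a rough sketch of that table idea (not a finished implementation): plain PHP arrays stand in for the table here, so each row is just page_id, keyword, occurrences. The function names index_page() and search_index() and the sample page text are made up for illustration, and relevance is assumed to be nothing more than the summed occurrence count of the query terms.

```php
<?php
// Hypothetical helpers: arrays stand in for a (page_id, keyword, occurrences) table.

/** Count keyword occurrences in one page's text and return table-like rows. */
function index_page($pageId, $text)
{
    // Lowercase, then split on anything that is not a letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $rows = array();
    foreach (array_count_values($words) as $keyword => $count) {
        $rows[] = array(
            'page_id'     => $pageId,
            'keyword'     => (string) $keyword, // numeric words come back as int keys
            'occurrences' => $count,
        );
    }
    return $rows;
}

/** Return page_id => summed occurrences of the query terms, highest first. */
function search_index(array $rows, $query)
{
    $terms = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    $scores = array();
    foreach ($rows as $row) {
        if (in_array($row['keyword'], $terms, true)) {
            if (!isset($scores[$row['page_id']])) {
                $scores[$row['page_id']] = 0;
            }
            $scores[$row['page_id']] += $row['occurrences'];
        }
    }
    arsort($scores); // sort by score, keeping page ids as keys
    return $scores;
}

// Usage with three made-up pages.
$rows = array_merge(
    index_page(1, 'PHP search class for PHP sites'),
    index_page(2, 'A search engine written in Perl'),
    index_page(3, 'Nothing relevant here')
);
$results = search_index($rows, 'php search');
// Page 1 scores 3 (php x2, search x1), page 2 scores 1, page 3 is not returned.
```

With a real table the loop in search_index() would become a single grouped query, and the excerpt would come from a second lookup by page id.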

Do you need help with the crawler, the script that would index it or the search script?


Also, can you use MySQL boolean searches? Out of curiosity, why is FULLTEXT ruled out?