Ranking different text pages, given some queries

Posted: Sun Jan 31, 2010 4:16 pm
by soorena776
Hello everyone!

I'd like your help with the following problem:

I have some text files on my site, each corresponding to an HTML page. I get some query words from the user on the main page, and I need to find the text file most relevant to those words, and from it the corresponding web page.

How can I do this in PHP? Are there any tools or engines for this purpose?
Again, I want to rank a limited number of texts against a given query.


Thank you all

--
Soorena

Re: Ranking different text pages, given some queries

Posted: Sun Jan 31, 2010 5:56 pm
by AbraCadaver
This is very vague, but in general loop through the text files, read them into a string and preg_match_all() on your search term(s) and rank based on the number of matches. That's all I can say without knowing anything about the names of the files and how they relate to the HTML files and what the text files look like.
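
A minimal sketch of the loop described above, assuming the text files have already been read into an array keyed by file name. The function name rank_documents() and the sample file names are made up for illustration; it counts whole-word, case-insensitive matches only, so "cats" does not count as a hit for "cat".

```php
<?php
// Hypothetical helper: scores each document by the total number of
// whole-word, case-insensitive matches of the query terms.
function rank_documents(array $docs, array $terms): array
{
    $scores = [];
    foreach ($docs as $name => $text) {
        $score = 0;
        foreach ($terms as $term) {
            // preg_quote() escapes regex metacharacters in user input.
            $pattern = '/\b' . preg_quote($term, '/') . '\b/iu';
            $score += (int) preg_match_all($pattern, $text, $matches);
        }
        $scores[$name] = $score;
    }
    arsort($scores); // highest score first, file names kept as keys
    return $scores;
}

// Sample data; in practice each value would come from file_get_contents().
$docs = [
    'cats.txt' => 'Cats sleep a lot. A cat sleeps most of the day.',
    'dogs.txt' => 'Dogs bark. A dog may chase a cat.',
];
$ranked = rank_documents($docs, ['cat', 'sleep']);
```

The first key of $ranked is the best-matching file. Raw match counts favour long files, so dividing by document length may be worth trying if the files differ a lot in size.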

Re: Ranking different text pages, given some queries

Posted: Sun Jan 31, 2010 9:59 pm
by josh
Just 2 tools that could help
http://www.google.com/search?q=mysql+fu ... =firefox-a
http://www.google.com/search?hl=en&clie ... g-s1g6&oq=

Or if you can get your documents indexed, Google is the best option.

Wikipedia uses MySQL full-text search. When load hits a certain level it automatically sets off a "breaker" that flips their search to Google custom site search until load goes back down.

Re: Ranking different text pages, given some queries

Posted: Sun Jan 31, 2010 10:10 pm
by soorena776
AbraCadaver wrote: This is very vague, but in general loop through the text files, read them into a string and preg_match_all() on your search term(s) and rank based on the number of matches. That's all I can say without knowing anything about the names of the files and how they relate to the HTML files and what the text files look like.
Thanks for the hint. This was very useful, but there are some additional requirements. See the other comments below if you like.

S

Re: Ranking different text pages, given some queries

Posted: Sun Jan 31, 2010 10:18 pm
by soorena776
josh wrote: Just 2 tools that could help
http://www.google.com/search?q=mysql+fu ... =firefox-a
http://www.google.com/search?hl=en&clie ... g-s1g6&oq=

Or if you can get your documents indexed, Google is the best option.

Wikipedia uses MySQL full-text search. When load hits a certain level it automatically sets off a "breaker" that flips their search to Google custom site search until load goes back down.

Thank you so much, pal. The problem with Google is that not every part of my website has a unique URL; some pages are a kind of scroller holding different pictures/slides on the same page. What I want to do is extract the query the user entered from Google's referral link and return the most appropriate picture/slide/text for it on a dynamic page.
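
A rough sketch of the referrer step, assuming the search engine passes the search terms in a "q" query parameter (this is an assumption and may not hold for every referrer). The function name terms_from_referrer() and the sample URL are made up for illustration.

```php
<?php
// Hypothetical helper: extracts lowercase search terms from a referrer URL,
// assuming they arrive in a "q" query parameter.
function terms_from_referrer(string $referrer): array
{
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($query)) {
        return []; // no query string, or an unparsable URL
    }
    parse_str($query, $params); // also URL-decodes the values
    if (!isset($params['q']) || !is_string($params['q'])) {
        return [];
    }
    // Split on whitespace and drop empty pieces.
    return preg_split('/\s+/', strtolower(trim($params['q'])), -1, PREG_SPLIT_NO_EMPTY);
}

// On a live page the input would be $_SERVER['HTTP_REFERER'] ?? ''.
$terms = terms_from_referrer('http://www.google.com/search?q=red+Sports+car&hl=en');
```

The resulting terms can then be used to pick the best slide or text on the dynamic page. The referrer header is supplied by the browser, so it should be treated as untrusted input and may be missing entirely.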

I was wondering whether these search tools (say, MySQL's) support basic semantic search features (such as adding or removing the plural "s", synonyms, or word distance), or whether they just do exact matching.

I would appreciate any help in this regard. If my description is vague, let me know.

S

Re: Ranking different text pages, given some queries

Posted: Tue Feb 02, 2010 10:17 pm
by josh
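MySQL full-text search and Lucene are both very Google-like in that they rank results by relevance. They do a lot more than just stem words (stemming means that related forms such as "running" and "runs" are treated as the same keyword). As far as I know, stemming is something Lucene does through its analyzers, while MySQL's built-in full-text search matches whole words, so check the docs for the version you use.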
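
For a handful of small files, a very crude stand-in for stemming is to normalize plurals on both the query terms and the document words before comparing them. The sketch below is only that, assuming English text; it is not a real stemmer such as Porter's, and the function names normalize_term() and count_normalized_matches() are made up for illustration.

```php
<?php
// Hypothetical helper: lowercases a word and strips one trailing "s"
// from words longer than three letters that do not end in "ss".
function normalize_term(string $word): string
{
    $word = strtolower($word);
    if (strlen($word) > 3 && substr($word, -1) === 's' && substr($word, -2) !== 'ss') {
        return substr($word, 0, -1);
    }
    return $word;
}

// Hypothetical helper: counts document words whose normalized form equals
// the normalized form of any query term.
function count_normalized_matches(string $text, array $terms): int
{
    $wanted = array_map('normalize_term', $terms);
    $words = preg_split('/[^a-z0-9]+/i', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = 0;
    foreach ($words as $word) {
        if (in_array(normalize_term($word), $wanted, true)) {
            $count++;
        }
    }
    return $count;
}

$hits = count_normalized_matches('Cats sleep a lot. A cat sleeps.', ['cat']);
```

Here both "Cats" and "cat" count as hits for the term "cat". Irregular plurals, verb forms such as "running", and synonyms are not handled; that is where a real search engine earns its keep.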

Re: Ranking different text pages, given some queries

Posted: Tue Feb 02, 2010 10:22 pm
by soorena776
Thank you so much, you helped me a lot!
Good luck