Weird Fulltext Search Results

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
smart1
Forum Newbie
Posts: 2
Joined: Sat Feb 15, 2003 11:00 pm
Location: Austin
Contact:

Weird Fulltext Search Results

Post by smart1 »

I run this mysql_query:
Search: SELECT files.fileID, files.title, UNIX_timestamp(files.date), users.name, users.email, doctypes.doctype, argtypes.argtype, files.description, files.location, files.downloads FROM users, files, argtypes, doctypes WHERE users.userID = files.userID && doctypes.docID = files.docID && argtypes.argID = files.argID && (MATCH (files.title, files.keywords, files.description) AGAINST ("collaborative"))

If I have one record with the word collaborative in the "keyword" field, it comes up fine... BUT, if I have two or more records with the word collaborative, it doesn't pull up either of them. What gives?
User avatar
Stoker
Forum Regular
Posts: 782
Joined: Thu Jan 23, 2003 9:45 pm
Location: SWNY
Contact:

Post by Stoker »

its a bit easier to read a query if you dont throw it all in one line, and I would use AND instead of &&..

Code: Select all

SELECT files.fileID, files.title, UNIX_timestamp(files.date), users.name, users.email, doctypes.doctype, argtypes.argtype, files.description, files.location, files.downloads 
  FROM users, files, argtypes, doctypes 
  WHERE users.userID = files.userID 
   AND doctypes.docID = files.docID
   AND argtypes.argID = files.argID 
   AND (MATCH (files.title, files.keywords, files.description) AGAINST ("collaborative"))
Did you try it on the files table without any joins? perhaps one of the conditions in all the WHERE conditions arent met?
sjon
Forum Newbie
Posts: 6
Joined: Thu Jun 26, 2003 5:11 pm
Location: EU.NL

Post by sjon »

Don't do fulltext searches with one word... !!!
If collaborative is found quite a few times, it's regarded a stopword.

From the manual:
Every correct word in the collection and in the query is weighted according to its significance in the query or collection. This way, a word that is present in many documents will have lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Otherwise, if the word is rare, it will receive a higher weight. The weights of the words are then combined to compute the relevance of the row.

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not reflect adequately their semantic value, and this model may sometimes produce bizarre results.

mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search for the word MySQL produces no results in the above example, because that word is present in more than half the rows. As such, it is effectively treated as a stopword (that is, a word with zero semantic value). This is the most desirable behaviour -- a natural language query should not return every second row from a 1 GB table.

If you're only using one word, it's better to use LIKE.
Or use more words :)

Code: Select all

<snip>
AND (MATCH (files.title, files.keywords, files.description) AGAINST ("collaborative php libraries"))

Sjon.
Post Reply