
Type of data to store

Posted: Tue Mar 11, 2008 8:28 pm
by keenlearner
I am building an English part-of-speech tagger using a Hidden Markov Model. One of the values it requires is P(Word|Tag), the probability of seeing a particular word given a part-of-speech tag. For example, for the word "race":
Total number of nouns = 12345
Total number of occurrences of "race" as a noun = 20

P(race|noun) = 20 / 12345 = 0.0016200891
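
As a minimal sketch of the arithmetic above, the ratio can be computed from the two counts like this. The function name `emission_probability` is only illustrative and does not come from the original program.

```python
def emission_probability(word_tag_count: int, tag_count: int) -> float:
    """Return P(word | tag) = count(word tagged as tag) / count(tag)."""
    if tag_count <= 0:
        raise ValueError("tag_count must be positive")
    return word_tag_count / tag_count


# Counts taken from the example in this post.
p_race_given_noun = emission_probability(20, 12345)
print(f"P(race|noun) = {p_race_given_noun:.10f}")
```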

In my program I will need the probability value 0.0016200891, so my question is: which is better?
1. Store the calculated probability value directly in the database, or
2. Store the total number of nouns and the total number of times "race" occurs as a noun, then calculate the probability at run time


Thank you.

Re: Type of data to store

Posted: Fri Mar 14, 2008 2:17 am
by aaronhall
Your schema really shouldn't hold aggregate data alongside the data you're aggregating... there's almost always a better way. You can probably cache these calculations per word if the queries get too expensive, but that may not be necessary (it's a judgment call).
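
One way the caching idea could look, as a rough sketch only: the dictionaries `TAG_COUNTS` and `WORD_TAG_COUNTS` are hypothetical stand-ins for whatever database queries the real program runs, and `functools.lru_cache` is just one possible in-memory cache.

```python
from functools import lru_cache

# Hypothetical in-memory counts standing in for database lookups.
TAG_COUNTS = {"noun": 12345}
WORD_TAG_COUNTS = {("race", "noun"): 20}


@lru_cache(maxsize=None)
def cached_emission(word: str, tag: str) -> float:
    """Compute P(word | tag) once per (word, tag) pair and memoize it."""
    tag_total = TAG_COUNTS.get(tag, 0)
    if tag_total == 0:
        return 0.0
    return WORD_TAG_COUNTS.get((word, tag), 0) / tag_total


first = cached_emission("race", "noun")   # computed from the counts
second = cached_emission("race", "noun")  # answered from the cache
hits = cached_emission.cache_info().hits  # number of cache hits so far
```

If the underlying counts change, the cache has to be invalidated, for example with `cached_emission.cache_clear()`, otherwise stale probabilities would be returned.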

Re: Type of data to store

Posted: Wed Mar 26, 2008 4:57 pm
by dhampson
Option 2. It will be easier to modify and update the data, and to spot mistakes. It may take a little more time now, but it could save you a lot in the future.
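
A rough sketch of what option 2 might look like, assuming SQLite through Python's standard `sqlite3` module. The table names `tag_counts` and `word_tag_counts` and the helper `emission_from_db` are made up for illustration; the original poster's schema is not shown in this thread.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema holding raw counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_counts (
        tag   TEXT PRIMARY KEY,
        total INTEGER NOT NULL
    );
    CREATE TABLE word_tag_counts (
        word  TEXT NOT NULL,
        tag   TEXT NOT NULL REFERENCES tag_counts(tag),
        total INTEGER NOT NULL,
        PRIMARY KEY (word, tag)
    );
""")
conn.execute("INSERT INTO tag_counts VALUES ('noun', 12345)")
conn.execute("INSERT INTO word_tag_counts VALUES ('race', 'noun', 20)")


def emission_from_db(word: str, tag: str) -> float:
    """Compute P(word | tag) at query time from the stored counts."""
    row = conn.execute(
        """
        SELECT CAST(w.total AS REAL) / t.total
        FROM word_tag_counts AS w
        JOIN tag_counts AS t ON t.tag = w.tag
        WHERE w.word = ? AND w.tag = ?
        """,
        (word, tag),
    ).fetchone()
    # No matching row, or a zero tag total (NULL in SQLite), yields 0.0.
    return row[0] if row and row[0] is not None else 0.0


before = emission_from_db("race", "noun")

# New training data arrives: one more noun token, and it is "race".
conn.execute("UPDATE tag_counts SET total = total + 1 WHERE tag = 'noun'")
conn.execute(
    "UPDATE word_tag_counts SET total = total + 1 "
    "WHERE word = 'race' AND tag = 'noun'"
)
after = emission_from_db("race", "noun")
```

With only the counts stored, adding training data is a pair of increments, and the probability stays consistent with them. A stored probability would have to be recomputed for every word carrying that tag whenever the tag total changes.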

--Dave