I am building an English part-of-speech tagger using a Hidden Markov Model. One of the values it requires is P(Word|Tag), which is the probability of seeing a particular word given a part-of-speech tag. For example, for the word "race":
Total number of nouns = 12345
Total number of "race" as a noun = 20
P(race|noun) = 20 / 12345 = 0.0016200891
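As a minimal sketch, the arithmetic above can be written in Python like this. The function name `emission_probability` is made up for illustration; the counts are the ones from the example.

```python
def emission_probability(word_tag_count: int, tag_count: int) -> float:
    """Return P(word | tag) as count(word, tag) / count(tag)."""
    if tag_count <= 0:
        raise ValueError("tag_count must be positive")
    return word_tag_count / tag_count


# Counts from the example: "race" seen 20 times as a noun, 12345 nouns in total.
p_race_noun = emission_probability(20, 12345)
print(f"{p_race_noun:.10f}")  # prints 0.0016200891
```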
In my program I will need the probability value 0.0016200891, so my question is: which is better?
1. Store the calculated probability value directly in the database, or
2. Store the "total number of nouns" and the "total number of race as a noun", then calculate the probability during execution
Thank you.
Type of data to store
- aaronhall
Re: Type of data to store
Your schema really shouldn't hold aggregate data alongside the data you're aggregating; there's almost always a better way. You can probably cache these calculations per word if queries get too expensive, but it may not be necessary (subjective).
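One way this advice could look in practice is sketched below, using Python's built-in `sqlite3` module. The table name `word_tag_counts`, its columns, and the function `p_word_given_tag` are assumptions made up for the illustration, and the `other_nouns` row is placeholder data standing in for every other noun so that the total matches the example. Only raw counts are stored; the tag total is derived with `SUM` at query time, and `functools.lru_cache` memoizes results in case the query turns out to be too slow.

```python
import sqlite3
from functools import lru_cache

# Hypothetical schema: raw (word, tag) counts only, no stored probabilities.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE word_tag_counts (
        word  TEXT NOT NULL,
        tag   TEXT NOT NULL,
        total INTEGER NOT NULL,
        PRIMARY KEY (word, tag)
    )
    """
)
conn.executemany(
    "INSERT INTO word_tag_counts VALUES (?, ?, ?)",
    [
        ("race", "noun", 20),
        ("other_nouns", "noun", 12325),  # placeholder so nouns sum to 12345
    ],
)


@lru_cache(maxsize=None)
def p_word_given_tag(word: str, tag: str) -> float:
    """Compute P(word | tag) from the stored counts; cached per (word, tag)."""
    row = conn.execute(
        """
        SELECT 1.0 * w.total /
               (SELECT SUM(t.total) FROM word_tag_counts AS t WHERE t.tag = w.tag)
        FROM word_tag_counts AS w
        WHERE w.word = ? AND w.tag = ?
        """,
        (word, tag),
    ).fetchone()
    return row[0] if row else 0.0  # unseen (word, tag) pairs get probability 0


print(p_word_given_tag("race", "noun"))
```

If the counts change, calling `p_word_given_tag.cache_clear()` discards the memoized values so the next lookup reflects the new data.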
Re: Type of data to store
Option 2. It will be easier to modify, update data, and spot mistakes. It may take a little more time now, but it could save you a lot in the future.
--Dave
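To illustrate why keeping the counts makes updates easier, here is a small in-memory sketch in Python. The names `observe` and `probability` and the toy tagged sentences are invented for the example. Adding new training data only increments counters, and every probability for that tag is correct the next time it is computed. With stored probabilities, each new noun would mean recalculating every stored P(word|noun).

```python
from collections import Counter

tag_counts = Counter()       # tag -> number of tokens with that tag
word_tag_counts = Counter()  # (word, tag) -> number of occurrences


def observe(tagged_tokens):
    """Add one tagged sentence, given as (word, tag) pairs, to the counts."""
    for word, tag in tagged_tokens:
        tag_counts[tag] += 1
        word_tag_counts[(word.lower(), tag)] += 1


def probability(word, tag):
    """P(word | tag) computed on demand from the current counts."""
    if tag_counts[tag] == 0:
        return 0.0
    return word_tag_counts[(word.lower(), tag)] / tag_counts[tag]


# Toy data, made up for the illustration.
observe([("the", "det"), ("race", "noun"), ("starts", "verb")])
observe([("a", "det"), ("dog", "noun")])
before = probability("race", "noun")  # 1 of 2 nouns

observe([("another", "det"), ("race", "noun")])
after = probability("race", "noun")   # 2 of 3 nouns

print(before, after)
```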