Cache design
Posted: Fri Mar 09, 2012 9:29 am
I'm creating a caching system that generates a cached HTML file for each unique "display loop" (basically a database query plus the functions/HTML used to display the content). A page can contain several different loops, and a loop with the same parameters can appear on two different pages. So it's not really feasible to just flag the cache handler to regenerate a single HTML file each time a certain item is edited, because that item could be displayed in different ways by different queries. For example, I could have a "movie" object called "Spiderman" that's shown in the sidebar query for recent movies, on the page for browsing movies that start with "s", and on the individual page for the movie.
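To make "one cached file per unique display loop" concrete, here is a rough sketch of how a loop could be turned into a cache key. It's written in Python purely for illustration; the function name `loop_cache_key` and its parameters are made up for this example and are not part of the actual application.

```python
import hashlib
import json


def loop_cache_key(query_params, display_name):
    """Build a stable cache key from a loop's query parameters and display function.

    Both arguments are hypothetical stand-ins for whatever identifies a
    display loop in the real application.
    """
    # sort_keys=True makes the key independent of the order the parameters were given in
    payload = json.dumps(
        {"query": query_params, "display": display_name},
        sort_keys=True,
    )
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# The same loop on two different pages maps to the same cache file name
sidebar_key = loop_cache_key({"type": "movie", "order": "recent", "limit": 5}, "sidebar_list")
cache_file = sidebar_key + ".html"
```

With a key like this, a loop with the same parameters on two pages shares one cache file, while the same item shown through a different query or display function gets its own file.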
I've thought of a few options for designing the caching system and was looking for input on which one would be best.
1) Save the last update time in the database. Each time the content is referenced in a query, I compare its last update time with the time the cached HTML file was generated, and generate a new file if necessary.
PRO: Avoids the "multiple loop" problem I described above
CON: I'd have to perform a database query to retrieve the last update time, which could defeat the purpose of having a cache
2) Same idea, but save the last update time in a text file instead of the database.
PRO: Avoids both database queries and the "multiple loop" problem
CON: The text file would get really big, really fast, meaning that storing this information in the database could end up better in the long run
3) Create a separate table or text file with just references to entries that need to be updated, and remove each entry after the cache file is regenerated.
PRO: Limits the size of the database query or text file
CON: I'd run into the "multiple loop" problem because the entry would be removed after the cache is regenerated for one specific display loop, but it might also show up in another query.
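The timestamp comparison behind options 1 and 2 could look roughly like this. It's a minimal sketch in Python for illustration only; `cache_is_fresh` is a hypothetical helper, and it assumes the last update times of the displayed items are already available as Unix timestamps (from the database in option 1, from the text file in option 2).

```python
import os


def cache_is_fresh(cache_path, item_updated_times):
    """Return True if the cached HTML file is newer than every item it displays.

    cache_path: path to the cached HTML file for one display loop.
    item_updated_times: Unix timestamps of the items' last updates (assumed input).
    """
    if not os.path.exists(cache_path):
        return False  # nothing cached yet, so it has to be generated
    generated_at = os.path.getmtime(cache_path)
    # A single item updated after the file was written makes the cache stale
    return all(updated <= generated_at for updated in item_updated_times)
```

Because each loop checks its own file, the same item can be stale in one loop and fresh in another. Note that this only catches edits to items already in the loop; an edit that makes a new item match the query would need a separate check.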
-----
Ideally I'd like to use a variation of #3, but with some way to remove the entry only once all of its related cache files have been updated. The problem is that this application is deployed in several different environments with several different data structures, and it's impossible to know which display loops are being used without parsing the template files for the parameters of each initQuery() call (the loops are defined and run in the template files, similar to WP_Query in WordPress, if anyone is familiar). So unless I somehow inform the application of each existing display loop and its parameters, I can't really check whether every cache file for a loop that includes a certain item has been updated.
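One possible shape for that variation, as a rough sketch only: instead of parsing templates, each loop could report which items it displayed whenever its cache file is generated, and a pending entry is removed only after every loop known to show that item has been regenerated. This is Python for illustration; the class `PendingInvalidations`, its methods, and the loop keys are all hypothetical, and the in-memory dictionaries stand in for whatever table or file would actually hold the data.

```python
from collections import defaultdict


class PendingInvalidations:
    """Tracks which display loops still need regenerating for each edited item."""

    def __init__(self):
        # item_id -> keys of loops that have displayed this item
        self.loops_by_item = defaultdict(set)
        # item_id -> keys of loops not yet regenerated since the item was edited
        self.stale_loops_by_item = {}

    def record_render(self, loop_key, item_ids):
        """Call whenever a loop's cache file is (re)generated."""
        for item_id in item_ids:
            self.loops_by_item[item_id].add(loop_key)
        # This loop is now up to date for every item that was waiting on it
        for item_id in list(self.stale_loops_by_item):
            stale = self.stale_loops_by_item[item_id]
            stale.discard(loop_key)
            if not stale:
                # Every related cache file has been updated: drop the entry
                del self.stale_loops_by_item[item_id]

    def mark_edited(self, item_id):
        """Call when an item is edited."""
        loops = self.loops_by_item.get(item_id)
        if loops:
            self.stale_loops_by_item.setdefault(item_id, set()).update(loops)

    def is_stale(self, loop_key):
        """True if any edited item is still waiting on this loop."""
        return any(loop_key in stale for stale in self.stale_loops_by_item.values())


tracker = PendingInvalidations()
tracker.record_render("sidebar_recent", ["spiderman", "superman"])
tracker.record_render("browse_s", ["spiderman"])
tracker.mark_edited("spiderman")          # both loops are now stale
tracker.record_render("sidebar_recent", ["spiderman", "superman"])
# "spiderman" stays pending until "browse_s" is regenerated too
```

A loop that has never run isn't known to the tracker, but it has no cache file to invalidate either. The sketch does not handle an edit that makes an item newly match a loop's query, or an item that drops out of a loop.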
Any ideas/thoughts/suggestions?
Thanks