Cache large e-commerce site
Moderator: General Moderators
-
astherushcomes
- Forum Newbie
- Posts: 4
- Joined: Tue Jan 03, 2012 5:15 am
Cache large e-commerce site
Hello,
I manage a big e-commerce site built on a custom-made PHP e-commerce system. We have more than 13,000 different products, and each has a dynamic product page with a pretty URL (domain/category/postname-sku). I am concerned that a great many MySQL queries run every day, which makes the site slow. I am considering two caching approaches:
1. Static HTML pages in directories inside the web root, accessed directly by the web server and not through PHP. From some beta testing this seems to be the fastest, yet there are a great many files and it takes a long time to re-index all 13,000 product pages.
Root
  Category
    index.html
    product1.html
    Subcategory
      index.html
2. PHP fetching HTML from static HTML pages outside the web root, not accessible by the web server directly:
Root
  Cache
    Category
      index.html
      product1.html
      Subcategory
        index.html
And use require_once/include_once/require/include to fetch the right HTML file. This is somewhat faster than the dynamic approach we have today, yet it still takes a long time to re-index.
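For option 2, the front controller might look something like this minimal sketch (the /cache location, the slug rules, and the file layout here are assumptions, not part of the original system). Note that streaming the file with readfile() avoids having PHP parse the cached HTML the way include does:

```php
<?php
// Sketch of option 2: map a pretty URL to a cache file outside the webroot,
// then stream it with readfile() instead of include. cachePathFor() and the
// /cache layout are made-up names for illustration.

function cachePathFor(string $uri, string $cacheDir): ?string {
    if (strpos($uri, '..') !== false) {
        return null; // refuse anything that could escape the cache dir
    }
    // Reduce the request to a safe slug: letters, digits, /, _ and - only.
    $slug = preg_replace('/[^a-z0-9\/_-]/i', '', trim($uri, '/'));
    if ($slug === '') {
        $slug = 'index';
    }
    return rtrim($cacheDir, '/') . '/' . $slug . '.html';
}

// Front controller usage (sketch):
// $file = cachePathFor($_SERVER['REQUEST_URI'], __DIR__ . '/../cache');
// if ($file !== null && is_file($file)) { readfile($file); exit; }
// ...otherwise build the page dynamically and write it out to $file.
```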
What are your suggestions? Perhaps you can think of more options than the choices above?
Regards
Mikael
Re: Cache large e-commerce site
While I'm not entirely following the two numbered items above, I have had to deal with a large site with high traffic (44k unique visitors a month, which in my experience was high; I know others deal with more). The code it was built on was quite wasteful of resources and had not been tested for high volume. (It was a new framework where I worked, so between working out performance issues and not being familiar with how best to use the code that had been developed, it needed help.)
The framework we used was built on Smarty; however, we (the developers at the company) were new to Smarty and the proper way to use the caching available in it. So I sat down and figured out how to manage it better with my own system.
Key things to factor out:
--What parts of the site HAVE to be dynamic (i.e., code that shows the current shopping cart, "Login / My Account" type links); these need to be handled outside of the cached portions.
--What parts of the page are affected by other aspects of the site (e.g., a product category page listing the products: if you modify/add/remove a product listed on that page, the whole page needs to be rebuilt).
--What chunks of cached information can be cached separately (e.g., a main navigation that goes several levels deep and appears on more than one page). Not only did I cache final page output, I would also cache results or data (e.g., an array of all page links with their names).
Now, here is the route I took with my caching: there was a directory that contained all the cached data. If you went to, say, http://example.com/product-my-widget, it would look in the cache directory; if the file existed and was within a certain time frame (I like having them rebuilt after about a week), it would use the cached page; otherwise it would generate the page and save it out as a cached page. I usually save the cache under a URL-encoded name for the request (the same as you would see in log files: GET (or POST) /path/path/file?query=string). This makes sure they are all unique.
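That flow might be sketched like this (cachedPage(), the cache directory, and the $buildPage callback are made-up names for illustration; the one-week lifetime and URL-encoded filename follow the description above):

```php
<?php
// Serve a cached copy if it is fresh (under a week old), otherwise rebuild
// the page and store it. The cache key is the URL-encoded request, so every
// distinct path+query gets its own unique file.

const CACHE_TTL = 604800; // one week, in seconds

function cachedPage(string $request, string $cacheDir, callable $buildPage): string {
    $file = rtrim($cacheDir, '/') . '/' . urlencode($request);

    if (is_file($file) && filemtime($file) + CACHE_TTL >= time()) {
        return file_get_contents($file); // fresh cached copy
    }
    $html = $buildPage($request);        // rebuild dynamically...
    file_put_contents($file, $html);     // ...and cache it for next time
    return $html;
}

// Usage (sketch): echo cachedPage('GET /product-my-widget', '/srv/cache',
//                                 fn($r) => renderProduct($r));
```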
Now, back to the things that have to stay dynamic. In the generating and caching above, dynamic data is not included; instead, there are placeholders that trigger functions to produce the dynamic data. These use the following format, which is the function name followed by the parameters passed to it (saved in query-string style, using urlencode):
Code: Select all
<div id="account-info">[[login_info]]</div>
<div id="cart-info">[[cart_items|pageid=44]]</div>
Then we do a preg_match_all to find these (it looks really nasty, since [ ] and | have special meaning in regex and have to be escaped):
Code: Select all
if (preg_match_all('/\[\[([^|\]]+)(\|([^\]]+))?\]\]/', $strPage, $dynamic)) {
    $aryDynData = array(); // Contains dynamic output
    // Load up the dynamic information
    foreach ($dynamic[0] as $index => $key) {
        if (!isset($aryDynData[$key])) {
            // Wasn't already used this page call; process it
            $fnName = 'dyncode_'.$dynamic[1][$index];
            if (function_exists($fnName)) {
                // parse_str() fills $params from the query-string style arguments
                parse_str($dynamic[3][$index], $params);
                $aryDynData[$key] = $fnName($params);
            }
            else {
                $aryDynData[$key] = '[ERROR: Could not find function '.$dynamic[1][$index].']';
            }
        }
    }
    // Put dynamic info into page:
    foreach ($aryDynData as $key => $val) {
        $strPage = str_replace($key, $val, $strPage);
    }
}
Here is sample code for the functions above. For demonstration purposes, the cart one gets passed the pageid, so you can do things like not make it clickable when you are actually on the cart page.
Code: Select all
function dyncode_login_info($params) {
    if (isset($_SESSION['user'])) {
        return 'Hello '.htmlspecialchars($_SESSION['user']['first_name'], ENT_QUOTES).'!';
    }
    else {
        return '<a href="/login">Login</a>';
    }
}

function dyncode_cart_items($params) {
    $intPageID = (isset($params['pageid'])) ? (int)$params['pageid'] : 0;
    $strCartInfo = '2 Items ($45.90)'; // Code to calculate cart info
    if (in_array($intPageID, array(44, 122, 244))) {
        // This page should only display the cart info, not link to the cart
        // (for example, you are on the cart page or a checkout page)
        return $strCartInfo;
    }
    else {
        // All other pages: make this a link to the cart
        return '<a href="/cart">'.$strCartInfo.'</a>';
    }
}
So now you have all that and it is working, and there is the issue of what to do when you change something. You mention that it takes a long time to "reindex" everything; I assume you mean rebuilding all the cached pages. Here is the thing: there is no need to do that with the approach I described, as the system makes the cached copy the first time a page is called and no cached copy exists (or the cached copy is over a week old). So how do you determine what gets deleted when it comes to clearing out the cache? The easiest way is to just wipe all of it. If you are not editing pages/products much at all, set it up so that in your admin, any time a _POST occurs on an editor page, it wipes the directory. Otherwise, add a command in your admin to clear the directory.
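The wipe-everything approach could be as simple as this sketch (cacheWipe() and the cache directory here are placeholder names; pages then regenerate lazily on their next request):

```php
<?php
// Delete every generated file in the cache directory. Meant to be called from
// the admin whenever a $_POST hits an editor page, or wired to a
// "clear cache" button.

function cacheWipe(string $cacheDir): int {
    $deleted = 0;
    foreach (glob(rtrim($cacheDir, '/') . '/*') ?: [] as $file) {
        if (is_file($file) && unlink($file)) {
            $deleted++;
        }
    }
    return $deleted; // pages regenerate lazily on their next request
}

// In the admin controller (sketch):
// if ($_SERVER['REQUEST_METHOD'] === 'POST') { cacheWipe('/srv/cache'); }
```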
Doing this majorly sped up the site I was working on. Unfortunately, the client didn't want to pay for it, and the boss didn't want me "wasting time" developing it. I may be a geek and write it all for my own experience, but the company wouldn't let me change the site for free, and I needed permission to change a live site that large. Again, not only did I cache output, I cached data queries as well, so anywhere I needed the main nav, it read the array in from a file (stored serialized). A lot better than (in this case) several hundred queries.
Hope this information helps.
-Greg
Last edited by twinedev on Sat Jan 07, 2012 6:12 pm, edited 1 time in total.
-
astherushcomes
- Forum Newbie
- Posts: 4
- Joined: Tue Jan 03, 2012 5:15 am
Re: Cache large e-commerce site
Hello Greg, and thanks for your reply.
We have a div on top of all pages with dynamic content for the user's shopping cart. I solved this by making it an image with GD, so I can cache the whole page. I think this is quite an elegant solution; what do you think?
The further issue we have is the categories menu. The list of categories is a dynamic MySQL query with "... GROUP BY category ORDER BY category ASC". So if a product with a new category is added, all of the pages change. If caching them as static HTML pages, every page would have to be re-cached. You say you store the categories serialized in a file, but how do you include that in the cache file? From my benchmark tests, the speed improvement of include/require/include_once/require_once inside PHP wasn't that big compared to a static HTML file fetched directly by the web server.
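For reference, the GD badge idea might be sketched like this (cartBadgePng(), the cart-count.php script name, and the cart_items cookie are made-up names; this assumes the GD extension is installed):

```php
<?php
// Render the cart count as a small PNG so the surrounding page can stay
// fully cacheable; only this tiny script reads the cookie per request.

function cartBadgePng(int $count): string {
    $img = imagecreatetruecolor(120, 20);
    $bg  = imagecolorallocate($img, 255, 255, 255);
    $fg  = imagecolorallocate($img, 0, 0, 0);
    imagefilledrectangle($img, 0, 0, 119, 19, $bg);
    imagestring($img, 3, 4, 3, $count . ' item(s) in cart', $fg);

    ob_start();              // capture the PNG bytes instead of echoing them
    imagepng($img);
    imagedestroy($img);
    return ob_get_clean();
}

// cart-count.php (sketch):
// header('Content-Type: image/png');
// header('Cache-Control: no-store'); // the image itself must stay dynamic
// echo cartBadgePng(isset($_COOKIE['cart_items']) ? (int)$_COOKIE['cart_items'] : 0);
```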
Re: Cache large e-commerce site
Well, if the goal is to speed things up and use as few resources as possible, generating an image each time the cart changes seems like a lot to me; but I will admit I don't use GD, so I'm not sure what it eats up.
The performance difference between require/include and serving a generated .html file directly is minimal compared to the difference between either of those and making many database calls.
As for how to store and use data, here are the two main functions I use for handling that, followed by an example of using them where I need the data.
(For things I write, I don't separate the status out into its own table; that is just an example of how overkill the data breakdown was where I worked.)
Code: Select all
define ('NO_CACHE', '[~~NO~~CACHE~~]'); // Something that would definitely not be data you'd use
define ('CACHE_LIFETIME', 604800); // 7 days ( 7 * 24 * 60 * 60 )
define ('CACHE_DIR', realpath($_SERVER['DOCUMENT_ROOT'].'/../db_cache').'/');

/**
 * Checks to see if the cache exists; if so, returns it, otherwise returns the NO_CACHE constant
 *
 * @param string $strName The name of the cache info (usually the SQL statement)
 * @param string $strSuffix Optional suffix to add to the cache filename
 * @return mixed Either the data from the cache file or the NO_CACHE constant
 */
function cacheCheck($strName, $strSuffix='') {
    $strCacheFile = md5($strName).'_'.md5(base64_encode($strName)).preg_replace('/[^-_0-9a-z]/i', '', $strSuffix);
    if (!DEBUG_MODE && file_exists(CACHE_DIR.$strCacheFile) && filemtime(CACHE_DIR.$strCacheFile)+CACHE_LIFETIME >= time()) {
        $aryCache = file(CACHE_DIR.$strCacheFile);
        if ($aryCache && $strName."\n" == $aryCache[0]) {
            // It found a file and a matched query
            array_shift($aryCache); // Gets rid of CacheName
            array_shift($aryCache); // Gets rid of blank separator line
            return unserialize(implode('', $aryCache));
        }
    }
    return NO_CACHE;
}

/**
 * Writes data out to a cache file.
 *
 * @param array $aryData The data to write
 * @param string $strName The name of the cache info (usually the SQL statement)
 * @param string $strSuffix Optional suffix to add to the cache filename
 */
function cacheWrite($aryData, $strName, $strSuffix='') {
    $strCacheFile = md5($strName).'_'.md5(base64_encode($strName)).preg_replace('/[^-_0-9a-z]/i', '', $strSuffix);
    $fp = fopen(CACHE_DIR.$strCacheFile, 'w');
    fwrite($fp, $strName."\n\n");
    fwrite($fp, serialize($aryData));
    fclose($fp);
    unset($fp);
}
Code: Select all
$SQL = 'SELECT pg.`PageID`, pt.`PathName`, pg.`LinkText` FROM `tblPage` AS pg ';
$SQL .= 'LEFT JOIN `tblPath` AS pt ON pt.`PrimaryKey`=pg.`PageID` ';
$SQL .= 'LEFT JOIN `tblStatus` AS st ON st.`StatusID`=pg.`StatusID` ';
$SQL .= 'WHERE pg.`ParentID` = '.$intParentID.' AND st.`Status`="Active" ';
$SQL .= 'ORDER BY pg.`DisplayOrder` ';
$aryBranch = cacheCheck($SQL);
if ($aryBranch == NO_CACHE) {
    // Actually execute the SQL statement(s) and assign the result to $aryBranch
    cacheWrite($aryBranch, $SQL);
}
As another note, where I was working when I was dealing with this, they used a separate database server from the web server, so reducing database calls was a real big issue not only for site performance, but for reducing load on the SQL server and the connection to it (it served several hundred sites, many of which were high traffic).
-
astherushcomes
- Forum Newbie
- Posts: 4
- Joined: Tue Jan 03, 2012 5:15 am
Re: Cache large e-commerce site
You say the improvement of static HTML versus require/include is rather small. But when I run the ab benchmark (Apache's own) I get the following stats:
- direct HTML fetched right from the web server: 680 requests per second
- cached HTML with include/require from a PHP script: 1 request per second
- dynamic PHP page: 0.84 requests per second
That's a huge improvement, isn't it?
Regards
Mikael
Re: Cache large e-commerce site
I see your point; I never realized how much PHP slowed responses. I just tried it on my server:
Direct WP home page call:
Code: Select all
Requests per second: 13.93 [#/sec] (mean)
Time per request: 358.987 [ms] (mean)
Time per request: 71.797 [ms] (mean, across all concurrent requests)
The output of that saved as an .html file:
Code: Select all
Requests per second: 1598.66 [#/sec] (mean)
Time per request: 3.128 [ms] (mean)
Time per request: 0.626 [ms] (mean, across all concurrent requests)
A PHP file that does nothing but read the above .html file and echo it out:
Code: Select all
Requests per second: 63.40 [#/sec] (mean)
Time per request: 78.869 [ms] (mean)
Time per request: 15.774 [ms] (mean, across all concurrent requests)
Now you've got me rethinking things and just doing flat-out static pages when possible.
You can still do it with static files and have them auto-generate on call. Write a custom 404 script, so if I go to /products/widget-455.html and it doesn't exist, it calls the 404 script, which then checks to see if there should be a page there; if so, it creates the page on the fly, writes it to the actual /products/widget-455.html, and also feeds it out to the visitor. And to satisfy my personal taste for clearing them out after a week, I would write the path and a timestamp to a DB, then have a cron run, say, every 12 hours to delete files created over a week prior.
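That 404-driven generation might be sketched like this (handleMiss(), $pageExists, and $buildPage are stand-ins for a real catalog lookup and page renderer; the /products URL pattern is an assumption):

```php
<?php
// Called from a custom 404 handler (e.g. ErrorDocument 404 /404.php in
// Apache). On a miss for a URL that should exist, build the page once,
// write it to the requested path so the next hit is served statically,
// and return the HTML so the handler can echo it with a 200 status.

function handleMiss(string $uri, string $docRoot,
                    callable $pageExists, callable $buildPage): ?string {
    // Only regenerate URLs that look like product pages and really exist.
    if (!preg_match('#^/products/[a-z0-9-]+\.html$#i', $uri) || !$pageExists($uri)) {
        return null; // fall through to a real 404 page
    }
    $html = $buildPage($uri);
    file_put_contents($docRoot . $uri, $html); // next hit is served statically
    return $html;
}

// 404.php (sketch):
// $html = handleMiss($_SERVER['REQUEST_URI'], $_SERVER['DOCUMENT_ROOT'],
//                    'productExists', 'renderProduct');
// if ($html !== null) { header('HTTP/1.1 200 OK'); echo $html; exit; }
```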
-Greg
-
astherushcomes
- Forum Newbie
- Posts: 4
- Joined: Tue Jan 03, 2012 5:15 am
Re: Cache large e-commerce site
Yes, that is what I am thinking about doing!
The remaining issue is the shopping cart (the dynamic section), where I have replaced it with a PHP image (PHP reads the cookie; GD writes the number of products in the cart).
Also, I am somewhat afraid of caching files in the root. I have all the other files there, and it's important to think about security when designing this. I have some folders in the root that are essential for the e-commerce system. I could solve this by doing something like:
if (!in_array($name, $exclude)) {
    ## delete file
    ## re-cache
}
Re: Cache large e-commerce site
I want to play around more later and see what effect mod_rewrite has on this, as I would also be worried about the script writing to the root. So you could do:
/public_html/ (root regular site)
/public_html/cache/ (all generated cache files)
then have mod_rewrite do something so that:
http://www.example.com/products/widget-34343.html
actually feeds out:
http://www.example.com/cache/products/widget-34343.html
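A minimal .htaccess sketch of that rewrite, assuming the /cache layout above and a front controller at /index.php (both assumptions): serve from /cache when a cached copy exists, otherwise fall through to PHP.

```apache
RewriteEngine On

# If a cached copy of the requested URL exists, serve it transparently.
RewriteCond %{DOCUMENT_ROOT}/cache%{REQUEST_URI} -f
RewriteRule ^(.*)$ /cache/$1 [L]

# Otherwise let PHP build (and cache) the page.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ /index.php [L]
```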
I came from an environment at work where all PHP ran as the Apache user, so you had to specifically make a directory writable by that user for PHP to write in it. However, get a hacked script onto one customer's site, and it could write to any writable directory across the whole server (been there, done that, tracked that, cleaned that, not fun). My own server runs cPanel, which runs PHP as the user account the site belongs to, so nothing I write under user johndoe can write to anything on janedoe's sites. But I still worry about security and making sure nothing can write to a file that is meant to be linked to.
-Greg