Background
In Managed Cloud, we often use Varnish, Nginx, or Cloud Load Balancers to cache content so that customers with high traffic and/or low code quality can keep their servers from falling over under load. These techniques are well proven as workable solutions. However, each one introduces another layer of software into the content delivery chain that must then be managed, SSL becomes a much more complicated proposition, and introducing a proxy into the HTTP/HTTPS request stream can cause serious issues with some applications that must then be overcome. We can avoid those complications almost entirely by using Apache’s native mod_cache as our caching engine instead. This gets us the same effect (full-page caching to offload page rendering time) without adding anything to the content delivery chain. Also, when using the disk-based cache scheme documented below, the cache survives reboots and Apache restarts. Furthermore, since the cache config is per VirtualHost, it is easy to have a different cache config for each Vhost (including ‘none’ for sites which have trouble with caching).
Notes: I think mod_cache should not be viewed as a replacement for Varnish/Nginx/etc., but rather as another tool in our arsenal of tricks that we can use where and when appropriate to solve real-world customer issues. Such cases would be when those other solutions create issues due to a customer web app’s poor behavior.
More Notes: This is the first draft of this document – further updates should include instructions for CentOS/RHEL, as well as instructions on how to use the disk-based cache in /dev/shm for performance + persistence. The directive options used below could also use more serious review – they worked great in the test case, but may not be appropriate in all cases. If you are reading this note and not those docs, and feel like documenting them yourself, feel free; it’s a wiki after all.
Instructions
Setting up mod_cache is a fairly straightforward process. On Ubuntu/Debian/derivatives, the following will enable the necessary modules:
# enable the main module
a2enmod cache
# enable the sub-module(s) you wish to use (only one required)
a2enmod disk_cache
a2enmod mem_cache
a2enmod file_cache
# and restart apache
service apache2 restart
# make sure htcacheclean starts with apache now
ps ax | grep htcacheclean
# if you don’t see /usr/sbin/htcacheclean running, then start it by hand:
/usr/sbin/htcacheclean -n -d120 -i -p/var/cache/apache2/mod_disk_cache -l300M
# and make sure it starts on reboot by putting the command above in /etc/rc.local above the ‘exit’
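The check-and-start steps above could be wrapped in a small helper, for example one called from /etc/rc.local. This is only a sketch: the function name ensure_htcacheclean and the HTCACHECLEAN and CACHE_ROOT variables are made up for illustration, and it assumes pgrep is installed. The htcacheclean flags are the same ones used in the manual command above.

```shell
#!/bin/sh
# Sketch only: the names below are hypothetical, not part of Apache or Debian.
HTCACHECLEAN="${HTCACHECLEAN:-/usr/sbin/htcacheclean}"
CACHE_ROOT="${CACHE_ROOT:-/var/cache/apache2/mod_disk_cache}"

# Start the htcacheclean daemon unless a process with that name already exists.
ensure_htcacheclean() {
    if pgrep -x htcacheclean >/dev/null 2>&1; then
        echo "htcacheclean already running"
    else
        # same flags as the manual command: nice, run every 120 minutes,
        # intelligent mode, cache path, 300M size limit
        "$HTCACHECLEAN" -n -d120 -i -p"$CACHE_ROOT" -l300M
    fi
}

# Usage (as root):
#   ensure_htcacheclean
```

Keeping the binary and cache path in variables makes it easier to reuse the helper if a Vhost uses a different CacheRoot.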
Then, put a block that looks something like the following into the customer’s VirtualHost file(s) to enable caching for that Vhost.
<IfModule mod_cache.c>
<IfModule mod_disk_cache.c>
# enable caching for the uri “/”
CacheEnable disk /
# path to store the cache files (this is default ubuntu/debian)
CacheRoot /var/cache/apache2/mod_disk_cache
# how deep to make directory structure levels
CacheDirLevels 5
CacheDirLength 3
# largest file we will cache, in bytes – 1 MB as below, can be more
CacheMaxFileSize 1000000
# smallest file we will cache, in bytes
CacheMinFileSize 1
# What’s our default expire time?
CacheDefaultExpire 3600
# What’s our max expire time?
CacheMaxExpire 8600
# we may not need this, if customer site plays nice
CacheIgnoreCacheControl On
CacheIgnoreNoLastMod On
# do NOT cache Set-Cookie header – it will cause big problems!
CacheIgnoreHeaders Set-Cookie
# if no other cache headers are avail, set cache time from lastmod: expiry = factor x time since last modified (0.1 means a page modified 10 hrs ago is cached for 1 hr)
CacheLastModifiedFactor 0.1
# go ahead and cache things the app said don’t cache
CacheStoreNoStore On
# go ahead and cache private things
CacheStorePrivate On
</IfModule>
</IfModule>
# don’t cache admin pages
<LocationMatch "^/adminurl">
# set the no-cache variable for any pages/group of pages you wish to not cache
SetEnv no-cache 1
</LocationMatch>
# next, we (probably) want to override the customer app's cache expiry scheme as follows
# (the Header directive needs mod_headers: a2enmod headers)
<Location />
SetEnvIf Request_Protocol "HTTP/1.1" expires_overrule
# homework: add a SetEnvIf to see if cache-control max-age is present
Header unset Expires env=expires_overrule
</Location>
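To make the CacheLastModifiedFactor setting above more concrete: per the Apache documentation, when a response has a Last-Modified date but no explicit expiry, mod_cache estimates the expiry period as the time since last modification multiplied by the factor, capped at CacheMaxExpire. The sketch below only reproduces that arithmetic. The name lastmod_expiry is a made-up helper, not an Apache tool, and it assumes awk is available.

```shell
#!/bin/sh
# lastmod_expiry AGE_SECONDS FACTOR MAX_EXPIRE   (hypothetical helper)
# Prints the estimated cache lifetime in whole seconds.
lastmod_expiry() {
    awk -v age="$1" -v factor="$2" -v max="$3" 'BEGIN {
        expiry = int(age * factor)        # factor x time since last modified
        if (expiry > max) expiry = max    # CacheMaxExpire takes precedence
        print expiry
    }'
}

# Example with the values from the config above:
#   lastmod_expiry 36000 0.1 8600
```

With the config values above, a page last modified 10 hours (36000 seconds) ago works out to 3600 seconds, while anything last modified more than about a day ago hits the CacheMaxExpire ceiling of 8600 seconds.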
The above configuration was worked out and tested on http://www.dailytexan.com/ (DDI# 634102) and resulted in a 62% hit rate, which brought overall system load down substantially (though not enough to prevent them from needing a load balancer and a second server).
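One rough way to spot-check a single URL after applying a config like this is to look for an Age header in the response, which mod_cache adds when it serves an entry from cache. The helper name has_age_header is invented for this sketch and the URL in the usage comment is a placeholder. An Age header can also come from any other cache in front of the server, so treat a match as a hint rather than proof.

```shell
#!/bin/sh
# Reads HTTP response headers on stdin.
# Exit status 0 if an Age header with a numeric value is present.
has_age_header() {
    tr -d '\r' | grep -qi '^age:[[:space:]]*[0-9]'
}

# Usage (placeholder URL; request the page twice so the first hit can fill the cache):
#   curl -s -o /dev/null -D - http://www.example.com/ | has_age_header && echo "looks cached"
```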
Tips & Warnings
- All normal warnings about using caching technology apply. You should exclude the customer’s admin interface pages, warn the customer that updates to the site may not show up immediately, etc.
- Beware: you must ALWAYS include CacheIgnoreHeaders Set-Cookie, or mod_cache will store the Set-Cookie header with the cached response and then set that same cookie for new clients each time it serves the cached page! This causes session poisoning, which is a Bad Thing(tm).
- WordPress sites WILL NOT WORK by default. Because of how mod_cache interacts with mod_rewrite, you must do the following:
In the .htaccess file, change “RewriteRule . /index.php [L]” into “RewriteRule ^(.*)$ /index.php/$1 [L]”.
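The .htaccess change above can be previewed with sed before editing the file by hand. This is a sketch under assumptions: patch_wp_rewrite is a made-up function name, the path in the usage comment is a placeholder, and it assumes the stock WordPress rule appears exactly as quoted above (apart from leading whitespace). It writes the result to stdout rather than modifying the file.

```shell
#!/bin/sh
# patch_wp_rewrite FILE   (hypothetical helper)
# Prints FILE with the stock WordPress catch-all rule rewritten so the
# original path is passed to index.php as PATH_INFO.
patch_wp_rewrite() {
    sed 's|^\([[:space:]]*\)RewriteRule \. /index\.php \[L\]|\1RewriteRule ^(.*)$ /index.php/$1 [L]|' "$1"
}

# Usage (placeholder paths): review the output, then copy it into place
#   patch_wp_rewrite /path/to/.htaccess > /tmp/htaccess.new
```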
Related
Here are some good related links on mod_cache that you may find useful if doing more advanced things than described above and/or for just generally getting a better understanding of mod_cache.
- http://www.philchen.com/2009/02/09/some-tuning-tips-for-apache-mod_cache-mod_disk_cache
- http://www.softslate.com/blog/2011/07/apache-modcache-in-real-world.html
- http://httpd.apache.org/docs/2.2/caching.html <– official Apache howto
- http://httpd.apache.org/docs/2.2/mod/mod_cache.html <– official Apache config directive reference