Joomla and NGinx Reverse Proxy Caching: Keeping your dynamic content fresh

We've placed a caching NGinx reverse proxy in front of our Joomla site, but haven't yet addressed the issue of the content that we do want to remain dynamic. We might, for example, have a module such as mod_GoogPlusFeed embedded on a number of pages, but with the configuration we've used so far, this won't update until the cached copy has expired.

In this guide, we're going to walk through a few easy steps to ensure the content is regularly updated without undermining what we were originally trying to achieve: fast response times.

What we're going to do is very similar to Dealing with Slow modules and Caching, so much so that some of the setup is the same.

 

Introduction

To summarise the issue: when a visitor accesses our site, they are effectively served a static HTML version of the page (assuming the page they've requested exists in the cache). If there is dynamic content within that page (such as a Twitter feed, though the official widgets are unaffected) then that content won't update until the cached copy expires.

In this guide we'll be looking at a method of refreshing the cached copy so that visitors can be served from the cache, without letting the cached content get too stale.

 

What this won't work for

If you're showing dynamic content on every page of your site, this method isn't likely to be as effective as you might like. The method we'll explore here is best suited to when the dynamic content is on a few specific pages (such as the Twitter feed on my home page, and the Google Plus Feed on 'About Me').

 

What we're going to do

In the last guide, we explored how we can use custom HTTP response headers to affect whether pages are cached. This time we're going to use a custom HTTP request header to influence whether we're served a page from the cache or not. The aim is to bypass the cache for that request, while ensuring that the server's response is still cached for retrieval by other visitors to the site.

Through creation of an incredibly simple cron job, we're going to request the relevant pages at a regular interval to freshen the cache. We'll be requesting the page(s) every 15 minutes, though the period you select is up to you.

There is an alternative solution, but it requires a third-party patch, so is outside the scope of this article (we're avoiding out-of-band software). For those who understand the risk and are happy to maintain it, you can use proxy_cache_purge to clear things out.

 

Configuring NGinx

We previously configured our server block to contain:

proxy_cache_bypass $upstream_http_x_dont_cache_me $cookie_jnocache;

Now we're going to add a variable for a client request header to that:

proxy_cache_bypass $upstream_http_x_dont_cache_me $cookie_jnocache $http_x_gimme_fresh;

Save your configuration and reload NGinx. Now, any request including the header X-Gimme-Fresh (with a value that is neither empty nor 0) will be fetched from the origin server rather than the cache, but crucially the response will still be eligible for caching.
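To make the bypass rule concrete, here is a minimal sketch in BASH of the test NGinx applies to the proxy_cache_bypass values: the cache is bypassed when at least one value is non-empty and not equal to "0". The function name would_bypass is purely illustrative and is not part of NGinx; the three arguments stand in for the three variables in the directive above.

```shell
#!/bin/bash
# Illustrative only: mimic NGinx's proxy_cache_bypass test.
# Succeeds (bypass) when at least one argument is non-empty and not "0";
# otherwise fails (the request may be served from the cache).
would_bypass() {
    for value in "$@"; do
        if [ -n "$value" ] && [ "$value" != "0" ]; then
            return 0
        fi
    done
    return 1
}

# Arguments stand for:
#   $upstream_http_x_dont_cache_me  $cookie_jnocache  $http_x_gimme_fresh
would_bypass "" "" "True" && echo "bypass"            # header sent by a refresh request
would_bypass "" "" ""     || echo "serve from cache"  # ordinary visitor, no header
would_bypass "" "" "0"    || echo "serve from cache"  # a value of 0 does not bypass
```

The practical upshot is that the header's value matters: sending "X-Gimme-Fresh: 0" would not trigger a refresh, whereas any other non-empty value will.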

 

Creating Our Cron Script

We now need to think about which URLs we want to refresh. Ideally it'll be any that contain dynamic content (and might also include your sitemaps if you haven't configured those not to be cached).

For the sake of example, I'm going to refresh my homepage and the 'About Me' page. We're going to create a simple BASH script to request the pages, whilst specifying the header.

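#!/bin/bash
# Request each page with the bypass header so that NGinx fetches a fresh
# copy from the origin and stores the response in the cache.
HEADER="X-Gimme-Fresh: True"
wget -q -O /dev/null --header="$HEADER" "http://www.bentasker.co.uk"
wget -q -O /dev/null --header="$HEADER" "http://www.bentasker.co.uk/about-me"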

Save as /var/www/cache-refresh.sh and make it executable (chmod +x /var/www/cache-refresh.sh).
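If the list of pages grows, repeating the wget line for each one gets unwieldy. As a rough sketch of one alternative (not the script used in this guide), the variant below takes the URLs as command-line arguments and reports any request that fails. The function names fetch and refresh_all, and the file name cache-refresh-list.sh used in the usage note, are hypothetical.

```shell
#!/bin/bash
# Sketch: refresh every URL given on the command line and report failures.
HEADER="X-Gimme-Fresh: True"

# Request one URL with the bypass header, discarding the body.
# Returns wget's exit status.
fetch() {
    wget -q -O /dev/null --header="$HEADER" "$1"
}

# Call fetch for each argument; return non-zero if any request failed.
refresh_all() {
    status=0
    for url in "$@"; do
        if ! fetch "$url"; then
            echo "refresh failed: $url" >&2
            status=1
        fi
    done
    return "$status"
}

refresh_all "$@"
```

Saved as, say, /var/www/cache-refresh-list.sh, it could be run as /var/www/cache-refresh-list.sh "http://www.bentasker.co.uk" "http://www.bentasker.co.uk/about-me". Because failures are written to stderr, cron would typically mail them to the job's owner if mail is configured on the server.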

If you monitor your web server's logs, you should see the requests come through to Apache every time you run the script. Now we simply need to add it as a cron job:

crontab -e
*/15 * * * * /var/www/cache-refresh.sh

Save and exit. The pages we've listed will now be refreshed in the cache every 15 minutes. Job done!
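If you'd rather not edit the crontab by hand (when provisioning a server from a script, for example), the entry can be added non-interactively. The sketch below is one way of doing that, assuming a cron implementation whose crontab command accepts "-" to read from stdin (as the common Linux ones do). The helper name with_refresh_job is hypothetical; it only appends the line when it isn't already present, so running it twice won't duplicate the job.

```shell
#!/bin/bash
# The cron entry from this guide.
CRON_LINE='*/15 * * * * /var/www/cache-refresh.sh'

# Read an existing crontab on stdin and print it with CRON_LINE appended,
# unless an identical line is already present.
with_refresh_job() {
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$CRON_LINE"; then
        printf '%s\n' "$CRON_LINE"
    fi
}

# To install for the current user (deliberately not run automatically here):
#   crontab -l 2>/dev/null | with_refresh_job | crontab -
```

Running crontab -l afterwards should show the entry alongside any jobs that were already there.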

 

Conclusion

Given the choice, I'd prefer to be able to purge the entire cache on demand, but I don't like the idea of maintaining custom patches on a publicly accessible web server. Ideally, I'd have liked to create an onContentAfterSave plugin to send a custom header (and trigger a cache flush) when content is saved. This, however, is an acceptable solution when the important dynamic content appears on a limited number of pages.

 

Next: Keeping Hitcounts accurate when using an NGinx Caching Proxy