Multi-homing a Joomla site between the WWW and a Tor Hidden Service

I did some work recently on making BenTasker.co.uk available via both a Tor Hidden Service (otherwise known as a .onion) and via the WWW.

The reasons for doing this are published elsewhere, but this documentation summarises the steps I had to take (and why) in order to have the site safely accessible via both routes.

For those who are interested, there's a far higher level of detail over on Projects.bentasker.co.uk.

 

Assumptions

I'm assuming that if you're following this, you are at least vaguely familiar with how to set up a Tor Hidden Service. If you are attempting to do the same, you'll need to make adjustments to suit your own circumstances.
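
For those who haven't set one up before, the Tor side of things only needs a couple of lines in torrc. The snippet below is a minimal sketch (the directory path is purely illustrative) mapping the onion's port 80 onto the local NGinx instance acting as the reverse proxy:

    # Minimal torrc sketch - the directory path is illustrative
    HiddenServiceDir /var/lib/tor/bentaskercouk/
    HiddenServicePort 80 127.0.0.1:80

Tor generates the .onion hostname and keys inside HiddenServiceDir the first time it starts with that configuration.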

 

Challenges

Setting up a Hidden Service is fairly straightforward, and in this case I didn't have the additional burden of having to take steps to keep the origin server a secret.

Having the site available on two different internets brings its own challenges, though, in part because of differences in the way they operate and also because the two can overlap at points.

 

Bentasker.co.uk uses HTTPS

The first issue to address was the fact that my site is HTTPS only. Whilst HTTPS is possible via Hidden Services, it generally results in visitors receiving certificate warnings (unless you're willing to pay Digicert for an EV certificate - which I wasn't). So the site needed to be made available via HTTP, without disabling HTTPS on the WWW.

An added complication is that my CMS (Joomla!) is configured to enforce HTTPS access.

My solution to this was to have an NGinx reverse proxy act as a bridge between the Tor client and the live site: the client connects over port 80 and the proxy goes upstream over port 443.

The reverse proxy also adds a specific request header so that the origin knows the access is via the .onion.

 

Potential for a Duplicate Content Penalty in Google's Indexes

Services like tor2web allow visitors to browse Hidden Services via the WWW by using the Hidden Service's address as a subdomain (e.g. https://duskgytldkxiuqc6.tor2web.org/).

The issue here is that if Google were to come across a link to my onion via tor2web, they'd index it and discover that the content is identical, which could lead to my WWW site getting marked down in the search listings.

My initial approach to this had been to serve a custom robots.txt on the onion, but it was pointed out on the Tor mailing lists that all tor2web requests carry an X-Tor2Web header.

So my onion blocks all requests carrying that header, and also serves a custom robots.txt in case the service ever stops supplying it.

    # Prevent Tor2Web access
    set $tor2web F;
    if ($http_x_tor2web){
        set $tor2web T;
    }


    # Don't redirect requests for the block page itself (avoids a loop)
    if ($request_uri = '/405.html'){
        set $tor2web R;
    }

    if ($tor2web = T){
        return 301 /405.html;
    }

    error_page 405 = /405.html;
    location /405.html {
        root /usr/share/nginx/onions/bentaskercouk;
    }

The configuration for this is currently more convoluted than I'd like, as NGinx seems unwilling to use a custom error page for 405 or 406 responses. I didn't want to simply bar access but to also explain why, so in the meantime I've 301'd to the notification page, effectively blocking access (though it means the status header returned is inappropriate).
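
The exact contents of the custom robots.txt aren't reproduced here, but something as blunt as the following would do the job of telling (well-behaved) crawlers reaching the onion via a tor2web gateway not to index anything:

    # Served only on the .onion - don't index the mirror at all
    User-agent: *
    Disallow: /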

Note: Accessing an onion via tor2web also presents a possible MITM risk, as you've got to trust tor2web to simply relay the content without changing it - so direct access is always preferable. The block page carries a message to that effect.

 

Existing Anti-Abuse Scripts Pose a DoS risk

This was perhaps one of the bigger technical hurdles to handle. I run a number of anti-abuse mechanisms at the application level (for example Akeeba's Admin Tools) with the intention of catching and stopping SQL Injection attempts so that the likelihood of being stung by a 0-day is somewhat reduced.

Multiple attempts at doing something 'bad' can lead to a temporary ban on the source IP, as much to avoid wasting cycles processing garbage traffic as anything else (IP bans are easily circumvented, especially if your attacker is not an automated script).

Where this poses an issue is that all traffic to the .onion will have the same source IP - 127.0.0.1. So it'd be possible, in just a few requests, to effectively shutter access to the .onion for the lifetime of the ban.

The challenge here was that I didn't want to weaken protection on the WWW side, but ideally wanted to keep as much of the benefit of the protection scripts as possible.

The solution, if a little convoluted, was to create a plugin for Joomla that generates a 'fake' IP for the client when they originate from Tor (triggered indirectly by the header sent by the reverse proxy).

The IP is derived from the client's source port (sent upstream by the first reverse proxy) and the minute of the hour (to artificially limit the lifetime of a ban), and falls within the RFC 3927 link-local range (169.254.0.0/16) to avoid collision with any prefixes I might use now or in the future.

The port number is sent upstream by the NGinx reverse proxy:

proxy_set_header X-downstream-port $remote_port;
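
The plugin's actual code isn't reproduced here, but the core idea is simple enough to sketch. The following is purely illustrative - the header lookup and the exact way the port and minute are folded into the address are assumptions rather than the plugin's real logic:

// Illustrative sketch only - not the actual plugin.
// Assumes the X-downstream-port header surfaces in PHP as HTTP_X_DOWNSTREAM_PORT
$port = isset($_SERVER['HTTP_X_DOWNSTREAM_PORT']) ? (int) $_SERVER['HTTP_X_DOWNSTREAM_PORT'] : 0;

if ($port > 0){
    // Minute of the hour (0-59), so a ban based on the fake IP expires quickly
    $minute = (int) date('i');

    // Fold the port (0-65535) and the minute into the last two octets of a
    // link-local (RFC 3927) address: 169.254.x.y
    $octet3 = ($port >> 8) & 0xFF;
    $octet4 = ($port + $minute) & 0xFF;

    // Overwrite the address that Joomla (and the anti-abuse tools) will see
    $_SERVER['REMOTE_ADDR'] = sprintf('169.254.%d.%d', $octet3, $octet4);
}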

A copy of the plugin can be found on Github.

Not All Links Are Relative

In order to aid browser parallelisation, some of the static content on my site is served from a subdomain, the choice of which is controlled by NoNumber.nl's CDNForJoomla plugin.

I didn't want requests for static content going out over the WWW, so the plugin needed to be adjusted slightly so that it'd use a .onion instead if a specific header was present (again, the header sent by the reverse proxy).
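
The change itself isn't reproduced here, but the general idea is just to pick the static hostname based on whether the proxy's header is present - roughly along these lines (illustrative, not the plugin's actual code, and the header name is whatever the proxy is configured to send):

// Illustrative only - the real logic lives inside the CDN plugin
$via_onion = isset($_SERVER['HTTP_X_IM_AN_ONION']);
$static_domain = $via_onion ? 'static.6zdgh5a5e6zpchdz.onion' : 'static1.bentasker.co.uk';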

The reverse proxy was also configured to accept a subdomain of the .onion (static.6zdgh5a5e6zpchdz.onion) and to cache the upstream responses (as it's static content).

 

This turned out to only be part of the challenge though. Although the remainder of the links in the site are relative, Joomla inserts a base tag into the document header, so in testing my browser was still being taken back to www.bentasker.co.uk.

To resolve this, I inserted the following into the top of my template's index.php:

if (isset($_SERVER['X_1234_ITS_AN_ONION']) && $_SERVER['X_1234_ITS_AN_ONION'] == ':true'){
    $this->base = str_replace("https://www.bentasker.co.uk", "http://foo.onion", $this->base);
}

Joomla now sets the base tag depending on whether or not the access is via the .onion.

 

Cache Poisoning

One major issue with configuring Joomla to rewrite links based on the source is that there is a caching reverse proxy between it and the outside world.

So the cache namespaces had to be split; adding the following to the server block on the origin was sufficient to resolve that:

if ($http_x_im_an_onion){
      set $onionaccess ':true'; # Make sure it won't clash with an existing slug
}
# When the header isn't present, $onionaccess evaluates to an empty string
proxy_cache_key "$scheme$host$request_uri$onionaccess";

At this point, the header going upstream to Joomla was changed (and kept secret) so that no individual part of the onion-specific behaviour could be triggered independently - if you send an X-Im-an-onion header, the system will treat you as though you were accessing via Tor.

Any public facing page (notably "Your stored data") that might disclose the headers added by the proxy had to be updated.
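
By way of illustration only (the header names below are hypothetical, not those in the live config), forcing the value at the origin's proxy is just a case of setting it unconditionally before the request goes upstream - NGinx won't pass a header whose value is an empty string, so anything the client supplied under the internal name is dropped:

# Hypothetical names: translate the public flag into the internal header,
# and never pass through a client-supplied copy of the internal one
set $int_onion "";
if ($http_x_im_an_onion){
    set $int_onion "1";
}
proxy_set_header X-INTERNAL-ONION $int_onion;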

 

Privacy Changes

I don't do much on the WWW that should be unsuitable for a .onion, but it still seemed like a good opportunity to review the possible implications.

I've not entirely finished with that (a decision needs to be made on AdSense), but during testing for possible privacy implications it was discovered that the shop section could redirect users from the .onion back to the WWW without warning.

So, as a precaution, the shop section has been made unavailable until I've developed a fix:

    location ~ /shop {
        try_files /noexist /shpblock.html;
    }

Testing also highlighted a minor bug in the way that Social Icons and Google Analytics are handled.

 

Eventual NGinx Config

My eventual configuration at the Tor-side reverse proxy was as follows (note: I blocked /administrator as I have no plans to access the back-end via Tor).

server {
    listen       localhost:80;
    server_name  6zdgh5a5e6zpchdz.onion; 
    root /usr/share/nginx/onions/bentaskercouk;


    # We check disk first so I can override things like robots.txt if wanted
    location / {
       try_files $uri $uri/ @proxyme;
    }

    location = / {
        try_files /homepage @proxyme;
    }

    # 404's are handled by the back-end but
    # redirect server error pages to a local file
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/onions/bentaskercouk-errors;
    }

    error_page 405 = /405.html;
    location /405.html {
        root /usr/share/nginx/onions/bentaskercouk;
    }

    # See MISC-7
    location ~ /shop {
        try_files /noexist /shpblock.html;
    }


    # Proxy to the back-end
    location @proxyme {

        # Pass the remote port details upstream
        proxy_set_header X-downstream-port $remote_port;
        # Set a header so the back-end knows we're coming via the .onion
        proxy_set_header X-IM-AN-ONION 1;

        # Make sure the host header is correct
        proxy_set_header Host www.bentasker.co.uk;

        # Send the request
        proxy_pass   https://www.bentasker.co.uk;

        # TODO
        # Do we want to cache rather than sending every request upstream? 
        # Probably not, but revisit later
    }


    # Don't even bother proxying these, just deny
    location ~ /\.ht {
        deny  all;
    }

    location ~ /administrator {
        deny  all;
    }
    # Prevent Tor2Web access
    set $tor2web F;
    if ($http_x_tor2web){
        set $tor2web T;
    }


    if ($request_uri = '/405.html'){
        set $tor2web R;
    }

    if ($tor2web = T){
        return 301 /405.html;
    }


}

The block for the static content onion is somewhat simpler:

server {
    listen       localhost:80;
    server_name  static.6zdgh5a5e6zpchdz.onion;
    root /usr/share/nginx/onions/bentaskercouk;


    # 404's are handled by the back-end but
    # redirect server error pages to a local file
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/onions/bentaskercouk-errors;
    }


    # Proxy to the back-end
    location / {

        # Set a header so the back-end knows we're coming via the .onion
        # Shouldn't matter for static content, but worth having in case
        proxy_set_header X-IM-AN-ONION 1;

        # Make sure the host header is correct
        proxy_set_header Host static1.bentasker.co.uk;

        # We do some caching so we're not forever having to do handshakes
        proxy_cache my-cache;
        proxy_cache_valid  200 302  7d;
        proxy_cache_valid  404      5m;
        proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
        proxy_cache_key "$scheme$host$request_uri";
        add_header X-Cache-Status $upstream_cache_status;

        # Send the request
        proxy_pass   https://static1.bentasker.co.uk;
    }


    # These are actually blocked upstream anyway, but why bother proxying?
    location ~ /\.ht {
        deny  all;
    }

    location ~ /administrator {
        deny  all;
    }

}

 

Conclusion

Although there were some minor changes, the adjustments to Joomla have actually been pretty minimal. The only changes to the origin's reverse proxy were to force certain values to be used (so that we're not trusting headers that were potentially supplied by a user).

Unsurprisingly, getting the Tor client set up was the easiest part of getting everything up and running. There are still a few issues to address (like what to do about the shop), but it wasn't nearly as challenging as I originally expected.

By passing connections through a reverse proxy I've also gained the ability to override specific files on the .onion (such as robots.txt).