Building a Tor Hidden Service CDN

Last year I started experimenting with the idea of building a Hidden Service CDN.

People often complain that Tor is slow, though my domain sharding adjustments to the bentasker.co.uk onion have proven fairly effective in addressing page load times.

On the clearnet, the aim is traditionally to direct the user to an edge node close to them. That's obviously not possible for a Tor Hidden Service (and even if it were, the user's circuit might still take packets half-way across the globe). So the primary aim here is instead to spread load and introduce some redundancy.

One option for spreading load is to have a load balancer run Tor and then spread requests across the back-end. That, however, does nothing for redundancy if the load balancer (or its link) fails.

The main aim was to see what could be achieved in terms of scaling out a high-traffic service. Raw data and more detailed analysis of the results can be seen here. Honestly speaking, it's not the most disciplined or structured research I've ever done, but the necessary information should all be there.

This document is essentially a high-level write-up, along with some additional observations.

Test-Case

My primary test-case was video delivery. Video is bandwidth-intensive, so isn't necessarily a good use of Tor's limited capacity (I also spun up some relays to offset the bandwidth I was using), but the aim was to focus on content where delays and delivery issues are most visible.

The video used was HLS at various segment sizes and with varying bitrates (plus some ABR playlists allowing the player to switch as needed). The content itself was Big Buck Bunny (which I now know rather well....)

In the real world, a service will usually be viewed by a range of players, so playback was tested with a variety of them.

Single-player tests were performed using the stream tester.

For some tests, the cache was artificially warmed. For example:

# Warm the edge cache by requesting the first segments of each bitrate
HOST="cix7cricsvweeu6k.onion"
s="127.0.0.1"

for br in 512 1024 2048
do
   for num in {0..10}
   do
      # Zero-pad the segment index to five digits to match the segment filenames
      seg=$(printf "%05d" "$num")
      curl -H "Host: $HOST" -o /dev/null \
      "http://${s}/Big_Buck_Bunny-HLS/Big_Buck_Bunny_${br}_${seg}.ts"
   done
done

Delivery time statistics were extracted from the NGinx logs themselves, so represent the time between the request headers first being received and the last byte being passed to the kernel for delivery.
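
For reference, a log format along the following lines exposes both the per-request delivery time and the cache disposition at each tier. This is a minimal sketch rather than the format used in the tests; the format name and field list are assumptions.

# $request_time covers headers-in to last-byte-out (handed to the kernel);
# $upstream_cache_status records the cache disposition (HIT/MISS/etc.) at this tier
log_format cdn_timing '$remote_addr [$time_local] "$request" $status '
                      '$body_bytes_sent $request_time $upstream_cache_status';

access_log /var/log/nginx/access.log cdn_timing;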

Topology

I didn't want to overly saturate the network, but was interested in the impact of having the edge go upstream via Tor, so I built a (small) two-tier CDN with the (single) origin on the clearnet.

Edge HS -> Midtier HS -> Origin (clearnet)

Edge devices achieved some level of redundancy by having multiple devices advertise the same Hidden Service. Requests would therefore go to whichever node had published its descriptor most recently.
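
As a rough illustration (the directory path and port mapping here are assumptions rather than the actual configs used), each edge node might carry an identical torrc stanza and an identical copy of the hidden service's private key, so that they all publish descriptors for the same .onion address:

# torrc excerpt, replicated on every edge node.
# /var/lib/tor/edge_hs/ holds the same private_key on each node, so each
# advertises the same .onion; whichever publishes its descriptor last is
# the node clients will reach.
HiddenServiceDir /var/lib/tor/edge_hs/
HiddenServicePort 80 127.0.0.1:80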

Both the Edge and the Midtier ran caches, honouring the cache-control headers received from the Origin.

The various configurations used can be seen here.
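
For illustration only (not the configs linked above; the zone name, path and sizes are assumptions), the caching behaviour at the Edge and Midtier amounts to something like:

# Cache definition - with no proxy_ignore_headers set, NGinx honours the
# Cache-Control / Expires headers sent by the tier above
proxy_cache_path /var/cache/nginx/edge levels=1:2 keys_zone=edge_cache:50m
                 max_size=2g inactive=12h;

server {
    listen 127.0.0.1:80;

    location / {
        proxy_cache edge_cache;
        # 'next_tier' stands in for however the next tier up is reached
        proxy_pass http://next_tier;
        # expose the cache disposition for analysis
        add_header X-Cache-Status $upstream_cache_status;
    }
}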

Test Observations

Video Delivery Experience

Various tests were run in order to see what quality of video could reliably be delivered, particularly when multiple clients were watching the same stream (yielding a higher cache hit rate).

With two-second segments, there was a high risk of buffer underruns occurring.

Delivery times varied: with a cold edge cache (playing a 1Mb/s stream) they were sometimes as high as 12.8 seconds, though these spikes seemed to have been caused by network conditions between the client and the edge rather than by the need to go upstream.

Aside from one test (where the problems are believed to have been caused by circuit collapse), moving to 10-second segments yielded a vast improvement. Whilst there were still spikes in delivery times, the longer segment length provided a much greater margin for error. Although delivery wasn't always reliable, 10-second segments allowed 1080p video to be streamed.

Adding multiple players improved delivery over time: whilst delivery durations still spiked, this happened far less frequently.

Delivery of 480p video using 10-second segments was undoubtedly the most reliable, though analysis of delivery times and cache dispositions at each tier suggests that difficulties were more likely to be the result of issues between the client and the edge than issues between the tiers.

Static Content

A small test was also run using more run-of-the-mill content - images, JavaScript and CSS. In reality this is likely to be more standard fare on a Hidden Service (between bandwidth constraints and the avoidance of plugins and JavaScript, streaming video isn't something you'd often encounter).

Delivery in this case is much easier: although you still want to get content to the client quickly, it's not nearly as time-sensitive as video delivery.

Even large images (served with a CACHE_MISS) could easily be delivered quickly enough not to impact the browsing experience. Requests were also more likely to spread across the edge, particularly if an edge node became unavailable. Because the content was being served from a separate domain, Tor Browser was able to parallelise downloads beyond the traditional six-connection limit.

Upstream Costs

Going upstream incurs additional delay, even for a traditional clearnet CDN; delivery will almost always be fastest when served directly from cache. Various CDN offerings handle this differently, some routing requests to other, closer devices within the CDN in case they have the object in cache - the constant being that the relative network cost of reaching the origin from any given device is known.

However, with a fully Tor-based CDN, things are slightly different. Whilst you might know where the origin is located, you don't know in advance where the various relays in your upstream circuit will be. There's also the initial latency cost of actually establishing an upstream circuit.

In the tests, I addressed this - to some extent - by ensuring that the edge nodes used an HTTP/1.1 Keep-alive connection to the midtier. Although the setup cost was still incurred when an existing connection wasn't available, over time (and as traffic grew) the likelihood of this diminished - especially once cache_lock was enabled (so that if two requests are received for the same asset, one waits while the other goes upstream and is then served from the cache).

As a rule of thumb, though, even cache re-validations can be expensive if a connection needs to be established to the midtier, so some effort needs to be put into ensuring that a reasonable connection pool is maintained where possible. NGinx does a pretty good job of this once you've told it to use Keep-alive connections to the upstream.
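
A minimal sketch of the relevant edge-side directives is below (re-using the edge_cache zone from the caching sketch above). It assumes the midtier onion is reached through a local forwarder bridging 127.0.0.1:8081 to the midtier over Tor's SocksPort - an assumption for illustration, not necessarily how the test configs were wired up.

upstream midtier {
    # local forwarder bridging to the midtier onion over Tor's SocksPort (assumed)
    server 127.0.0.1:8081;
    # keep a small pool of idle connections open so requests don't keep
    # paying the circuit / TCP setup cost
    keepalive 8;
}

server {
    listen 127.0.0.1:80;

    location / {
        proxy_cache edge_cache;
        proxy_pass http://midtier;

        # Keep-alive to the upstream requires HTTP/1.1 and an empty
        # Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # cache_lock: concurrent MISSes for the same object collapse into a
        # single upstream fetch; the rest wait and are then served from cache
        proxy_cache_lock on;
        proxy_cache_lock_timeout 10s;
    }
}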

It's for this reason that a midtier is probably advisable. In a real-world deployment, the origin would almost certainly be a Hidden Service rather than a clearnet site, and configuring persistent connections to the origin could potentially help an observer begin to identify the real-world location of that origin.

Instead, having persistent connections to a "disposable" midtier resolves some of the latency issues that would otherwise be encountered. With a small(ish) midtier serving a broad edge, the likelihood of the midtier having an asset in cache is increased, leading to improved delivery times. In theory, at least.

General Observations

Request Routing

As noted in MISC-18, request routing in the tests relied on a race condition in the way that Hidden Service descriptors are published. Not only does this make it difficult to predict (in advance) which node a request will go to, it also means that the load on those nodes isn't taken into account at all.

A better way to address this might be to have a routing service (relying on the race condition for redundancy) which then takes metrics from the edge and serves an HTTP redirect in order to send the client to a specific group of edge nodes (using a different descriptor).

So the flow would be something like the following:

Client -- GET  http://foo.onion/foo.js --> Router 
Router -- HTTP 302 http://bar.onion/foo.js --> Client
Client -- GET http://bar.onion/foo.js --> Edge node 1

The edge nodes in this topology could also operate in groups (again relying on the race condition) to ensure some level of redundancy.

It does come at the cost of having to set up an additional circuit, so further measurements would be needed to gauge the impact of that. In principle, though, it'd be as simple as having an agent on each of the edge nodes send an indicator (or raw statistics) to the router.

The router could simply run OpenResty so that the routing of each incoming request could be handled by a simple Lua script. If there were a desire to ensure that future requests hit the same edge node, a short-lived cookie could be set by the router to help with future routing decisions.
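
A rough sketch of that routing tier is below. The group names (bar.onion, baz.onion), the /edge_report endpoint and the shared-dictionary approach are all illustrative assumptions rather than anything built for the tests.

# Inside the http{} block of the router's OpenResty config
lua_shared_dict edge_load 1m;

server {
    # the router's own Hidden Service would forward its port 80 here (assumed)
    listen 127.0.0.1:8080;

    # Endpoint the edge agents could hit to report their current load,
    # e.g. /edge_report?group=bar.onion&load=42
    # (would need locking down to the edge nodes in practice)
    location /edge_report {
        content_by_lua_block {
            local group = ngx.var.arg_group
            local load  = tonumber(ngx.var.arg_load)
            if group and load then
                -- 120s expiry: a group that stops reporting drops out of rotation
                ngx.shared.edge_load:set(group, load, 120)
            end
            ngx.say("OK")
        }
    }

    location / {
        content_by_lua_block {
            local groups = { "bar.onion", "baz.onion" }  -- illustrative edge groups
            local dict = ngx.shared.edge_load

            -- honour a previous routing decision if the client presents one
            local target = ngx.var.cookie_edge_group

            if not target then
                -- pick the group with the lowest recently-reported load,
                -- falling back to the first group if nothing has reported
                local best, best_load
                for _, g in ipairs(groups) do
                    local load = dict:get(g)
                    if load and (not best or load < best_load) then
                        best, best_load = g, load
                    end
                end
                target = best or groups[1]

                -- short-lived cookie so follow-up requests hit the same group
                ngx.header["Set-Cookie"] = "edge_group=" .. target .. "; Max-Age=300; Path=/"
            end

            return ngx.redirect("http://" .. target .. ngx.var.request_uri, 302)
        }
    }
}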

Trust Issues

At a technical level, productisation of a Tor HS CDN service is certainly achievable; however, there are additional considerations to be made, especially in terms of the levels of trust required across the board.

CDN Operator Trust

In a traditional clearnet model, the CDN operator generally knows who their customer is. However, Hidden Service operators are unlikely to be willing to disclose their identity to a third party just to be able to serve content more quickly. Nor would payment necessarily disclose who the HS operator is, if made via Bitcoin or similar.

The result is that the CDN operator would likely have no idea who their customer is, and by extension, little knowledge of what content will be passed through their infrastructure.

The deep web isn't quite the wilderness of drugs and child porn that the media would have you believe, but some consideration would still need to be given as to whether your infrastructure would be used to serve content you disagree with (or might be legally liable for). Laws (unsurprisingly) vary across jurisdictions, so it's unlikely you'd be viewed as "just a carrier" in at least a few of them, especially given the reputation dark nets currently have.

Given the anonymous nature of such a setup, it'd be hard to judge whether that's likely with any given customer, making it quite a high-risk business model. A CDN operator would almost certainly start receiving questionable requests from various entities (law enforcement, the copyright MAFIAA, etc.) quite quickly, placing them in a fairly compromising position. Equally certain is that the edge would commonly be targeted by others seeking to compromise users of various services.

However, if the business model were constrained to only accept customers willing to identify themselves (for example, Facebook's onion), this would be less of a concern.

HS Operator Trust

There have been some fairly concerted efforts in the past to try and de-anonymise Tor users, so a Hidden Service operator would have to weigh up the risk that you might tamper with the content being served in order to track or identify their users.

If the operator were looking to serve their entire site via CDN (rather than just JS, images, etc.) then they'd need to either rely on an HS descriptor to which you hold the keys or, worse, surrender the private key for their preferred HS so that the edge could effectively serve it.

Neither is a particularly enticing proposition as it places a lot of control in the CDN operator's hands. To some extent, though, this is also true for a clearnet CDN operator - they also have the means to tamper with content being served, but as a general rule there's likely less value to doing so.

To avoid tampering, it might be possible to serve static content encrypted and decrypt it client-side, but then you're reliant on some sort of client-side implementation. With most Hidden Service users eschewing JavaScript, that's a pretty tall order and, even then, it feels like overkill just to be able to serve images.

User Trust

With a suitably streamlined configuration, the average user probably wouldn't be aware that they're even using your CDN; however, many of their concerns align with those of an HS operator. The difference here, though, is that as more Hidden Services use the CDN, the greater the chance that a user will hit the request routing via a circuit they'd previously used to be routed to resources for a different Hidden Service.

This would potentially make it possible to track a user across services, which is beneficial to no-one.

Conclusion

Building out a scalable, fully Tor-based CDN is technically feasible and quite easily achievable, but building a general-availability business model around it would be challenging, to say the least. Whether through risk to the business itself, or risk to the anonymity of Hidden Service operators and users alike, there are more than a few obstacles that would take some effort to overcome.

High-quality streaming video delivery is achievable, if not currently a particularly desirable use of the limited bandwidth available within the Tor network (though this could - and should - be compensated for by putting relays online to offset the resulting bandwidth usage).

For a particularly high-traffic Hidden Service, it may be beneficial to build out similar infrastructure in order to increase availability/throughput, though it does come at the cost of exposing additional equipment - increasing the chance for something to be linked back to the operator, particularly as it'd be a decidedly non-standard implementation.

As a Cloudflare-style general-availability service, though, the idea is probably all but dead in the water. Some of the obstacles may be addressable, but there doesn't seem to be a good way to overcome some of the trust issues.

In effect, any business built around the general-availability model would need to be willing to perform a Lavabit-style shutdown at short notice, making any investment of time or money risky. Failure to do so would destroy trust in the infrastructure when the compromise inevitably came to light, also resulting in the death of the business.

With the way Hidden Services are currently configured, having an entire Hidden Service go via CDN simply isn't achievable/desirable given the amount of control the HS operator would need to pass to the CDN operator. Either the CDN operator is completely anonymous (and so very difficult to trust) or they're a public entity, which risks their being coerced by law enforcement even before they're targeted by other interested parties. We've already seen that law enforcement organisations are very interested in learning how to de-anonymise Tor users, so even if the content you're serving is 100% legal across the globe, there's a non-negligible chance of being targeted.

Technically speaking, it was fun to play around with, but there doesn't seem to be a good way to put such a setup into production unless/until you have a service that needs the scale. Handling traffic for others is almost certainly a no-go.