An argument in favour of application level name resolution

Recently I published some documentation detailing how to build and run your own DNS-over-HTTPS (DoH) server.

As I mentioned at the beginning of that documentation, there's been a certain amount of controversy about DoH vs DNS over TLS (DoT).

One thread of that argument is that name resolution should be handled at the OS level rather than by individual applications, so that all applications get the same result for a given name (which simplifies troubleshooting), with some caching benefit as well.

Generally I've found that argument fairly persuasive, but I've also taken the view that DoH being implemented at the application level is the result of a general lack of availability/uptake of DoT at the OS level.

In other words, whilst it's not ideal for applications to be resolving names themselves, it makes an (arguably flawed) privacy-enhancing solution available now, rather than continuing to wait for an (arguably) better solution to actually get adopted (and ignoring whatever reasons led to that lack of adoption).

But I've begun to change my mind on whether applications doing their own resolution really is a problem, or whether it's actually beneficial when considered alongside some of the aims of DoH.



The primary aim of both DoH and DoT is to afford a level of privacy and trust to our DNS lookups. Although they implement this in different ways (and afford differing levels of trust to the network operator), the core aim is that a network observer should be unable either to see the queries being made or to modify the results being returned.

DoH goes one step further than DoT here, by trying to make it difficult for a network operator to block DoH traffic. That in itself has understandably raised objections, but it is also why I'm focusing on DoH in this post.


The Issue

On my Android phone, I use Jigsaw's Intra to intercept DNS lookups, encapsulate them into an HTTP request and send them out to my DoH server.
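For anyone unfamiliar with what "encapsulating" a lookup involves, here's a minimal sketch of the shape of an RFC 8484 GET request: the DNS query is built in ordinary wire format, base64url-encoded without padding, and passed in a `dns` query parameter. It's an illustration only, not how Intra does it internally, and `https://doh.example.com/dns-query` is a placeholder for your own server.

```python
import base64
import struct
import urllib.request


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query in wire format (RFC 1035)."""
    # Header: ID=0 (RFC 8484 suggests 0 to help HTTP caching), flags=RD (0x0100),
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def build_doh_request(endpoint: str, name: str) -> urllib.request.Request:
    """Wrap the DNS query in an RFC 8484 GET request."""
    query = build_dns_query(name)
    # RFC 8484 uses base64url *without* padding for the "dns" parameter
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return urllib.request.Request(
        f"{endpoint}?dns={encoded}",
        headers={"Accept": "application/dns-message"},
    )


if __name__ == "__main__":
    # Placeholder endpoint: substitute your own DoH server
    req = build_doh_request("https://doh.example.com/dns-query", "www.example.com")
    print(req.full_url)
```

Actually sending it is then just `urllib.request.urlopen(req)`; the response body is a wire-format DNS message (`application/dns-message`) that still needs parsing. To anything on the path, it looks like any other HTTPS request.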

However, earlier in the month, I noticed that some queries seemed to be leaking out onto the local network. Unfortunately the issue resolved itself before I could look into it, so I couldn't investigate further until I noticed a recurrence this weekend and was able to dig in a bit more (MISC-32 for those who are interested).

The root cause of the leakage was that the memory management on my Android phone, which is (IMO) extremely aggressive, was killing the Intra process but leaving the background service running. As a result, the Intra icon remained in the notification bar, but no queries were being intercepted; instead they went straight to whatever DNS server had been supplied via DHCP (or to EE when I was on mobile data).

Note the Intra Icon in the notification bar

I picked up on the leakage on Saturday whilst on someone else's wifi: I mistyped an address and got an NXDOMAIN interception page from BT.
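That BT page is itself a useful tell: a resolver that rewrites NXDOMAIN responses returns an address for a name that shouldn't exist. A rough heuristic along those lines is sketched below. It's an assumption on my part, not part of the MISC-32 investigation: look up a random, almost certainly unregistered name through the OS resolver, and treat any answer as a sign that lookups are reaching a resolver that rewrites NXDOMAINs (assuming your own DoH server returns honest ones). The lookup function is injectable so the logic can be exercised without touching the network.

```python
import secrets
import socket
from typing import Callable, List


def system_lookup(name: str) -> List[str]:
    """Resolve via the OS resolver (whatever Intra, DHCP or the mobile network supplies)."""
    infos = socket.getaddrinfo(name, None)
    # info[4] is the sockaddr tuple; its first element is the address string
    return sorted({info[4][0] for info in infos})


def nxdomain_is_rewritten(lookup: Callable[[str], List[str]] = system_lookup) -> bool:
    """Heuristic: a random, almost certainly unregistered name should not resolve.

    Returns True if it does, suggesting some resolver on the path is rewriting
    NXDOMAIN responses (as with the BT interception page). False does not prove
    queries aren't leaking, only that this particular tell is absent.
    """
    # 24 random hex characters: a valid label that is vanishingly unlikely to exist
    probe = f"{secrets.token_hex(12)}.com"
    try:
        return bool(lookup(probe))
    except socket.gaierror:
        # Resolution failed, which is what an honest resolver should do here
        return False
```

Calling `nxdomain_is_rewritten()` with no argument uses the OS resolver. A periodic check like this would have flagged my week of leakage on any network whose resolver hijacks NXDOMAINs, though not on one that answers honestly.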

For me, that's annoying. But Intra was created and released to combat censorship in countries where leakage could have more severe consequences than it does for me. In particular, it's the middling cases that are likely to be most affected: countries where access to information is being restricted and/or monitored, but things aren't yet so severe that users automatically feel the need for a VPN (even if, perhaps, they should). Or, for example, a Western democracy that sees fit to log users' online activity, and that is already objecting to DoH as it may prevent censorship.

The fault wasn't in the Intra app itself, but my phone had been sitting silently leaking queries for nearly a week, with every user-visible indication suggesting that Intra was still active and working. Opening the app would have reinstated service (for a while), but the app doesn't display any status information, so the user has no reason to revisit it.

It's worth noting that the same issue would likely have applied had Intra turned DNS lookups into DoT rather than DoH: the underlying issue is that the OS killed the process responsible for that interception, and appears to do so quite routinely. (A fix is here for those who need it.)


Application Level Resolution as a Mitigation

The level of leakage in my case was reduced because I had enabled Trusted Recursive Resolver (TRR, Mozilla's name for DoH) in Firefox for Android. With network.trr.mode set to 2, Firefox handled most of the resolution for any browsing I was doing, though it will have let queries fall through to the OS whenever it received an unsuitable response (including NXDOMAINs).
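To make that fall-through concrete, here's a deliberately simplified model of "DoH first, OS second". It is not Firefox's code: `doh` and `native` are hypothetical stand-ins for the two lookup paths, and `DohFailure` is a hypothetical exception standing in for anything treated as an unsuitable answer, NXDOMAIN included.

```python
from typing import Callable, List, Tuple


class DohFailure(Exception):
    """Hypothetical: the DoH lookup failed or returned an unsuitable answer (e.g. NXDOMAIN)."""


Resolver = Callable[[str], List[str]]


def trr_first_resolve(name: str, doh: Resolver, native: Resolver) -> Tuple[List[str], str]:
    """Simplified model of network.trr.mode = 2: try DoH, fall back to the OS.

    Returns the addresses along with the path that produced them.
    """
    try:
        return doh(name), "doh"
    except DohFailure:
        # Silent fallback: the query now goes to the OS resolver and, if
        # Intra isn't intercepting, straight out onto the local network.
        return native(name), "native"


if __name__ == "__main__":
    def doh(name: str) -> List[str]:
        if name == "mistyped.example":
            raise DohFailure("NXDOMAIN")
        return ["192.0.2.10"]

    def native(name: str) -> List[str]:
        return ["198.51.100.7"]  # e.g. an ISP interception page

    print(trr_first_resolve("example.com", doh, native))       # (['192.0.2.10'], 'doh')
    print(trr_first_resolve("mistyped.example", doh, native))  # (['198.51.100.7'], 'native')
```

Nothing in this flow tells the user which branch was taken, and that is exactly what makes the fallback invisible.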

That, obviously, still isn't perfect, but is a damn sight better than having all my queries hitting the local network.

The problem there, of course, is that if an ISP were to block my DoH server, I likely wouldn't know there was an issue until far too late: Firefox would hand queries down to the OS, which would send them straight out onto the network, all while I believed that Firefox or Intra was handling them.
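One way an application could avoid that blind spot is to make fallback visible, or to refuse it outright. The sketch below is a hypothetical design, not an existing Firefox or Intra feature (though Firefox's `network.trr.mode` of 3 does offer a DoH-only mode with no native fallback). It counts consecutive fallbacks and warns once a threshold is reached, and in `strict` mode it fails closed rather than leaking the query to the OS. `DohFailure`, `WatchfulResolver` and its parameters are all illustrative names.

```python
import logging
from typing import Callable, List


class DohFailure(Exception):
    """Hypothetical: the DoH lookup failed or returned an unsuitable answer."""


Resolver = Callable[[str], List[str]]


class WatchfulResolver:
    """Hypothetical resolver wrapper that makes DoH fallback visible."""

    def __init__(self, doh: Resolver, native: Resolver,
                 strict: bool = False, warn_after: int = 3) -> None:
        self.doh = doh
        self.native = native
        self.strict = strict          # True: fail closed, never use the OS resolver
        self.warn_after = warn_after  # consecutive fallbacks before warning
        self.consecutive_fallbacks = 0
        self.warned = False

    def resolve(self, name: str) -> List[str]:
        try:
            addrs = self.doh(name)
        except DohFailure:
            if self.strict:
                raise  # surface the failure rather than leaking the query
            self.consecutive_fallbacks += 1
            if self.consecutive_fallbacks >= self.warn_after and not self.warned:
                # A real app would raise a user-visible notification here
                logging.warning(
                    "DoH has failed %d times in a row; lookups are going to the OS resolver",
                    self.consecutive_fallbacks,
                )
                self.warned = True
            return self.native(name)
        # A successful DoH lookup resets the counter and the warning state
        self.consecutive_fallbacks = 0
        self.warned = False
        return addrs
```

The threshold is there so that a single flaky lookup doesn't cry wolf; a sustained run of failures, such as an ISP blocking the DoH server, is what gets flagged.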


The Case

The reason we want DoH (and DoT) is both to combat DNS response tampering and ensure some level of privacy. In principle, there's no reason this can't be achieved at the OS level.

However, the reality is that we live in a world where not only do bugs happen, but they happen frequently. Like it or not, we also live in a world where the mobile use-case can no longer be ignored: you have to factor in the needs of users running Android on Chinese hardware, which is often where the most aggressive memory management is found.

Providing privacy at the OS level is "good enough" some, or even most, of the time. But that's simply not sufficient for a non-negligible number of use-cases.

Having resolution occur within specific applications (browsers in particular) provides a safety net for users, so that if one mechanism should fail, there's still the protection of the other.

This is no different, in principle, from the desire to use HTTPS on Onion sites: Tor's encryption is almost certainly good enough, but adding HTTPS comes at little cost and provides additional reassurance.

A double failure is still a risk, but a much less likely one, and hopefully in future applications will do more to make users aware if something seems to be awry (or at least offer the option to do so). Just as we want to work on improving implementations so that failures don't happen in the first place, we should also be working to ensure mitigations are in place for when failures do happen.
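The "much less likely" can be put roughly into numbers. If, and it's a big if, the OS-level and application-level mechanisms fail independently, the chance of both failing at once is the product of their individual failure rates. The 5% figures below are purely illustrative, not measurements.

```latex
P(\text{leak}) = P(F_{\mathrm{OS}} \cap F_{\mathrm{app}})
             = P(F_{\mathrm{OS}})\,P(F_{\mathrm{app}})
             = 0.05 \times 0.05 = 0.0025
```

That's 0.25% rather than 5%. Independence is the weak point, though: an ISP blocking the DoH server that both Firefox and Intra point at would take out both layers at the same moment, which is why having those layers report failures matters as much as having two of them.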



Inevitably, in any discussion of DoH or DoT, someone will ask why users don't just use a VPN (or Tor) if they're that concerned.

The thing is, for many use-cases a VPN adds overhead and latency for very little gain compared to DoH/DoT combined with encrypted SNI (ESNI).

This is something that affects those of us in the Western world too (and, I fear, increasingly will). The UK Government, for example, are already planning to expand their censorship efforts in order to "reduce online harms". Obviously such an effort could never go wrong in the UK, and the well-intentioned apparatus would never be completely repurposed from its original aim. And we'd certainly never have an authority try to misstate, or misuse, its powers under law. Nor has the (expensive) censorship infrastructure ever been shown not to be very good.

Over time, it's possible that the overhead of a VPN will be required on a more routine basis, but in the interim DoH may well be all the average user requires. So it's not so much that using a VPN is the wrong answer, just that it's not always the most suitable one.