Building a Tor Hidden Service From Scratch - Part 3 - General User Anonymity and Security

Ben Tasker

2015-07-20 02:32 (updated 2019-05-06 10:11)

This is Part 3 of my Hidden Service From Scratch documentation. In Part One we designed and built our system, in Part Two we configured HTTP Hidden Service hosting.

In this documentation, we'll be looking more generally at user account and identity protection, as well as examining why you may need to maintain a certain level of paranoia even if your hidden service doesn't fall outside the law in your home country.

Introduction

This section will cover how to implement various mechanisms which may be of use in protecting against spam/bots and user account compromise.

Many of the approaches used on the Clearnet are effective, but may lead to a small loss of privacy. Most Hidden Service users use Tor precisely because they care about privacy, and will expect the design decisions you make to take that into consideration.

In the previous module, it was stated that using server-side scripting should be avoided where possible. For some sites, though, it simply isn't possible - forums for example simply don't work without some form of scripting.

Credential phishing is particularly prevalent on the darknet, so you may wish to implement 2 Factor Authentication so that your users can choose to use an additional level of protection. Mechanisms such as Google Authenticator and the Yubikey obviously aren't going to be particularly welcome, as they require information to be disclosed to a third party.

Similarly, you may wish to send emails to users, but care needs to be taken in order to do so without revealing the identity of your server.

In this short module, we'll be looking at

Bot/Compromise Protection
- CAPTCHAs
- 2 Factor Authentication (2FA)
Email
- MTA -> Tor
- HS -> Public Mailserver
Handling User Data in General
Opsec is essential
Backups
- Push
- Pull
- Encryption

Bot/Compromise Protection

CAPTCHAs

Traditional CAPTCHA services (such as ReCaptcha) require a request to go to a third party (such as Google) to generate (and often to verify) a Captcha. On Tor, this is generally considered to be unacceptable.

You may also, as seems to have been true in Ross Ulbricht's case, be risking your own anonymity by using such a service. It's alleged that Ulbricht misconfigured his Captcha in such a way that it was leaking his server's clearnet address, ultimately leading to identification of the server.

Whether or not this is true, it highlights two major lessons which can be learned

Never use the clearnet IP/hostname in testing, use 127.0.0.1
Minimise use of third party services as far as possible

Although traditional CAPTCHAs are not a good fit for Tor hidden services, you may still need the protection that they provide.

Essentially, there are two options

Roll your own (basic) CAPTCHA generator
Use a CAPTCHA generator designed for hidden services

The former is generally preferred - although it may be via a hidden service, the latter still requires that trust be placed in an unknown third party. Your captcha needn't be too complex to keep the most common bots out of a Tor Hidden Service.
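As a rough illustration of the 'roll your own' option, the sketch below issues a short random answer along with a signed, time-limited token, so nothing needs to be requested from a third party. The names (SERVER_KEY, new_challenge, check_response) and the 5 minute lifetime are assumptions made for the example, and rendering the answer as a distorted image or question is left out entirely.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple

# Hypothetical server-side secret; in practice load it from protected config.
SERVER_KEY = secrets.token_bytes(32)
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"  # look-alike characters left out
MAX_AGE = 300  # seconds a challenge stays valid (assumed value)


def _sign(answer: str, issued: int) -> str:
    """HMAC over the issue time and the lower-cased answer."""
    msg = f"{issued}:{answer.lower()}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def new_challenge(length: int = 6) -> Tuple[str, str]:
    """Return (answer, token). Show the answer to the human (e.g. as an
    image) and place only the token in a hidden form field."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    issued = int(time.time())
    return answer, f"{issued}:{_sign(answer, issued)}"


def check_response(token: str, response: str, now: Optional[int] = None) -> bool:
    """Check the submitted response against the signed token."""
    try:
        issued_str, sig = token.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False
    now = int(time.time()) if now is None else now
    if not 0 <= now - issued <= MAX_AGE:
        return False
    expected = _sign(response.strip(), issued)
    return hmac.compare_digest(sig.encode(), expected.encode())
```

Because the token is stateless, a solved challenge could be replayed until it expires. If that matters for your service, record used tokens server-side and reject repeats.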

2 Factor Authentication

On the Clearnet, there are a number of mechanisms commonly used to implement Two Factor Authentication:

User receives a text containing a one-time passphrase/string
User runs an app to generate a one-time passphrase/string
User has a hardware token that generates a one-time passphrase/string

Each of these has its own strengths and weaknesses, but none of the above is really suitable for a Tor environment, as they either require the user to surrender identifying information (e.g. their mobile number) or depend on a third party's app or service (e.g. Google Authenticator).

At a technical level, you could, theoretically, implement a Tor friendly 2FA solution using a Yubikey. The default configuration would require an API request to go to Yubico's servers, but it is possible to run your own. Most users, however, wouldn't be happy with having a Yubikey configured to use a server specific to your site (or Tor in general) as it could constitute physical evidence in some states.

There is, however, a Tor friendly 2FA solution which can be implemented - PGP.

Some sites, after (or during) registration will allow users to upload their PGP public key. Once the 2FA functionality is enabled for a user account, their login flow becomes something similar to the following

Enter username/password and submit
Server encrypts a random string using their public key
User is presented with the ASCII armoured output
User uses their private key to decrypt the string
User enters string into the field supplied and submits
Server checks the string matches and grants access if so
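The server side of the flow above might be sketched as follows. This is only an outline under some assumptions: the gpg binary is installed, the user's public key has already been imported into a dedicated keyring directory (gnupg_home), and the function names are invented for the example.

```python
import hmac
import secrets
import subprocess


def new_challenge(nbytes: int = 16) -> str:
    """Random string drawn from the operating system's CSPRNG."""
    return secrets.token_urlsafe(nbytes)


def encrypt_for_user(challenge: str, fingerprint: str, gnupg_home: str) -> str:
    """Return ASCII armoured ciphertext for the user's uploaded public key.

    Assumes the key identified by `fingerprint` is already in the keyring
    at `gnupg_home` (a hypothetical, service-owned directory).
    """
    result = subprocess.run(
        ["gpg", "--homedir", gnupg_home, "--batch", "--armor",
         "--trust-model", "always", "--encrypt", "--recipient", fingerprint],
        input=challenge.encode(),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("ascii")


def verify(expected: str, submitted: str) -> bool:
    """Constant-time comparison of the stored and submitted strings."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())
```

The expected string would be kept in the server-side session and discarded after one attempt, so that a captured ciphertext cannot be retried indefinitely.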

There are a number of common traps to avoid though.

Firstly, you must warn users to use common sense regarding the key they upload. There's very little point in taking steps to keep users anonymous if they then upload a public key with their (publicly known) email address as the recipient. Users should always generate a new keypair using bogus data.

If an invalid password is entered for a user with 2FA enabled, a 2FA page should still be displayed. If an attacker is attempting to access an account, you don't want to admit that they've at least got the password correct (as it may be in use somewhere else). Internally, you can log whether it was 2FA or the password that failed if you wish, but never tell the user trying to log in.
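One way to get this behaviour is to carry the password result forward privately and only combine it with the 2FA result at the very end. The sketch below assumes a simple Account record and PBKDF2 password hashing purely for illustration; the page names are placeholders.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Account:
    salt: bytes
    password_hash: bytes
    twofa_enabled: bool


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is used here only as a stand-in for whatever scheme you use.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def start_login(account: Account, password: str) -> Tuple[str, bool]:
    """Return (page_to_show, password_ok). password_ok stays server-side."""
    candidate = hash_password(password, account.salt)
    password_ok = hmac.compare_digest(candidate, account.password_hash)
    if account.twofa_enabled:
        # Same page whether or not the password was right.
        return "2fa_challenge", password_ok
    return ("logged_in" if password_ok else "login_failed"), password_ok


def finish_login(password_ok: bool, challenge_ok: bool) -> bool:
    """Grant access only if both factors passed; the caller shows one
    generic failure message and logs the detail internally."""
    return password_ok and challenge_ok
```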

The string you generate doesn't need to be long, but it must be impossible to predict, otherwise the protection that 2FA is supposed to offer is completely negated.

You also need to ensure that uploading a public key and actually enabling the 2FA are two distinct steps. Ideally, you should include a 'test' run within the enabling process so that the user can verify that they've uploaded the public key that they think they have - it's all too easy for a user to lock themselves out of their own account if 2FA is automatically enabled as soon as they've submitted their public key.
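A minimal sketch of keeping the upload and the enabling as separate steps is shown below. The TwoFactorState record and function names are hypothetical, and the challenge would be encrypted to the candidate key before being shown to the user, which is omitted here.

```python
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoFactorState:
    candidate_key: Optional[str] = None   # uploaded but not yet proven
    active_key: Optional[str] = None      # key actually used at login
    pending_challenge: Optional[str] = None
    enabled: bool = False


def upload_key(state: TwoFactorState, armoured_key: str) -> None:
    """Step 1: store the key. This alone never switches 2FA on."""
    state.candidate_key = armoured_key
    state.pending_challenge = None


def start_test(state: TwoFactorState) -> str:
    """Step 2: issue a test challenge for the candidate key."""
    if state.candidate_key is None:
        raise ValueError("no public key uploaded")
    state.pending_challenge = secrets.token_urlsafe(16)
    return state.pending_challenge


def confirm_test(state: TwoFactorState, submitted: str) -> bool:
    """Step 3: enable 2FA only once the user proves they can decrypt."""
    expected, state.pending_challenge = state.pending_challenge, None
    if expected is None:
        return False
    if hmac.compare_digest(expected.encode(), submitted.strip().encode()):
        state.active_key = state.candidate_key
        state.enabled = True
        return True
    return False
```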

Email

Having a Hidden Service send emails to users requires far more planning than on an equivalent clearnet service.

On the clearnet, you would simply install a Mail Transport Agent (MTA) such as Postfix or Exim and then configure your back-end to send mails via that.

However, a quick glance at the headers of any mail sent through this MTA will reveal your server's IP address - something which is obviously undesirable.

There are a number of options available, each with their own benefits and drawbacks, and it's important to assess which best fits your needs.

MTA -> Tor

The simplest way to add email support to your hidden service is to install and configure an MTA, and then add a firewall rule to ensure any traffic your MTA generates is routed over Tor.

Depending on the MTA, you may also need to adjust configuration to ensure that the server doesn't reveal anything about itself.
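The same care applies to the message your application hands to the MTA. As a hedged example, the sketch below builds a message with an explicit UTC date and a random Message-ID under a neutral domain, rather than letting defaults derived from the local hostname or time zone be filled in. The function name and the neutral_domain parameter are assumptions for illustration.

```python
import secrets
from datetime import datetime, timezone
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str,
                  body: str, neutral_domain: str) -> EmailMessage:
    """Build a message that sets its own Date and Message-ID headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # UTC date, so the server's local time zone is not disclosed.
    msg["Date"] = datetime.now(timezone.utc)
    # Random ID under a neutral domain instead of the machine's hostname.
    msg["Message-ID"] = f"<{secrets.token_hex(16)}@{neutral_domain}>"
    msg.set_content(body)
    return msg
```

If the message is submitted with Python's smtplib, the local_hostname argument of smtplib.SMTP can likewise be set so that the EHLO greeting does not announce the machine's real hostname. The MTA may still add headers of its own, so check a delivered message before relying on this.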

The major drawback with this approach, however, is that the receiving MTA may reject your mail (or mark it as spam), as the connection source (as it sees it) will be a Tor exit node and may well be included in DNSRBLs.

As your mailserver isn't on the clearnet, you'll also be unable to receive mails sent in response - which may or may not be an issue for you.

You will also need to give careful thought to which MTA you use, and how your hidden service will access it. For example, in some circumstances, Exim may leak BCC addresses. On a clearnet service this may just be embarrassing, but on a Hidden Service it constitutes a huge breach in user trust and, depending on the content of your hidden service, may put users in certain jurisdictions in danger.

Benefits:

Quick and relatively easy to implement

Downsides:

Mail may not actually be received by the users
You'll need to properly configure your MTA

Hidden Service -> Public Mailserver

To reduce the likelihood of your mail being filed in the user's spam folder, you'll probably want to use a publicly accessible mailserver. Running your own is an option, but will associate you with the hidden service.

An alternative option is to create an account with an anonymous friendly mail provider such as cock.li or riseup.net. You can then configure your hidden service to connect (via Tor) to this mailserver in order to send mail.

As the mailserver is routable on the clearnet, you'll also be able to receive any responses which may be sent (whether via manually logging in, or having your hidden service retrieve mails using IMAP).

Mailproviders such as Google Mail or Outlook.com should be avoided as they have a tendency to require a mobile number for 'verification' when access is made from a known anonymous proxy (such as a Tor exit node).

You will need to be careful about how you access/set up the account however. No matter how anonymous friendly the mail provider may be, you should always assume that everyone has a price at which they'll sacrifice their principles. Never make a direct connection to their services, as it could potentially later be used to identify you.

Benefits:

Relatively quick and easy to set up
Replies can be sent/received
Likelihood of the mail being rejected is reduced

Downsides:

Service reliance on an unknown third party
Care needs to be taken when accessing/setting up account

Handling User Data in General

When considering what user data you need to collect in order to run your service, it's incredibly important to ensure that things are kept minimal.

The only acceptable way to run a service (whether on the clearnet or Tor) is to assume that at some point you will be compromised, and in fact already may have been.

If an adversary manages to identify you as the administrator, you never want to be in a position of being coerced into releasing data on your users. The only way to ensure that, is to never collect identifying data in the first place. For many services, a username and password is all that you will need to collect.

Any data that is collected, for whatever reason, should be encrypted to ensure that simple vulnerabilities (such as SQL Injection) do not lead to the full-scale release of user data. The decryption keys will need to be available to the service somewhere, so encryption doesn't give much protection against a full server compromise, but you should attempt to mitigate the effects of a compromise at any level.

When deciding whether information is 'identifying', consider the various ways in which an adversary might use the data. Certain data points may not be identifying on their own, but can be correlated against data found elsewhere in order to build a profile of the user involved. Take the following as a simplistic example

Data recovered from you:

Username
Zip Code

Data recovered from another service

Username
House number (without street or Zip)

On their own, neither data set is much use for establishing more than a general vicinity for the user. When combined, however, it becomes easy to reduce the number of potential candidates.
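To make the effect concrete, the toy example below filters an entirely made-up list of residents, first by each leaked field alone and then by both together. All of the records, field names and values are invented for illustration.

```python
# Entirely fictional records, standing in for whatever wider data set
# an adversary might hold (e.g. an address register).
population = [
    {"name": "resident A", "zip": "90210", "house": 12},
    {"name": "resident B", "zip": "90210", "house": 47},
    {"name": "resident C", "zip": "10001", "house": 12},
    {"name": "resident D", "zip": "90210", "house": 12},
    {"name": "resident E", "zip": "10001", "house": 47},
]

# Each leak is tied to the same (hypothetical) username.
leak_from_you = {"zip": "90210"}
leak_from_elsewhere = {"house": 12}


def candidates(people, **known):
    """Return the people matching every known field."""
    return [p for p in people if all(p[k] == v for k, v in known.items())]


by_zip = candidates(population, **leak_from_you)
by_house = candidates(population, **leak_from_elsewhere)
combined = candidates(population, **leak_from_you, **leak_from_elsewhere)
```

Each leak alone leaves three of the five residents as candidates, while the two together leave only two. With real data the starting pool is far larger, but so is the reduction.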

You have very little control over what data other services collect, but you can ensure that your systems do not require the user to part with more data than is absolutely necessary. Depending on how serious the consequences might be if an individual was identified as a user of your service, you may wish to encourage users to use a unique username (i.e. not one they've used elsewhere).

Opsec is Essential

Although this module is aimed at protecting user privacy, many of the techniques we've looked at are nominally concerned with protecting the identity of our server.

The reality is, that however innocent (or otherwise) your services may be, as a Hidden Service operator you hold the keys to user data. As we've already discussed, seemingly meaningless data can be correlated against other data in order to establish a profile.

Your users' fortunes are therefore, to some small extent, tied to your own. Quite aside from wanting to protect your own anonymity, a failure in Operational Security (OpSec) could allow an adversary to coerce you into revealing data, or making changes in order to collect additional data.

For example, the (now defunct) secure email service - Lavabit - was famously served with sealed US court orders. Although the service could not view user emails, the US Government attempted to force the service to make changes and reveal data that would allow interception of these mails to occur. So whilst you can take steps to ensure you collect minimal data, be aware that a failure to protect your own identity could lead to you being coerced into making similar changes.

There has, historically, been an assumption that there are mistakes that nobody would make. However the case of Silk Road's Ross Ulbricht has shown that people do make the simplest of mistakes.

During the Silk Road trial it was revealed that Ross had made a number of severe opsec mistakes, including

Use of the handle 'altoid' to advertise Silk Road on a variety of Forums
Using the handle 'altoid', asked developers to send their resumes to rossulbricht@gmail.com
Asked questions related to SR on Stack Overflow using an account in his own name
Having spotted that mistake, he changed the email on that SO account to frosty@frosty.com
The email frosty@frosty.com was recorded against SSH authorised public keys on the Silk Road Server
Events in Ross' personal life were discussed whilst he was chatting/emailing as Dread Pirate Roberts
Avoidable information leaks (including the IP of the SR server)
Talked publicly (if vaguely) about his intentions to create an 'economic simulation'
Revealed his Time Zone to someone he didn't know
Ordered a large number of fake IDs for delivery to his home address

Some of the above are absolute failures in OpSec, others may appear harmless. Not mixing handles (in this case 'altoid') is a basic tenet of OpSec, and even temporarily linking one to any information which can be used to identify you is a fatal mistake.

He also, very clearly, failed to take the potential of any compromise into account. Having a public key sat on the server is fine, so long as it cannot be used to identify you - Ross was publicly linked to frosty@frosty.com and so this simple mistake made it easy to show that he was an administrator of the system.

Discussing events in your private life when posting under a handle can help an adversary to limit the number of 'suspects' that they have - remember that if you're talking anonymously, there's a very good chance you don't actually know who you're talking to. Similarly, revealing your Time Zone helps narrow a list of suspects, if only marginally.

The leak of information that occurred on Silk Road was avoidable and unforgivable. For a short while, the Silk Road site displayed not only its own clearnet IP, but also the IP address of the VPN endpoint that Ross connected via. Be very, very careful to ensure that no debugging tools are available on Hidden Services; a simple phpinfo() call may be your undoing!

To practise good OpSec you need to understand that revealing any information, no matter how innocuous it may seem, is potentially very dangerous. Breadcrumbs of information can be collected and correlated by an adversary to paint a much clearer picture than you may imagine. Very few (if any) people will be prepared to go to prison just to protect your identity, so never reveal your true identity to anyone, no matter how trustworthy they may seem.

Finally, if a user ever contacts you (for example via email) and makes clear OpSec mistakes, you need to point it out to them - otherwise they'll likely continue making the same mistakes, and your inbox now contains information that could potentially be used to identify them.

Backups

There's obviously very little point in securing your server if you then push backups, in the clear, across the clearnet.

Having off-server backups is obviously desirable, but careful thought needs to be given to exactly how these are implemented.

Push Backups

Backups using a 'push' method are fairly common and popular on the clearnet. Whether pushing up to Amazon's S3, or rsyncing to another server, there are a wide range of solutions available.

However, careful thought needs to be given to where you are pushing to. Should the server be compromised, you need to be able to ensure that examination of the backup method in use won't lead to your identification (as it might if pushing to an S3 account).

You'll also probably want to ensure that your backups are pushed out via Tor to ensure that the backup service cannot identify the server hosting your hidden services (in case unencrypted files are ever accidentally pushed).

Pull Backups

A simple pull mechanism works around a number of the risks involved in using push backups, but does also introduce some new risks.

A basic set up would be

Create a new HTTP(S) hidden service, protected by HTTP Basic or Digest authentication.
The document root for that service would be configured to be the directory that your backup archives are written into.
A script on a system with a Tor client downloads your backup archive at pre-configured intervals.

The advantage of running backups in this way is that neither the backup server nor the hidden service server needs to know the clearnet identity of the other. Your backup 'server' could be something as simple as a Raspberry Pi running on your home network; it doesn't need to be directly accessible from the clearnet at all.

However, push backups have one clear advantage over Pull backups - timing. With a push backup, a backup will never be pushed to the backup server until it's complete. Without careful design, this may not be true for Pull backups - if your backup takes longer to run than expected, the pull may result in only part of the backup being pulled off-server.

You need to either design a mechanism to prevent this (such as a simple lock file that the client can check for) or ensure that the scheduled run times are staggered sufficiently. Whatever the mechanism, be sure that it works, otherwise you may find that the first time you know you've been capturing useless backups is when you need them.
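One possible mechanism, sketched below, is to write the archive under a temporary name and only rename it into place once it is complete, publishing a checksum alongside it so the pulling client can confirm what it downloaded. This is an alternative to the lock file mentioned above rather than an implementation of it, and the file naming (a '.partial-' prefix and a '.sha256' companion file) is an assumption for the example.

```python
import hashlib
import os
import tempfile


def publish_backup(data_chunks, final_path: str) -> str:
    """Write the archive to a temporary file in the same directory, then
    rename it into place so a pull never sees a half-written archive."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    digest = hashlib.sha256()
    try:
        with os.fdopen(fd, "wb") as fh:
            for chunk in data_chunks:
                fh.write(chunk)
                digest.update(chunk)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, final_path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    with open(final_path + ".sha256", "w") as fh:
        fh.write(digest.hexdigest() + "\n")
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Client side: recompute the checksum of the downloaded archive."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.strip()
```

The web server should be configured not to serve the temporary files, and a client that sees a checksum mismatch (for example because it pulled between the rename and the checksum update) should simply retry later.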

Encrypting your Backups

Whatever solution you opt to use, one constant exists - backups should be heavily encrypted as soon as they've been created. A keypair should be generated on another system, and only the public key should be available on the server hosting your Hidden Service. The private key should be stored somewhere very secure. Should you ever need to put the private key onto the server (for example to recover a backup) you should generate a new keypair and cease use of the old pair immediately after.
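As an illustration of encrypting to a public key only, the sketch below builds and runs a gpg command against a keyring that contains just the backup public key. It assumes gpg is installed and that the key has been imported into gnupg_home; the function names and paths are hypothetical.

```python
import subprocess
from typing import List


def encrypt_command(archive: str, recipient_fpr: str,
                    gnupg_home: str) -> List[str]:
    """Build the gpg invocation. Only public-key encryption is requested,
    so no passphrase or private key is needed on this server."""
    return [
        "gpg", "--homedir", gnupg_home, "--batch", "--yes",
        "--trust-model", "always",
        "--output", archive + ".gpg",
        "--encrypt", "--recipient", recipient_fpr,
        archive,
    ]


def encrypt_backup(archive: str, recipient_fpr: str, gnupg_home: str) -> str:
    """Encrypt the archive and return the path of the encrypted copy."""
    subprocess.run(encrypt_command(archive, recipient_fpr, gnupg_home),
                   check=True)
    return archive + ".gpg"
```

The plaintext archive should be removed once the encrypted copy has been written, and only the encrypted file should ever be placed where a push or pull job can reach it.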

The type of encryption you use will generally depend on personal preference, but if using a pre-rolled backup script, be sure to carefully examine how it implements encryption.

For example, the s3cmd script can be used to push files to Amazon S3, and includes the ability to PGP encrypt files prior to uploading. When configuring s3cmd, the user is asked to enter a passphrase, and it's easily assumed that this is being used to protect the private key. In reality, however, the passphrase is used to generate a symmetric key which is then used for every single file you upload. When compared to how PGP behaves when using a keypair, this behaviour is undesirable.

A workaround for this (and a better explanation of why it's undesirable) can be found here

Conclusion

Whilst the anonymity of your users is essentially their own concern, you need to ensure you take appropriate steps to protect their data. You also, no matter how innocent your service, need to practise good OpSec in respect of your own identity to avoid finding yourself being used as a stepping stone to another user.

The anonymity of your server is key to protecting your users' identities. The US FBI have already demonstrated that compromising a Hidden Service is a powerful tool when attempting to de-mask otherwise anonymous users (see http://www.computerworld.com/article/2473739/cybercrime-hacking/fbi-behind-firefox-zero-day-compromising-half-of-all-tor-sites-.html).

You cannot, in practice, predict who your adversaries will be, so your threat model needs to include the risk of Government agencies trying to identify you. The Snowden revelations, amongst other news, have shown that certain Government agencies can be very indiscriminate when dealing with anonymous sources of information.

Your adversaries will not, however, be limited to Government agencies. Credential Phishing is rife on the darknet, so you need to ensure you implement account protection without compromising user anonymity.

Every change you make to your Hidden Service (or the server it resides on) must be carefully considered, and where possible any reliance on third parties should be removed (for example by implementing your own CAPTCHAs). The more control you retain over the resources loaded by a user's browser, the better. A self-sufficient Hidden Service is far, far easier to keep secure than one which relies on third party static resources.

Part 4 - Conclusion