Multi-homing a site between the WWW and an I2P Eepsite
I recently gave a high-level overview of some of the things I needed to address in the process of multi-homing my site to make bentasker.co.uk available on I2P.
Although I2P presents some new challenges, some of the considerations were the same as when multihoming between Tor and the WWW.
Although I had originally intended to publish a generic multi-homing how-to, it's not really possible because the multi-homing process can be quite site-specific. Instead, this post is more of a deep-dive into the process, showing some of the things you need to consider when publishing an existing site onto the I2P anonymous overlay network. More than a few of those things will likely improve your www site too.
Parts of this post can also be used to set up a brand new eepsite, though it's assumed you've already got something listening on port 80: this post doesn't go into installing nginx.
Contents
As this has proven to be quite a long post, it seems wise to add a table of contents.
Initial Set-Up
Setting up an eepsite
It makes sense to start with the simplest part: making an eepsite available on the network.
This will allow us to browse the site and spot issues which need to be addressed. Much like exposing a Tor hidden service, this step is absolutely trivial.
First, we create some configuration to expose an eepsite
mkdir -p i2p_conf/keys
chown 100 i2p_conf/keys
nano i2p_conf/tunnels.conf
Within tunnels.conf we want to add:
[my-eepsite]
type = http
host = 127.0.0.1
port = 80
keys = keys/my-eep.dat
inbound.length = 1
outbound.length = 1
This will:
- Create a http tunnel called my-eepsite
- Forward connections on to 127.0.0.1:80 (where we have Nginx/Apache/whatever listening)
- Store the generated private key in keys/my-eep.dat
- Use an inbound and outbound tunnel length of 1 hop (the default is 3)
Tunnel length is an important consideration: the more hops, the greater the latency experienced by visitors. But, the fewer hops, the higher the risk of someone identifying the location of your server (though that risk is still relatively small).
As we're multi-homing a site already published on the world wide web, there's already a range of ways for someone to trivially identify the hosting server. Using longer tunnels in I2P would therefore only really serve to increase latency. Obviously, if the site were I2P only, the consideration would be different.
With our tunnel config in place, we need to get I2P up and running - I use the C++ client (i2pd) for this.
The simplest way to run the client is with docker:
docker run -d \
--restart=always \
--name i2pd \
--net=host \
-v $PWD/tunnels.conf:/home/i2pd/data/tunnels.conf \
-v $PWD/keys:/home/i2pd/data/keys/ \
purplei2p/i2pd --notransit
This will:
- Create a persistent container called i2pd
- Use the host's networking stack (more on why in a second)
- Publish our tunnels.conf into the container
- Publish our key directory into the container
- Run i2pd with transit disabled
The reason we're passing --net=host is because we've configured i2pd to forward onto loopback (i.e. 127.0.0.1). If we don't use --net=host then it'll use the container's loopback interface (which Nginx isn't listening on), so I2P wouldn't be able to reach Nginx.
If your web-stack is also dockerised then you can use docker's networking (with --link or --network) and don't need --net=host.
Binding to the host's networking stack also means that all the ports the container exposes may be publicly available, so we need to ensure those are firewalled off.
for PORT in 4444 4447 2827 7650 7654 7656 7070
do
iptables -I INPUT -p tcp --dport $PORT ! -s 127.0.0.1 -j REJECT
ip6tables -I INPUT -p tcp --dport $PORT ! -s ::1 -j REJECT
done
Remember to make those rules persistent (via iptables-save or whatever) if it's something you need to do manually.
When running i2pd I included the --notransit option. I did this because I don't want my edge to also be relaying other people's traffic (it impacts some of my monitoring). If you are comfortable relaying packets, then contributing to the network is strongly encouraged (you can, of course, instead run an I2P router somewhere else to offset your use).
Now that i2pd is up and running, if you're interested in stats and the like, you can use Telegraf to monitor I2PD.
Webserver Config
With those few steps done, your eepsite is now published. The next step is to identify its address and configure your webserver to answer to that name.
If you look in keys there should now be a file called my-eep.dat. The eepsite name can be extracted from it using a utility from i2pd-tools. This toolkit can be installed locally, or, I've dockerised it:
docker run --rm \
-v $PWD:/op \
bentasker12/id2pd-tools keyinfo keys/my-eep.dat
This will print out the b32 name (mine is gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p).
You'll need to configure your webserver to handle this name, but it may not be as simple as adding it to the config you use for the clearnet.
If you're following best practice, then your clearnet site will be HTTPS only and might even include HTTP Strict Transport Security (HSTS) headers.
The approach I've taken is to use a separate server block for the eepsite, which proxies through to the main one and can strip things that I don't want returned (it also allows you to bar access to site areas you don't want accessible over I2P):
server {
listen localhost:80;
server_name gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p bentasker.i2p;
root /usr/share/nginx/onions/bentaskercouk;
include /etc/nginx/domains.d/includes/location_block.inc; # WAF blocks
access_by_lua_file /etc/nginx/domains.d/LUA/WAF_Dynamic_ruleset.lua;
# Proxy to the back-end
location / {
# Set a header so the back-end knows we're coming via the eepsite
# this isn't actually used anymore, but keeping as futureproofing
proxy_set_header X-IM-I2P 1;
# Make sure the host header is correct
proxy_set_header Host www.bentasker.co.uk;
# Send the request
proxy_pass https://127.0.0.1;
proxy_hide_header Strict-Transport-Security;
}
# Example: block access to /wp-admin.php for I2P users
location /wp-admin.php {
return 403;
}
}
The reason you need to strip HSTS headers (if present) is that they tell the browser to only connect to your site via HTTPS, and to only accept trusted certificates (which you can't obtain for an eepsite). Failing to strip them will break delivery of your eepsite (and it'll stay broken for some time too: browsers cache HSTS).
With that config live, you should now be able to visit your site from an I2P client.
Setting up an I2P client
You might already have a client set up to test from, if so, you can skip this section.
This is how I set up I2P access using Firefox's Multi-Account Containers.
I used docker to spin up a copy of i2pd:
docker run -d --name=i2pd \
-v i2pd:/home/i2pd/data \
-p 4447:4447 -p 7070:7070 \
--restart=unless-stopped purplei2p/i2pd
You can add --notransit to this too if you need to, but leaving transit enabled is a good way to give a small amount of bandwidth back to the network.
I then created a container called i2p and configured it to use the I2P proxy at socks://127.0.0.1:4447.
With that, I2P sites are then available within tabs using that container.
Registering a Short name
Much like with Tor hidden services, eepsite names aren't particularly memorable.
Although you can generate a vanity identifier, it's not actually necessary: unlike with Tor, it's possible to register and link a "shortname" (for example bentasker.i2p).
Although it may feel DNS-like, the way shortnames work isn't akin to modern DNS (it's much more like the old days, where people periodically downloaded updated hosts lists), and it can take a few days for your shortname to become available to all users.
Because of that delay, it's worth registering the name early, as it reduces the wait between finishing work and getting everything online (if you're worried about people visiting the unfinished site, turn on basic auth or similar in your webserver).
Acquiring a short-name is pretty straightforward: you need to create a signed request using i2pd-tools and the eepsite's .dat file.
The command
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr keys/my-eep.dat myeep.i2p > auth_string.txt
Would generate a signed request for myeep.i2p to "resolve" to the eepsite identified by my-eep.dat.
Using an I2P client, you should then visit http://reg.i2p/add and put the content of auth_string.txt into the Auth String field.
A little while after submitting, you should see your domain at http://reg.i2p/latest, though it might take up to 5 days for the entire network to update addressbooks.
Subdomains
You can also register a subdomain (for example snippets.bentasker.i2p) using a similar process - you first need to have created a key for the main domain, as this is used to sign the subdomain request.
We create a new tunnel for the subdomain
[snippets-service]
type = http
host = 127.0.0.1
port = 80
keys = keys/snippets.dat
inbound.length = 1
outbound.length = 1
We then need the keyfiles for both this and the parent domain:
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step1 keys/snippets.dat snippets.bentasker.i2p > step1.txt
# sign
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step2 step1.txt keys/my-eep.dat bentasker.i2p > step2.txt
# Generate the final request
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step3 step2.txt keys/snippets.dat > step3.txt
The file step3.txt then contains the string for you to submit at http://reg.i2p/add.
Adjusting your site for multi-homing
So now we can move onto the main focus of this post: making a www/clearnet site ready for multi-homing onto I2P.
There are a number of things we need to check and consider.
Speed
I2P can have extremely high latency (the official answer to "how fast is I2P" is "it depends").
Whilst I2P users are, to a certain extent, used to it, one of the best things you can do to improve performance is to review what you're sending over the wire (in many ways, we're back to what used to be web-design 101 here).
For anything that's being sent over the network, ask:
- Does it need to be sent?
- Can it be compressed/minified?
- Can it be deferred until after page load?
- Is it being sent in a cacheable manner?
The last of those is quite important. It's not uncommon for pages to include inline scripts or styles:
<html>
<head>
<style type="text/css">
.foo {color: red}
</style>
</head>
<body>
<div class="foo" onclick="window.alert('bar')">Lorem Ipsum etc etc</div>
</body>
</html>
If these inline elements are present in more than a few pages, then there are bytes hitting the wire that didn't need to. Move those inlines into separate files:
<html>
<head>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="foo" data-text='bar'>Lorem Ipsum etc etc</div>
</body>
<script type="text/javascript" src="alerts.js"></script>
</html>
This incurs a couple of additional HTTP requests, but only on the first page view. For each subsequent page view those bytes come from the browser's cache and never hit the wire.
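Of course, browsers will only hold onto those files if they're served with cache-friendly headers. A minimal nginx sketch (the location pattern and lifetime here are illustrative - tune them to your own release cadence):

```nginx
# Let browsers cache stylesheets and scripts for a week.
# Version the filenames (or add a query string) so updates aren't missed.
location ~* \.(css|js)$ {
    add_header Cache-Control "public, max-age=604800";
}
```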
You should apply this analytical mindset to everything you're hosting.
If you're hosting video, is it in an adaptive format? Assuming you're using HLS, have you ensured that the lowest bandwidth variant is first to appear in the master manifest (most players start with the first listed and then switch based on calculated bandwidth)? If not, then you may inadvertently tie the user's tunnel up fetching initial chunks, preventing more urgent content (like stylesheets) from loading.
Could you perhaps add loading="lazy" to images that are likely to be below the fold?
Anything which reduces the number of initial requests can help ensure that the page loads faster and will benefit the clearnet version of your site too.
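On the "can it be compressed/minified?" question, it's easy to get a rough local feel for the potential savings. A quick sketch using a synthetic stylesheet (the real win will depend on your actual assets):

```shell
# Build a stand-in stylesheet: 200 copies of the same rule
printf '.foo { color: red; }\n%.0s' $(seq 1 200) > style.css

# Compare the raw size against the gzipped size
raw=$(wc -c < style.css)
gzip -9 < style.css > style.css.gz
gz=$(wc -c < style.css.gz)
echo "raw: ${raw} bytes, gzipped: ${gz} bytes"
```

Nginx can also compress on the fly (gzip on;), but minifying and pre-compressing assets still saves bytes and CPU on every request.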
External Resources
The next thing you need to consider is whether your site uses any external resources.
By default, I2P is a closed circuit: there's no access to clearnet domains. Some users use an Outproxy or local proxy rules in order to work around this, but if you want your eepsite to work consistently it's best to assume that most users won't have done this.
You need to identify what external resources your site relies upon, and what the consequences are if each of those resources doesn't load.
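A crude but effective first pass is to grep your rendered pages for absolute URLs and see which domains they point at. A sketch (site_html/ and its contents are a stand-in for wherever your built pages actually live):

```shell
# Create a sample page to scan (replace with your real document root)
mkdir -p site_html
cat > site_html/index.html <<'EOF'
<link rel="stylesheet" href="https://mycdndomain.example.com/styles/style.css">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
<img src="/images/local.png">
EOF

# Extract absolute URLs and count references per domain
grep -rhoE "https?://[^\"' >]+" site_html/ \
    | awk -F/ '{print $3}' | sort | uniq -c | sort -rn
```

Anything that isn't your own (multi-homed) domain is a candidate for local hosting, removal, or graceful degradation.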
For example, if we take the following HTML
<head>
<link rel="stylesheet" href="https://mycdndomain.example.com/styles/style.css" >
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
</head>
One is more severe than the other.
- If your main stylesheet isn't accessible, then your site will have no styling at all and probably look horrendous.
- If the font CSS is unavailable then the browser will fall back to whatever fonts it's got available - it may not be the font you want, but the site will still be usable.
The former almost certainly needs fixing, whilst you might consider the latter an acceptable trade-off. You might choose to locally host the stylesheet so that you can ultimately update that markup to be:
<head>
<link rel="stylesheet" href="/styles/style.css" >
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
</head>
Dynamically updating external references
It's not just stylesheets that you need to consider, it's all external resources: images, scripts, videos, etc.
One of the things that I needed to find a solution for was the videos section of my site.
The video embed script (embed.min.js) and the content itself are hosted at videos.bentasker.co.uk. So, without external access, any part of www.bentasker.co.uk that relied on embedded video was completely pointless.
To address this, I decided to also dual-home videos.bentasker.co.uk onto I2P, so that I could serve videos via an eepsite.
If my site were dynamic, it'd have been relatively easy to dynamically rewrite references (this is why my nginx config includes a custom upstream header, as that's exactly what I did when multi-homing onto Tor). Note, though, that if you are making backend changes to a dynamic site, you'll want to be conscious of the risk of cache-poisoning.
However, my site is static: I needed to implement something client side, which meant adding supporting javascript to the site itself. Simply making changes in embed.min.js wouldn't be sufficient, because I2P users can't fetch that resource in the first place.
With a small handful of known resources, it's possible to take a fairly unsophisticated approach
function Clearnet2I2P(){
if (window.location.hostname.split(".").pop().toLowerCase() != "i2p"){
/* Nothing to do */
return;
}
// Embed the video script
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
// Embed the analytics agent
s2 = document.createElement('script');
s2.setAttribute('src', 'http://5es4aj6pfdxoz6oz6vbcczix25dlfelrdav6a6hw7tuudb7kxwba.b32.i2p/agent.js');
document.body.appendChild(s2);
}
However, sometimes assets need to be replaced in place so that ordering is preserved, meaning something more complex is required
function Clearnet2I2P(){
if (window.location.hostname.split(".").pop().toLowerCase() != "i2p"){
/* Nothing to do */
return;
}
var i, src, dom, newurl, newele;
let buff_obj = {
"buf" : [],
"video_found" : false,
"mappings" : {
"pfanalytics.bentasker.co.uk" : "5es4aj6pfdxoz6oz6vbcczix25dlfelrdav6a6hw7tuudb7kxwba.b32.i2p",
"videos.bentasker.co.uk" : "bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p",
"static1.bentasker.co.uk" : "gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p"
}
}
adjustElements("script", "src", buff_obj);
adjustElements("img", "src", buff_obj);
adjustElements("link", "href", buff_obj);
// Process anything we've found
for (i=0; i<buff_obj["buf"].length; i++){
buff_obj["buf"][i][1].parentNode.insertBefore(buff_obj["buf"][i][0], buff_obj["buf"][i][1]);
}
if (buff_obj["video_found"]){
// We want to re-detect videos once we know the script has loaded
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
}
}
function adjustElements(tagname, attrib, buffer_obj){
var i, src, dom, newurl, newele;
var eles = document.getElementsByTagName(tagname);
for (i=0; i<eles.length; i++){
src = eles[i].getAttribute(attrib);
if (!src || src.substring(0,4) != "http"){
// Relative or empty link, skip
continue;
}
dom = src.split("/")[2].toLowerCase();
if (!buffer_obj["video_found"] && dom.includes("videos.bentasker.co.uk")){
buffer_obj["video_found"] = true;
}
// Do we have a mapping for that domain?
if (buffer_obj["mappings"][dom]){
newurl = src.replace("://"+dom, "://" + buffer_obj["mappings"][dom]).replace("https://","http://");
// Clone rather than updating existing - the DOM doesn't always reliably update with a simple source change
newele = eles[i].cloneNode();
newele.setAttribute(attrib, newurl);
eles[i].setAttribute('style', 'display: none');
// Push to a buffer - the DOM will be updated later
//
// This helps avert an infinite loop
buffer_obj["buf"].push([newele, eles[i]]);
}
}
}
Clearnet2I2P();
This will clone any script, stylesheet or image reference that we have a known eepsite for, and swap in the I2P URL.
This isn't an ideal solution if used on its own: users with javascript disabled won't get the fixed URLs. That's fine for scripts (which wouldn't have run anyway), but not so fine for images and stylesheets.
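For a static site, one way around that is to do the rewriting at build time instead: generate an I2P copy of each page with the clearnet hostnames swapped out. A rough sketch using sed (the hostnames are mine, and the page is a minimal example):

```shell
# A minimal page referencing clearnet asset hosts
cat > page.html <<'EOF'
<script src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<img src="https://static1.bentasker.co.uk/img/logo.png">
EOF

# Swap known clearnet hosts for their eepsite equivalents (and https for http)
sed -e 's|https://videos\.bentasker\.co\.uk|http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p|g' \
    -e 's|https://static1\.bentasker\.co\.uk|http://gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p|g' \
    page.html > page.i2p.html
```

The trade-off is that you're now building (and serving) two copies of the site, and the eepsite's server block needs to point at the rewritten one.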
External Scripts
Rewriting javascript references after the fact also often isn't enough.
For example, earlier versions of my video embed script used the following approach to embedding
<script type="text/javascript"
src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<script type="text/javascript">
embedBensPlayer('2017/201705_Lua_split_string/lua_string_split.mp4_master.m3u8');
</script>
So, when loading the eepsite, we'd see the following in the javascript console
Loading failed for the <script> with source “https://videos.bentasker.co.uk/resources/embed/embed.min.js”.
Uncaught ReferenceError: embedBensPlayer is not defined
When the rewrite kicked in, we'd successfully load embed.min.js, but the player wouldn't load because there was nothing attempting to re-execute embedBensPlayer().
In v0.19 I added support for a new approach
<script type="text/javascript"
src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<div class="embedBensPlayer" data-src='2017/201705_Lua_split_string/lua_string_split.mp4_master.m3u8'></div>
When we hit DOM Ready, the embed script looks for all elements with class embedBensPlayer and... well... embeds my player in them.
So now, when we embed the I2P served version, we can also re-trigger the method that performs the embedding:
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
Although it was a bit of a pain to have to switch existing embeds over, it's generally good practice to avoid inline javascript (and document.write()), so this carries benefits beyond I2P (including being a step toward being able to enable a meaningful Content Security Policy).
XHR Requests and CORS
On most pages, my site makes an AJAX/xmlhttp request to snippets.bentasker.co.uk in order to fetch the JSON sitemap, run a client-side search, and list related snippets.
It was a pretty inoffensive bit of javascript, and should just have needed updating to use the eepsite for snippets:
function triggerRelatedSnippets(){
var url;
var tld = window.location.hostname.split(".").pop().toLowerCase();
if(tld == "i2p") {
url = "http://vgduvgxudaceslvwlvtda6b4csobvczygcqpklm3yeuke2zgvcaa.b32.i2p/sitemap.json";
} else {
url = "https://snippets.bentasker.co.uk/sitemap.json";
}
// Trigger the fetch
fetchPage(url, writeResult, errorResult);
}
But, the JSON was failing to load.
When you access www.bentasker.co.uk, the request is going to a sub-domain of the site you're on, so it's not considered a cross-origin request.
When using I2P, though, the request is going to an entirely different domain and so is cross-origin: CORS headers need to be added.
Of course, had I set the URL to http://snippets.bentasker.i2p then, for visitors to bentasker.i2p, the module would work (but not for visitors to gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p).
I wanted maximum compatibility though, so added the headers (this was also breaking the feature on the Tor hidden service - I just hadn't noticed until now).
Adding CORS headers in Nginx is fairly trivial
location = /sitemap.json {
add_header Allow "GET, HEAD" always;
if ( $request_method !~ ^(GET|HEAD|OPTIONS)$ ) {
return 405;
}
if ($request_method = 'OPTIONS') {
add_header 'Access-Control-Allow-Origin' '*';
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
add_header 'Access-Control-Max-Age' 1728000;
add_header 'Content-Type' 'text/plain; charset=utf-8';
add_header 'Content-Length' 0;
return 204;
}
add_header 'Access-Control-Allow-Origin' '*' always;
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
include /etc/nginx/conf.d/snippets_proxy.inc;
}
I set that live, and suddenly the whole flow worked again!
Other Considerations
There are some other bits which didn't affect me, but that I still needed to check for.
Secure only cookies
When setting cookies, it's possible to specify that they should only be sent over an HTTPS connection by including the Secure attribute:
Set-Cookie: foo=bar; Secure
If set, this would mean that cookies set by the eepsite wouldn't then be sent back to the eepsite by the browser, so if the backend relies on the presence of these cookies (perhaps because they contain a session identifier) you'd run into issues.
If needed, it can be addressed by having the Nginx reverse proxy strip the Secure attribute:
proxy_cookie_flags ~ nosecure;
Browser Feature Gating
Web browsers are, unsurprisingly, focused on the world wide web, where there's a massive drive to get everyone using HTTPS (and for good reason).
As part of that, there are various browser features and APIs which are only made available to sites using HTTPS.
I'm not currently using any of those, but it may be problematic for anyone who intends to.
The Web Authentication API (aka WebAuthn) is an unfortunate casualty of that, with the nasty side-effect of making it that much harder for eepsites (or indeed Tor hidden services) to implement meaningful Two Factor Authentication.
Similarly, it also means no HTTP/2 for eepsites: although the spec allows HTTP/2 over cleartext, no browsers have implemented support for it. This means that features like server push aren't available (though no-one really used that anyway).
Reliance on IP Blocking
There are a number of Web Application Firewalls (WAFs) which block badly behaved IPs.
When enabled for an eepsite, these pose a DoS risk: all of your I2P visitors arrive via i2pd, so they share a single source IP.
Arguably, they're potentially harmful on the www too: the reality of the modern internet is that a lot of users sit behind carrier-grade NAT, so when your WAF blocks a "bad" IP, it might actually be blocking a significant pool of users.
Reliance on IP reputation/behaviour can also lead to a false sense of security, to the cost of behaviour detection/analysis. For example, if your WAF doesn't catch a specific exploit technique and your attacker tries that before reaching the bad-behaviour threshold, then the attempt will be allowed through.
Essentially, you add complexity and an increased risk of overblocking for relatively little (but non-zero) gain.
But, that's a tangent.
If you're running a WAF, you need to ensure that it can never block 127.0.0.1, otherwise your eepsite will have an outage (it's also possible to build a custom hybrid solution, as I did for Tor, but with the benefit of hindsight it's really not worth the effort).
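If your blocking is handled by something like fail2ban, for example, whitelisting loopback is a one-line change (a sketch - adjust to whatever is actually doing your banning):

```ini
# /etc/fail2ban/jail.local
# Never ban loopback - it's where i2pd's connections come from
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1
```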
Sessions tied to IP
It shouldn't be a thing on the modern web, but somewhere out there will be a site running ancient code just waiting to prove me wrong.
If session persistence is important to your application, you need to ensure that sessions are linked to a token that the client provides, rather than simply being derived from IP (as all of your users will have the same IP, and thus the same session).
Conclusion
This post is, perhaps, intimidatingly long. But that's because it takes longer to write about some of these things than it does to check them. Multi-homing a site onto I2P is actually relatively straightforward: it's just a case of looking at what your site does and identifying potential problems.
A number of the items listed here are things that are well worth discovering anyway, as they can impact future www-side development and performance.
If you're able to work your site to the point that you can multi-home it onto different networks, then you'll also have made it easier to do things like:
- Switch hosting
- Switch CDN provider
- Build redundancy
- Launch new functionality
By publishing your site into I2P (and Tor) you're offering your users a strongly authenticated, available and private route to access your services.
When a user connects to an eepsite, they're guaranteed to either connect to your server or fail to connect at all: they won't end up inadvertently connected to (or via) someone else. That means no man-in-the-middle by a network level censor, no DNS poisoning (or censorship) by a malicious actor and no meaningful tracking of their traffic.
This may allow users in oppressive countries better access to information (all too important at the moment), or it might simply allow someone to browse your content comfortable in the knowledge that their ISP isn't going to sell their browsing history to advertisers.
Everyone (well, maybe not advertisers) is a winner.