Multi-homing a site between the WWW and an I2P Eepsite
I recently gave a high-level overview of some of the things I needed to address in the process of multi-homing my site to make bentasker.co.uk available on I2P.
Although I2P presents some new challenges, some of the considerations were the same as when multihoming between Tor and the WWW.
Although I had originally intended to publish a generic multi-homing how-to, it's not really possible because the multi-homing process can be quite site-specific. Instead, this post is more of a deep-dive into the process, showing some of the things you need to consider when publishing an existing site onto the I2P anonymous overlay network. More than a few of those things will likely improve your www site too.
Parts of this post can also be used to set up a brand new eepsite, though it's assumed you've already got something listening on port 80: this post doesn't go into installing nginx.
Contents
As this has proven to be quite a long post, it seems wise to add a table of contents.
Initial Set-Up
Setting up an eepsite
It makes sense to start with the simplest part: making an eepsite available on the network.
This will allow us to browse the site and spot issues which need to be addressed. Much like exposing a Tor hidden service, this step is absolutely trivial.
First, we create some configuration to expose an eepsite
mkdir -p i2p_conf/keys
chown 100 i2p_conf/keys
nano i2p_conf/tunnels.conf
Within tunnels.conf we want to add:
[my-eepsite]
type = http
host = 127.0.0.1
port = 80
keys = keys/my-eep.dat
inbound.length = 1
outbound.length = 1
This will:
- Create a http tunnel called my-eepsite
- Forward connections on to 127.0.0.1:80 (where we have Nginx/Apache/whatever listening)
- Store the generated private key in keys/my-eep.dat
- Use an inbound and outbound tunnel length of 1 hop (the default is 3)
Tunnel length is an important consideration: the more hops, the greater the latency experienced by visitors. But, the fewer hops, the higher the risk of someone identifying the location of your server (though that risk is still relatively small).
As we're multi-homing a site already published on the world wide web, there's already a range of ways for someone to trivially identify the hosting server. Using longer tunnels in I2P would therefore only really serve to increase latency. Obviously, if the site were I2P only, the consideration would be different.
With our tunnel config in place, we need to get I2P up and running - I use the C++ client (i2pd) for this.
The simplest way to run the client is with docker:
docker run -d \
--restart=always \
--name i2pd \
--net=host \
-v $PWD/tunnels.conf:/home/i2pd/data/tunnels.conf \
-v $PWD/keys:/home/i2pd/data/keys/ \
purplei2p/i2pd --notransit
This will:
- Create a persistent container called i2pd
- Use the host's networking stack (more on why in a second)
- Publish our tunnels.conf into the container
- Publish our key directory into the container
- Run i2pd with transit disabled
The reason we're passing --net=host is because we've configured i2pd to forward onto loopback (i.e. 127.0.0.1). If we don't use --net=host then it'll use the container's loopback interface (which Nginx isn't listening on), so I2P wouldn't be able to reach Nginx.
If your web-stack is also dockerised then you can use docker's networking (with --link or --network) and don't need --net=host.
Binding to the host's networking stack also means that all the ports the container exposes may be publicly available, so we need to ensure those are firewalled off.
for PORT in 4444 4447 2827 7650 7654 7656 7070
do
iptables -I INPUT -p tcp --dport $PORT ! -s 127.0.0.1 -j REJECT
ip6tables -I INPUT -p tcp --dport $PORT ! -s ::1 -j REJECT
done
Remember to make those rules persistent (via iptables-save or whatever) if it's something you need to do manually.
When running i2pd I included the --notransit option. I did this because I don't want my edge to also be relaying other people's traffic (it impacts some of my monitoring). If you are comfortable relaying packets, then contributing to the network is strongly encouraged (you can, of course, instead run an I2P router somewhere else to offset your use).
Now that i2pd is up and running, if you're interested in stats and the like, you can use Telegraf to monitor I2PD.
Webserver Config
With those few steps done, your eepsite is now published. The next step is to identify its address and configure your webserver to answer to that name.
If you look in keys there should now be a file called my-eep.dat. The eepsite name can be extracted from it using a utility from i2pd-tools. This toolkit can be installed locally, or, I've dockerised it:
docker run --rm \
-v $PWD:/op \
bentasker12/id2pd-tools keyinfo keys/my-eep.dat
This will print out the b32 name (mine is gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p).
You'll need to configure your webserver to handle this name, but it may not be as simple as adding it to the config you use for the clearnet.
If you're following best practice, then your clearnet site will be HTTPS only and might even include HTTP Strict Transport Security (HSTS) headers.
The approach I've taken is to use a separate server block for the eepsite, which proxies through to the main one and can strip things that I don't want returned (it also allows you to bar access to site areas you don't want accessible over I2P):
server {
listen localhost:80;
server_name gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p bentasker.i2p;
root /usr/share/nginx/onions/bentaskercouk;
include /etc/nginx/domains.d/includes/location_block.inc; # WAF blocks
access_by_lua_file /etc/nginx/domains.d/LUA/WAF_Dynamic_ruleset.lua;
# Proxy to the back-end
location / {
# Set a header so the back-end knows we're coming via the eepsite
# this isn't actually used anymore, but keeping as futureproofing
proxy_set_header X-IM-I2P 1;
# Make sure the host header is correct
proxy_set_header Host www.bentasker.co.uk;
# Send the request
proxy_pass https://127.0.0.1;
proxy_hide_header Strict-Transport-Security;
}
# Example: block access to /wp-admin.php for I2P users
location /wp-admin.php {
return 403;
}
}
The reason you need to strip HSTS headers (if present) is that they tell the browser to only connect to your site via HTTPS, and to only accept trusted certificates (which you can't obtain for an eepsite). Failing to strip them will break delivery of your eepsite (and it'll stay broken for some time too: browsers cache HSTS).
With that config live, you should now be able to visit your site from an I2P client.
Setting up an I2P client
You might already have a client set up to test from, if so, you can skip this section.
This is how I set up I2P access using Firefox's Multi-Account Containers.
I used docker to spin up a copy of i2pd:
docker run -d --name=i2pd \
-v i2pd:/home/i2pd/data \
-p 4447:4447 -p 7070:7070 \
--restart=unless-stopped purplei2p/i2pd
You can add --notransit to this too if you need to, but leaving transit enabled is a good way to give a small amount of bandwidth back to the network.
I then created a container called i2p and configured it to use the I2P proxy at socks://127.0.0.1:4447.
With that, I2P sites are then available within tabs using that container.
Registering a Short name
Much like with Tor hidden services, eepsite names aren't particularly memorable.
Although you can generate a vanity identifier, it's not actually necessary: unlike with Tor, it's possible to register and link a "shortname" (for example bentasker.i2p).
Although it may feel DNS-like, the way shortnames work isn't akin to modern DNS (it's much more like the old days, where people periodically downloaded updated hosts lists), and it can take a few days for your shortname to become available to all users.
Because of that delay, it's worth registering the name early, as it reduces the wait between finishing work and getting everything online (if you're worried about people visiting the unfinished site, turn on basic auth or similar in your webserver).
Acquiring a short-name is pretty straightforward: you need to create a signed request using i2pd-tools and the eepsite's .dat file.
The command
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr keys/my-eep.dat myeep.i2p > auth_string.txt
Would generate a signed request for myeep.i2p to "resolve" to the eepsite identified by my-eep.dat.
Using an I2P client, you should then visit http://reg.i2p/add and put the content of auth_string.txt into the Auth String field.
A little while after submitting, you should see your domain at http://reg.i2p/latest, though it might take up to 5 days for the entire network to update addressbooks.
Subdomains
You can also register a subdomain (for example snippets.bentasker.i2p) using a similar process - you first need to have created a key for the main domain, as this is used to sign the subdomain request.
We create a new tunnel for the subdomain
[snippets-service]
type = http
host = 127.0.0.1
port = 80
keys = keys/snippets.dat
inbound.length = 1
outbound.length = 1
We then need the keyfiles for both this and the parent domain:
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step1 keys/snippets.dat snippets.bentasker.i2p > step1.txt
# sign
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step2 step1.txt keys/my-eep.dat bentasker.i2p > step2.txt
# Generate the final request
docker run --rm \
-v $PWD:/op bentasker12/id2pd-tools \
regaddr_3ld step3 step2.txt keys/snippets.dat > step3.txt
The file step3.txt then contains the string for you to submit at http://reg.i2p/add.
Adjusting your site for multi-homing
So now we can move onto the main focus of this post: making a www/clearnet site ready for multi-homing onto I2P.
There are a number of things we need to check and consider.
Speed
I2P can have extremely high latency (the official answer to "how fast is I2P" is "it depends").
Whilst I2P users are, to a certain extent, used to it, one of the best things you can do to improve performance is to review what you're sending over the wire (in many ways, we're back to what used to be web-design 101 here).
For anything that's being sent over the network, ask:
- Does it need to be sent?
- Can it be compressed/minified?
- Can it be deferred until after page load?
- Is it being sent in a cacheable manner?
The last of those is quite important. It's not uncommon for pages to include inline scripts or styles:
<html>
<head>
<style type="text/css">
.foo {color: red}
</style>
</head>
<body>
<div class="foo" onclick="window.alert('bar')">Lorem Ipsum etc etc</div>
</body>
</html>
If these inline elements are present in more than a few pages, then there are bytes hitting the wire that didn't need to. Move those inlines into separate files:
<html>
<head>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div class="foo" data-text='bar'>Lorem Ipsum etc etc</div>
</body>
<script type="text/javascript" src="alerts.js"></script>
</html>
This incurs a couple of additional HTTP requests, but only on the first page view. For each subsequent page view those bytes come from the browser's cache and never hit the wire.
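Of course, browsers will only hold onto those files if they're served with cache-friendly headers. A minimal nginx sketch (the location pattern and lifetime here are illustrative - tune them to your own release cadence):

```nginx
# Let browsers cache stylesheets and scripts for a week.
# Version the filenames (or add a query string) so updates aren't missed.
location ~* \.(css|js)$ {
    add_header Cache-Control "public, max-age=604800";
}
```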
You should apply this analytical mindset to everything you're hosting.
If you're hosting video, is it in an adaptive format? Assuming you're using HLS, have you ensured that the lowest bandwidth variant is first to appear in the master manifest (most players start with the first listed and then switch based on calculated bandwidth)? If not, then you may inadvertently tie the user's tunnel up fetching initial chunks, preventing more urgent content (like stylesheets) from loading.
Could you perhaps add loading="lazy" to images that are likely to be below the fold?
Anything which reduces the number of initial requests can help ensure that the page loads faster and will benefit the clearnet version of your site too.
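On the "can it be compressed/minified?" question, it's easy to get a rough local feel for the potential savings. A quick sketch using a synthetic stylesheet (the real win will depend on your actual assets):

```shell
# Build a stand-in stylesheet: 200 copies of the same rule
printf '.foo { color: red; }\n%.0s' $(seq 1 200) > style.css

# Compare the raw size against the gzipped size
raw=$(wc -c < style.css)
gzip -9 < style.css > style.css.gz
gz=$(wc -c < style.css.gz)
echo "raw: ${raw} bytes, gzipped: ${gz} bytes"
```

Nginx can also compress on the fly (gzip on;), but minifying and pre-compressing assets still saves bytes and CPU on every request.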
External Resources
The next thing you need to consider is whether your site uses any external resources.
By default, I2P is a closed circuit: there's no access to clearnet domains. Some users use an Outproxy or local proxy rules in order to work around this, but if you want your eepsite to work consistently it's best to assume that most users won't have done this.
You need to identify what external resources your site relies upon, and what the consequences are if each of those resources doesn't load.
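A crude but effective first pass is to grep your rendered pages for absolute URLs and see which domains they point at. A sketch (site_html/ and its contents are a stand-in for wherever your built pages actually live):

```shell
# Create a sample page to scan (replace with your real document root)
mkdir -p site_html
cat > site_html/index.html <<'EOF'
<link rel="stylesheet" href="https://mycdndomain.example.com/styles/style.css">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
<img src="/images/local.png">
EOF

# Extract absolute URLs and count references per domain
grep -rhoE "https?://[^\"' >]+" site_html/ \
    | awk -F/ '{print $3}' | sort | uniq -c | sort -rn
```

Anything that isn't your own (multi-homed) domain is a candidate for local hosting, removal, or graceful degradation.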
For example, if we take the following HTML
<head>
<link rel="stylesheet" href="https://mycdndomain.example.com/styles/style.css" >
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
</head>
One is more severe than the other.
- If your main stylesheet isn't accessible, then your site will have no styling at all and probably look horrendous.
- If the font CSS is unavailable then the browser will fall back to whatever fonts it's got available - it may not be the font you want, but the site will still be usable.
The former almost certainly needs fixing, whilst you might consider the latter an acceptable trade-off. You might choose to locally host the stylesheet so that you can ultimately update that markup to be:
<head>
<link rel="stylesheet" href="/styles/style.css" >
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Tangerine">
</head>
Dynamically updating external references
It's not just stylesheets that you need to consider, it's all external resources: images, scripts, videos, etc.
One of the things that I needed to find a solution for was the videos section of my site.
The video embed script (embed.min.js) and the content itself are hosted at videos.bentasker.co.uk. So, without external access, any part of www.bentasker.co.uk that relied on embedded video was completely pointless.
To address this, I decided to also dual-home videos.bentasker.co.uk onto I2P, so that I could serve videos via an eepsite.
If my site were dynamic, it'd have been relatively easy to dynamically rewrite references (this is why my nginx config includes a custom upstream header, as that's exactly what I did when multi-homing onto Tor). Note, though, that if you are making backend changes to a dynamic site, you'll want to be conscious of the risk of cache-poisoning.
However, my site is static: I needed to implement something client side, which meant adding supporting javascript to the site itself. Simply making changes in embed.min.js wouldn't be sufficient, because I2P users can't fetch that resource in the first place.
With a small handful of known resources, it's possible to take a fairly unsophisticated approach
function Clearnet2I2P(){
if (window.location.hostname.split(".").pop().toLowerCase() != "i2p"){
/* Nothing to do */
return;
}
// Embed the video script
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
// Embed the analytics agent
s2 = document.createElement('script');
s2.setAttribute('src', 'http://5es4aj6pfdxoz6oz6vbcczix25dlfelrdav6a6hw7tuudb7kxwba.b32.i2p/agent.js');
document.body.appendChild(s2);
}
However, sometimes assets need to be replaced in place so that ordering is preserved, meaning something more complex is required
function Clearnet2I2P(){
if (window.location.hostname.split(".").pop().toLowerCase() != "i2p"){
/* Nothing to do */
return;
}
var i, src, dom, newurl, newele;
let buff_obj = {
"buf" : [],
"video_found" : false,
"mappings" : {
"pfanalytics.bentasker.co.uk" : "5es4aj6pfdxoz6oz6vbcczix25dlfelrdav6a6hw7tuudb7kxwba.b32.i2p",
"videos.bentasker.co.uk" : "bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p",
"static1.bentasker.co.uk" : "gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p"
}
}
adjustElements("script", "src", buff_obj);
adjustElements("img", "src", buff_obj);
adjustElements("link", "href", buff_obj);
// Process anything we've found
for (i=0; i<buff_obj["buf"].length; i++){
buff_obj["buf"][i][1].parentNode.insertBefore(buff_obj["buf"][i][0], buff_obj["buf"][i][1]);
}
if (buff_obj["video_found"]){
// We want to re-detect videos once we know the script has loaded
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
}
}
function adjustElements(tagname, attrib, buffer_obj){
var i, src, dom, newurl, newele;
var eles = document.getElementsByTagName(tagname);
for (i=0; i<eles.length; i++){
src = eles[i].getAttribute(attrib);
if (!src || src.substring(0,4) != "http"){
// Relative or empty link, skip
continue;
}
dom = src.split("/")[2].toLowerCase();
if (!buffer_obj["video_found"] && dom.includes("videos.bentasker.co.uk")){
buffer_obj["video_found"] = true;
}
// Do we have a mapping for that domain?
if (buffer_obj["mappings"][dom]){
newurl = src.replace("://"+dom, "://" + buffer_obj["mappings"][dom]).replace("https://","http://");
// Clone rather than updating existing - the DOM doesn't always reliably update with a simple source change
newele = eles[i].cloneNode();
newele.setAttribute(attrib, newurl);
eles[i].setAttribute('style', 'display: none');
// Push to a buffer - the DOM will be updated later
//
// This helps avert an infinite loop
buffer_obj["buf"].push([newele, eles[i]]);
}
}
}
Clearnet2I2P();
This will clone any script, stylesheet or image reference that we have a known eepsite for, and swap in the I2P URL.
This isn't an ideal solution if used on its own: users with javascript disabled won't get the fixed URLs. That's fine for scripts (which wouldn't have run anyway), but not so fine for images and stylesheets.
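For a static site, one way around that is to do the rewriting at build time instead: generate an I2P copy of each page with the clearnet hostnames swapped out. A rough sketch using sed (the hostnames are mine, and the page is a minimal example):

```shell
# A minimal page referencing clearnet asset hosts
cat > page.html <<'EOF'
<script src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<img src="https://static1.bentasker.co.uk/img/logo.png">
EOF

# Swap known clearnet hosts for their eepsite equivalents (and https for http)
sed -e 's|https://videos\.bentasker\.co\.uk|http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p|g' \
    -e 's|https://static1\.bentasker\.co\.uk|http://gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p|g' \
    page.html > page.i2p.html
```

The trade-off is that you're now building (and serving) two copies of the site, and the eepsite's server block needs to point at the rewritten one.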
External Scripts
Rewriting javascript references after the fact also often isn't enough.
For example, earlier versions of my video embed script used the following approach to embedding
<script type="text/javascript"
src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<script type="text/javascript">
embedBensPlayer('2017/201705_Lua_split_string/lua_string_split.mp4_master.m3u8');
</script>
So, when loading the eepsite, we'd see the following in the javascript console
Loading failed for the <script> with source “https://videos.bentasker.co.uk/resources/embed/embed.min.js”.
Uncaught ReferenceError: embedBensPlayer is not defined
When the rewrite kicked in, we'd successfully load embed.min.js, but the player wouldn't load because there was nothing attempting to re-execute embedBensPlayer().
In v0.19 I added support for a new approach
<script type="text/javascript"
src="https://videos.bentasker.co.uk/resources/embed/embed.min.js"></script>
<div class="embedBensPlayer" data-src='2017/201705_Lua_split_string/lua_string_split.mp4_master.m3u8'></div>
When we hit DOM Ready, the embed script looks for all elements with class embedBensPlayer and... well... embeds my player in them.
So now, when we embed the I2P served version, we can also re-trigger the method that performs the embedding:
s = document.createElement('script');
s.setAttribute('src', 'http://bapmqkdc7xotvlym3bj75gdb4tlgg2poezkmz36w64qum4racpyq.b32.i2p/resources/embed/embed.min.js');
s.addEventListener('load',function(){embedBensPlayerDivs()});
document.body.appendChild(s);
Although it was a bit of a pain to have to switch existing embeds over, it's generally good practice to avoid inline javascript (and document.write()), so this carries benefits beyond I2P (including being a step toward being able to enable a meaningful Content Security Policy).
XHR Requests and CORS
On most pages, my site makes an AJAX/xmlhttp request to snippets.bentasker.co.uk in order to fetch the JSON sitemap, run a client-side search, and list related snippets.
It was a pretty inoffensive bit of javascript, and should just have needed updating to use the eepsite for snippets:
function triggerRelatedSnippets(){
var url;
var tld = window.location.hostname.split(".").pop().toLowerCase();
if(tld == "i2p") {
url = "http://vgduvgxudaceslvwlvtda6b4csobvczygcqpklm3yeuke2zgvcaa.b32.i2p/sitemap.json";
} else {
url = "https://snippets.bentasker.co.uk/sitemap.json";
}
// Trigger the fetch
fetchPage(url, writeResult, errorResult);
}
But, the JSON was failing to load.
When you access www.bentasker.co.uk, the request is going to a sub-domain of the site you're on, so it's not considered a cross-origin request.
When using I2P, though, the request is going to an entirely different domain and so is cross-origin: CORS headers need to be added.
Of course, had I set the URL to http://snippets.bentasker.i2p then, for visitors to bentasker.i2p, the module would work (but not for visitors to gdncgijky3xvocpkq6xqk5uda4vsnvzuk7ke7jrvxnvyjwkq35iq.b32.i2p).
I wanted maximum compatibility though, so added the headers (this was also breaking the feature on the Tor hidden service - I just hadn't noticed until now).
Adding CORS headers in Nginx is fairly trivial
location = /sitemap.json {
add_header Allow "GET, HEAD" always;
if ( $request_method !~ ^(GET|HEAD|OPTIONS)$ ) {
return 405;
}
if ($request_method = 'OPTIONS') {
add_header 'Access-Control-Allow-Origin' '*';
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
add_header 'Access-Control-Allow-Headers' 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
add_header 'Access-Control-Max-Age' 1728000;
add_header 'Content-Type' 'text/plain; charset=utf-8';
add_header 'Content-Length' 0;
return 204;
}
add_header 'Access-Control-Allow-Origin' '*' always;
add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
include /etc/nginx/conf.d/snippets_proxy.inc;
}
I set that live, and suddenly the whole flow worked again!
Other Considerations
There are some other bits which didn't affect me, but that I still needed to check for.
Secure only cookies
When setting cookies, it's possible to specify that they should only be sent over an HTTPS connection by including the Secure attribute:
Set-Cookie: foo=bar; Secure
If set, this would mean that cookies set by the eepsite wouldn't then be sent back to the eepsite by the browser, so if the backend relies on the presence of these cookies (perhaps because they contain a session identifier) you'd run into issues.
If needed, it can be addressed by having the Nginx reverse proxy strip the Secure attribute:
proxy_cookie_flags ~ nosecure;
Browser Feature Gating
Web browsers are, unsurprisingly, focused on the world wide web, where there's a massive drive to get everyone using HTTPS (and for good reason).
As part of that, there are various browser features and APIs which are only made available to sites using HTTPS.
I'm not currently using any of those, but it may be problematic for anyone who intends to.
The Web Authentication API (aka WebAuthn) is an unfortunate casualty of that, with the nasty side-effect of making it that much harder for eepsites (or indeed Tor hidden services) to implement meaningful Two Factor Authentication.
Similarly, it also means no HTTP/2 for eepsites: although the spec allows HTTP/2 over cleartext, no browsers have implemented support for it. This means that features like server push aren't available (though no-one really used that anyway).
Reliance on IP Blocking
There are a number of Web Application Firewalls (WAFs) which block badly behaved IPs.
When enabled for an eepsite, these pose a DoS risk: all of your I2P visitors arrive via i2pd, so they share a single source IP.
Arguably, they're potentially harmful on the www too: the reality of the modern internet is that a lot of users sit behind carrier-grade NAT, so when your WAF blocks a "bad" IP, it might actually be blocking a significant pool of users.
Reliance on IP reputation/behaviour can also lead to a false sense of security, to the cost of behaviour detection/analysis. For example, if your WAF doesn't catch a specific exploit technique and your attacker tries that before reaching the bad-behaviour threshold, then the attempt will be allowed through.
Essentially, you add complexity and an increased risk of overblocking for relatively little (but non-zero) gain.
But, that's a tangent.
If you're running a WAF, you need to ensure that it can never block 127.0.0.1, otherwise your eepsite will have an outage (it's also possible to build a custom hybrid solution, as I did for Tor, but with the benefit of hindsight it's really not worth the effort).
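If your blocking is handled by something like fail2ban, for example, whitelisting loopback is a one-line change (a sketch - adjust to whatever is actually doing your banning):

```ini
# /etc/fail2ban/jail.local
# Never ban loopback - it's where i2pd's connections come from
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1
```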
Sessions tied to IP
It shouldn't be a thing on the modern web, but somewhere out there will be a site running ancient code just waiting to prove me wrong.
If session persistence is important to your application, you need to ensure that sessions are linked to a token that the client provides, rather than simply being derived from IP (as all of your users will have the same IP, and thus the same session).
Conclusion
This post is, perhaps, intimidatingly long. But that's because it takes longer to write about some of these things than it does to check them. Multi-homing a site onto I2P is actually relatively straightforward: it's just a case of looking at what your site does and identifying potential problems.
A number of the items listed here are things that are well worth discovering anyway, as they can impact future www-side development and performance.
If you're able to work your site to the point that you can multi-home it onto different networks, then you'll also have made it easier to do things like:
- Switch hosting
- Switch CDN provider
- Build redundancy
- Launch new functionality
By publishing your site into I2P (and Tor) you're offering your users a strongly authenticated, available and private route to access your services.
When a user connects to an eepsite, they're guaranteed to either connect to your server or fail to connect at all: they won't end up inadvertently connected to (or via) someone else. That means no man-in-the-middle by a network level censor, no DNS poisoning (or censorship) by a malicious actor and no meaningful tracking of their traffic.
This may allow users in oppressive countries better access to information (all too important at the moment), or it might simply allow someone to browse your content comfortable in the knowledge that their ISP isn't going to sell their browsing history to advertisers.
Everyone (well, maybe not advertisers) is a winner.