Integrating against the RequestRouter Alt-Svc Hints API
Version 0.4 of RequestRouter introduces a new feature: an API designed to give edge devices (such as delivery appliances) the hints needed to generate an RFC 7838 Alt-Svc header.
This documentation provides a reference implementation, allowing an OpenResty based edge device to connect back to the Alt-Svc Hints API whilst minimising the potential latency impact.
Assumptions
This documentation assumes some familiarity with OpenResty (i.e. that you already have it installed), as well as basic familiarity with RequestRouter and how to manage it.
Background
It used to be that most servers that did not support ECS were considered old infrastructure that would eventually be in the minority as software got upgraded. Unfortunately, due to various privacy concerns, a number of new public services such as Quad9 and Cloudflare's 1.1.1.1 do not support ECS.
This introduces a severe limitation for DNS based routing as routing calculations without ECS available must be based upon the downstream resolver's IP (being the only information available to hint at the client's location).
With a large scale public DNS service, this can often lead to vastly inaccurate location calculations: Level3's DNS service, for example, is quite commonly implicated in mis-identification of client locations due to the way queries are routed to their PoPs.
The RR-65
Installing the Alt-Svc Hints API
This section exists largely for completeness; if the relevant section of the RequestRouter documentation changes later, it should be considered to override this.
The software dependencies for the Alt-Svc Hints API are exactly the same as for the RR-37 HTTP Redirect Routing component.
apt-get install -y python-sqlite python-netaddr python-dev gcc make sqlite3 nginx
easy_install pip
pip install geoip2 uwsgi flask
As with the HTTP Routing component, installing it on a separate system to the DNS routing infrastructure is not mandatory, but is strongly recommended to ensure that excess load in one component does not adversely impact the other.
Nginx Server Block
You should configure an HTTPS server block in Nginx. Downstream subnets will be submitted over this connection, so do not use plain HTTP.
Alt-Svc-Api.conf
upstream router-worker {
    keepalive 100;
    server 127.0.0.1:8095;
}

server {
    listen [::]:443 ssl;
    server_name altsvc.example.com;

    ssl_certificate /path/to/cert/fullchain.pem;
    ssl_certificate_key /path/to/key/privkey.pem;

    access_log /var/log/nginx/router.log routing;
    add_header Access-Control-Allow-Origin *;

    location / {
        proxy_set_header Host $http_host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_pass http://router-worker;

        # We cache to keep some load off the backend
        proxy_cache my-cache;
        proxy_cache_lock on;
        proxy_cache_key "$http_host/$request_uri";
        proxy_cache_valid 200 3s;
        proxy_cache_valid 400 5m;
        proxy_cache_valid 204 10m;
        proxy_cache_valid 404 3s; # No caches available
        # As per the design spec, do not cache 304s
    }
}
Log Format
In the example above, we're using a custom log format to aid in debugging responses. In nginx.conf the log format is defined as
log_format routing 'routing\t$remote_addr\t-\t$remote_user\t[$time_local]\t"$request"\t'
                   '$status\t$body_bytes_sent\t"$http_referer"\t'
                   '"$http_user_agent"\t"$http_x_forwarded_for"\t"$http_host"\t'
                   'CACHE_$upstream_cache_status\t$request_time\t"$upstream_http_x_reason"\t'
                   '$hostname\t$upstream_http_x_caches\t$upstream_http_x_sourcezone';
This format allows us to see where a client was directed to, as well as the location we believed they were in:
routing 104.248.174.52 - - [01/Oct/2018:20:22:36 +0000] "GET /wwwsite.balanced.bentasker.co.uk/51.255.232.0 HTTP/1.1" 200 71 "-" "lua-resty-http/0.10 (Lua) ngx_lua/10013" "-" "altsvcapi" CACHE_MISS 0.003 "-" debian-rr65-test-router 51.255.232.237 g.fr
In the example above we can see the client was geolocated to France, and the recommended edge device has IP 51.255.232.237 (where multiple devices are recommended, the IPs will be comma separated).
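When working through these logs, it can help to pull the fields apart programmatically. The sketch below is a hypothetical helper (not part of RequestRouter) for parsing the tab-separated "routing" format defined above; the field indexes follow that log_format directive, so adjust them if yours differs.

```python
def parse_routing_line(line):
    """Split one access-log line on tabs and pull out the interesting fields."""
    fields = line.rstrip("\n").split("\t")
    return {
        "remote_addr": fields[1],
        "status": fields[6],
        "cache_status": fields[12].replace("CACHE_", "", 1),
        "request_time": float(fields[13]),
        "caches": fields[16].split(","),  # recommended edge device IPs
        "sourcezone": fields[17],         # geolocated zone, e.g. "g.fr"
    }
```

Run over a day's logs, this makes it easy to spot cache-hit ratios and mis-geolocated subnets.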
Edge Device Implementation
The change to the edge device consists of adding some Lua to make a call out to the API, as well as inserting a server block to act as a caching tier.
The details below assume the following configuration:
nginx.conf
user www-data;
worker_processes 4;
worker_rlimit_nofile 65535;

error_log /var/log/nginx/error.log warn;
pid logs/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    client_max_body_size 10M;

    log_format main '$remote_addr\t-\t$remote_user\t[$time_local]\t"$request"\t'
                    '$status\t$body_bytes_sent\t"$http_referer"\t'
                    '"$http_user_agent"\t"$http_x_forwarded_for"\t"$http_host"\tCACHE_$upstream_cache_status\t$request_time\t$hostname';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    #tcp_nopush on;
    keepalive_timeout 65;

    gzip on;
    gzip_vary on;
    gzip_static on;
    gzip_types text/css text/javascript text/plain application/x-javascript application/json application/javascript;

    proxy_cache_path /mnt/cache levels=1:2 keys_zone=my-cache:8m max_size=8000m inactive=300d;
    proxy_temp_path /mnt/cache/tmp;
    proxy_cache_use_stale updating invalid_header error timeout http_502; # Use a stale entry if origin unavailable

    # Every server should have this
    add_header X-Clacks-Overhead "GNU Terry Pratchett";

    # IMPORTANT - ensure this directory exists
    lua_package_path '/etc/nginx/lua/?.lua;;';
    lua_shared_dict altsvccache 10m;

    include /etc/nginx/conf.d/*.conf;
}
The directory /etc/nginx/lua should exist. Within that directory should be a directory called resty containing the lua-resty-http module.
We also need to create the server block that we'll use as a cache -
alt-svc-cache.conf
upstream altsvc {
    keepalive 100;
    server 1.1.1.1:443;
    server 1.2.2.2:443;
}

server {
    listen 127.0.0.1:8094;

    location / {
        proxy_pass https://altsvc;

        proxy_cache my-cache;
        proxy_cache_valid 200 10m;
        proxy_cache_valid 400 1h;
        proxy_cache_valid 204 1h;
        proxy_cache_valid 304 4h;
        proxy_cache_valid 404 10m;

        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Use very short timeouts to ensure we don't delay delivery
        proxy_connect_timeout 1;
        proxy_read_timeout 1;

        proxy_cache_lock on;
        proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
        proxy_hide_header X-Dont-Cache-Me;
    }
}
In this config, we explicitly specify the IPs of our Alt-Svc API nodes. An alternative would be to configure the Alt-Svc API's FQDN to resolve via RequestRouter (sending each edge device to the nearest node to reduce latency).
However, that would incur several RTTs of extra latency (1 for the DNS query, 3 for the handshake if it's a new connection), so in this example we maintain a pool of keep-alive connections to the backends instead. This works best when your Nginx configuration is centrally managed; otherwise you incur the management overhead of updating the config on each edge device whenever you change the routing pool.
It should be noted here that using Nginx's
We cache different statuses for varying amounts of time, as each has a different significance to our decision on whether to insert an Alt-Svc header. If we receive status 304, it means the client is already correctly routed, so we can afford to cache that for quite some time (as the client can clearly reach us).
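The per-status lifetimes in the cache-tier config above can be summarised as a simple lookup. This is purely a reference table (durations converted to seconds from the proxy_cache_valid directives), not code that ships anywhere:

```python
# Status -> cache lifetime in seconds, mirroring the proxy_cache_valid
# directives in alt-svc-cache.conf.
CACHE_TTLS = {
    200: 600,    # 10m
    400: 3600,   # 1h
    204: 3600,   # 1h
    304: 14400,  # 4h - the client is already correctly routed, safe to cache longest
    404: 600,    # 10m
}

def cache_ttl(status, default=0):
    """Look up how long a given upstream status may be cached."""
    return CACHE_TTLS.get(status, default)
```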
Next, we need to create the workhorse of this solution:
alt-svc-api.lua
local table = table
local require = require
local http = require("resty.http")
local json = require "cjson"
local string = string
local server = '127.0.0.1'
local edge_name = ngx.var.edge_name
-- https://snippets.bentasker.co.uk/page-1705231204-Split-string-on-Delimiter-LUA.html
function strSplit(delim, str)
    local t = {}
    for substr in string.gmatch(str, "[^" .. delim .. "]*") do
        if substr ~= nil and string.len(substr) > 0 then
            table.insert(t, substr)
        end
    end
    return t
end
function place_api_request(ngx, remote_addr, edge_name)
    -- Initiate the HTTP connector
    local httpc = http.new()
    local ok, err = httpc:connect(server, 8094)
    if not ok then
        return nil
    end
    local res, err = httpc:request {
        path = table.concat({'', edge_name, remote_addr}, '/'),
        method = 'GET'
    }
    -- We're done with the connection, send to keepalive pool
    httpc:set_keepalive()
    if not res then
        return nil
    end
    ngx.log(ngx.ERR, res.status)
    -- Check the status
    if res.status == 200 then
        -- Decode the JSON body and return it
        local body, err = res:read_body()
        ngx.log(ngx.ERR, body)
        return json.decode(body)
    end
end
function calcRemoteSubnet(ngx)
    local remote_addr = ''
    -- The period needs escaping in the Lua pattern
    if string.find(ngx.var.remote_addr, "%.") then
        -- This will only work with IPv4
        local r = strSplit(".", ngx.var.remote_addr)
        -- Remove the last octet and replace it with 0
        table.remove(r)
        table.insert(r, 0)
        -- Implode to turn back into an IPv4 style string
        remote_addr = table.concat(r, '.')
    else
        -- And this only with IPv6
        local r = strSplit(":", ngx.var.remote_addr)
        -- We want to trim down to a /64 (even that may be a bit
        -- large).
        --
        -- Collapsed addresses are an issue here, so grab the first 4 hextets
        -- rather than stripping the others
        --
        -- Not massively happy with this
        local s, v, x, y = unpack(r, 1, 4)
        remote_addr = table.concat({s, v, x, y, ':'}, ":")
    end
    return remote_addr
end
function runProcess(ngx)
    -- Squash the remote IP down to a subnet
    local remote_addr = calcRemoteSubnet(ngx)

    -- Check the cache
    local cache = ngx.shared.altsvccache
    local cachekey = edge_name .. remote_addr
    local e = cache:get(cachekey)

    -- See if we hit the cache
    if e ~= nil then
        ngx.header['X-Alt-Svc-Cache'] = 'HIT'
        local cacheitem = strSplit("|", e)
        if cacheitem[1] == "200" then
            ngx.header['Alt-Svc'] = cacheitem[2]
        end
        return
    end

    -- Place the request to the API
    local resp = place_api_request(ngx, remote_addr, edge_name)
    local hdr = ''

    -- Check the response
    if resp ~= nil and resp['status'] == 200 then
        -- Create a table for our built responses
        -- No string concat because it's slower
        local l = {}
        local s = {}
        for k, v in pairs(resp['hosts']) do
            s = {'h2="', v, ':443"'}
            table.insert(l, table.concat(s, ''))
        end
        -- Create the header value
        hdr = table.concat(l, ", ") .. table.concat({';', ' ma=', resp['ttl']}, '')
        -- Send the header we have built
        ngx.header['Alt-Svc'] = hdr
    end

    if resp ~= nil then
        cache:set(cachekey, table.concat({resp['status'], hdr}, "|"), 20)
    end
end
-- If Alt-Used is present then the client has already followed an
-- Alt-Svc header - don't waste time giving them a new one.
if not ngx.var.http_alt_used then
    runProcess(ngx)
end
Essentially, this does the following:
- If the client has included the Alt-Used request header, they've already followed an Alt-Svc header, so we shouldn't waste cycles trying to further optimise the routing.
- Breaks the client IP down to a subnet (to maximise the benefit of the inline caches - it also means routing doesn't collect a list of exact IPs).
- Calls the Alt-Svc API. If the API responds with alternate hosts, an Alt-Svc header is generated.
- Tells the client to cache the header for a period by inserting a ma value - this period is set by the API.
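The two core transforms above can be sketched outside of OpenResty for testing. The Python below is an illustrative port (not part of the edge device) of calcRemoteSubnet() and the header construction in runProcess(); the "hosts" and "ttl" field names follow the Lua's expectations of the API response.

```python
def calc_remote_subnet(remote_addr):
    """Reduce an IP to a subnet-style cache key, as calcRemoteSubnet() does."""
    if "." in remote_addr:
        # IPv4: zero the final octet, giving a /24-style key
        octets = remote_addr.split(".")
        octets[-1] = "0"
        return ".".join(octets)
    # IPv6: keep the first four hextets (a /64), skipping collapsed groups
    hextets = [h for h in remote_addr.split(":") if h][:4]
    return ":".join(hextets) + "::"

def build_alt_svc(resp):
    """Build the Alt-Svc header value, as runProcess() does."""
    entries = ['h2="%s:443"' % host for host in resp["hosts"]]
    return ", ".join(entries) + "; ma=%s" % resp["ttl"]
```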
The final step is to call the Lua from within any server blocks you want to optimise.
As well as referencing the Lua file, the edge name configured within RequestRouter should be provided:
example.conf
server {
    listen [::]:443 ssl http2;
    server_name 'foo.example.com';

    set $edge_name 'wwwsite.balanced.bentasker.co.uk';
    access_by_lua_file /etc/nginx/lua/alt-svc-api.lua;

    location / {
        # Insert the rest of your nginx config here
    }
}
Reload or restart Nginx and you should be good to go.
Testing
The easiest way to test that your changes are working is with curl.
From a system that would not be routed to your edge device, place a request for a domain you've enabled the Lua on, whilst forcing resolution to your edge device:
curl https://foo.example.com/ --resolve "foo.example.com:443:3.3.3.3" -s -v -o/dev/null 2>&1 | grep Alt
(Where 3.3.3.3 in the example above is the IP of your edge device). You should see an Alt-Svc header in the response:
Alt-Svc: h2="51.255.232.237:443"; ma=30
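If you're scripting checks against the curl output, the header value is easy to pick apart. A small hypothetical checker (the function name is our own invention) for extracting the advertised endpoints and the ma (max-age) value:

```python
import re

def parse_alt_svc(value):
    """Extract h2 endpoints and the ma value from an Alt-Svc header value."""
    endpoints = re.findall(r'h2="([^"]+)"', value)
    ma = re.search(r'ma=(\d+)', value)
    return endpoints, int(ma.group(1)) if ma else None
```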
If you want to test the full flow, grab a browser that supports Alt-Svc (Mozilla Firefox, for example).
Try connecting to your site with developer tools open, and you should see later requests use a new server (and include an Alt-Used request header).
Scaling
Careful consideration should be given to scaling the API infrastructure appropriately. You will need to consider your traffic profile when assessing how many requests the API is likely to see at any given point.
For networks where the request-to-client ratio is small (many clients each placing few requests), the number of resulting requests against the API is likely to be far higher than on a network where each client places a lot of requests.
Without the protective caches implemented in this documentation, the initial implementation of the API can handle request rates of around 1100 requests per second, per CPU core. This rate represents a worst case scenario where every caching tier reports a MISS.
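As a back-of-the-envelope sizing aid, the worst-case per-core figure above can be turned into a core-count estimate. The peak rate and headroom below are hypothetical inputs - substitute your own traffic profile:

```python
import math

def cores_needed(peak_rps, per_core_rps=1100.0, headroom=0.5):
    """Cores required to absorb peak_rps while reserving `headroom` spare capacity."""
    return math.ceil(peak_rps / (per_core_rps * (1.0 - headroom)))
```

For example, a peak of 10,000 req/s with 50% headroom against the quoted 1100 req/s per core suggests provisioning 19 cores.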
Caveats
There are a number of caveats with this implementation:
- HTTPS sites only - RFC 7838 specifies that the origin being directed to (i.e. the new server) must support SNI and must be able to provide a certificate valid for the original domain name. This is to ensure that any new origin can authenticate that it is authorised to serve the requested domain.
- All edge devices must support HTTP/2.0 - the h2 in the Alt-Svc header stipulates that HTTP/2 should be used when connecting to the new origin. Whilst the RFC allows for HTTP/1.1 to be specified, no browser currently appears to implement support for this.
- Browser support for Alt-Svc is currently limited. Mozilla Firefox (and the Tor Browser Bundle) support it. Chrome currently only supports using it to connect to a new origin using QUIC (though HTTP/2 support is apparently coming).
- Following Alt-Svc is optional. Just because the header is served, it doesn't mean that clients will definitely follow it; it's down to the user-agent to decide whether to honour or ignore the header.
- The first request is unaffected. Alt-Svc is served along with the response to the user-agent's request, so the first request will be via the less optimal edge device. If a user-agent's sessions consist of a single HTTP request then this functionality will serve no benefit; multi-request sessions - like adaptive HTTP streaming, web sites etc. - on the other hand, should benefit.