Monitoring I2PD with Telegraf

I recently made my site available via I2P as an eepsite, using i2pd as my I2P client.

However, establishing connectivity was only the first part of that - I needed to be able to monitor i2pd so that I could see when it goes down (or has other issues) rather than waiting for people to complain that the eepsite isn't reachable.

I do the vast majority of my system monitoring using Telegraf. A quick search didn't yield any Telegraf plugins for i2pd - so, I created my own Telegraf Plugin for i2pd.

The plugin's currently very rough around the edges, but does what I need for now. Because i2pd only exposes its stats as HTML, the plugin relies on scraping the webconsole.
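To give a feel for what scraping the webconsole involves, here's a minimal sketch in Python. It is not the plugin's actual code: the HTML fragment, the labels and the helper names (strip_tags, extract_int, scrape) are all assumptions made up for illustration, and the real webconsole markup may well differ between i2pd versions.

```python
import re
from html import unescape

# Hypothetical fragment for illustration only: real i2pd markup may differ.
SAMPLE_HTML = """
<b>Tunnel creation success rate:</b> 87%<br>
<b>Routers:</b> 2301 <b>Floodfills:</b> 1120 <b>LeaseSets:</b> 0<br>
"""

def strip_tags(html):
    """Drop HTML tags and collapse whitespace, leaving 'Label: value' text."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()

def extract_int(text, label):
    """Return the integer that follows 'label:' in text, or None if absent."""
    match = re.search(re.escape(label) + r"\s*:\s*(\d+)", text)
    return int(match.group(1)) if match else None

def scrape(html):
    """Map assumed webconsole labels onto field names, skipping any not found."""
    text = strip_tags(html)
    labels = {
        "tunnel_creation_success_rate": "Tunnel creation success rate",
        "routers": "Routers",
        "floodfills": "Floodfills",
        "leasesets": "LeaseSets",
    }
    fields = {name: extract_int(text, label) for name, label in labels.items()}
    return {name: value for name, value in fields.items() if value is not None}

if __name__ == "__main__":
    print(scrape(SAMPLE_HTML))
```

In real use the HTML would be fetched from the webconsole (for example with urllib.request.urlopen) rather than taken from a string. Skipping labels that can't be found, rather than raising, means a markup change in a future i2pd release shows up as missing fields instead of a crashed collector.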

This documentation details how to set Telegraf up to report stats from i2pd into InfluxDB so that I can generate graphs in Chronograf.


Installing and Configuring the plugin

I've got i2pd configured to expose its web interface on port 7070 (the default), so I just needed to put a copy of my plugin onto the server (into /usr/local/bin)

git clone git@github.com:bentasker/telegraf-plugins.git
cd telegraf-plugins
cp i2pd-statistics/i2pd-statistics.py /usr/local/bin

And then configured Telegraf to run the plugin - I added the following TOML to /etc/telegraf/telegraf.conf

[[inputs.exec]]
commands = ["/usr/local/bin/i2pd-statistics.py"]
data_format = "influx"
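Setting data_format to "influx" means Telegraf expects the command to print InfluxDB line protocol on stdout. As a rough sketch of what that output looks like (again, not the plugin's real code: to_line, escape_key and format_field are hypothetical helpers, and whether the plugin writes integers or floats is an assumption), a line can be built like this:

```python
def escape_key(value):
    """Escape the characters that are special in tag keys, tag values and field keys."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def format_field(key, value):
    """Render one field; integers carry an 'i' suffix in line protocol."""
    if isinstance(value, bool):
        return f"{escape_key(key)}={'true' if value else 'false'}"
    if isinstance(value, int):
        return f"{escape_key(key)}={value}i"
    return f"{escape_key(key)}={float(value)}"

def to_line(measurement, tags, fields):
    """Build a single line of line protocol without a timestamp."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(format_field(k, v) for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part}"

if __name__ == "__main__":
    print(to_line(
        "i2pd",
        {"direction": "inbound", "tunnel_state": "established"},
        {"tunnel_count": 12},
    ))
```

The timestamp is left off deliberately, so Telegraf stamps each point with the collection time. The host tag isn't written here either, as Telegraf adds it itself by default.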

Then, it was just a case of restarting Telegraf

systemctl restart telegraf

Graphing

Once the data started coming in, I created a dashboard to expose useful statistics

I2PD Statistics Dashboard

The Flux query underneath the Creation Success rate gauge is pretty straightforward

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "i2pd")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "tunnel_creation_success_rate")
  |> last()

As is the network throughput query

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "i2pd")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "in_avg_bps" or r._field == "out_avg_bps")
  |> keep(columns: ["_value","_time","host","_field"])

We can output a similar graph to show the transit throughput, using field transit_avg_bps (I2PD reports only the outbound transit rate).

We can also show a count over time for Routers, FloodFills and LeaseSets. We don't necessarily need to know what each of these is (although if you're interested, this explains it) to monitor them - what we're looking for is sudden changes.

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "i2pd")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "routers" or r._field == "floodfills" or r._field == "leasesets")
  |> keep(columns: ["_time","_field","_value","host"])
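I just eyeball the graph for this, but to make "sudden changes" a bit more concrete, here's a small sketch of one way it could be expressed: flag any point that differs from the previous one by more than some fraction. The function name, the 50% default threshold and the sample numbers are all arbitrary choices for illustration, not something the plugin or dashboard does.

```python
def sudden_changes(values, threshold=0.5):
    """Return indexes whose value moved by more than `threshold` (as a
    fraction of the previous value) compared with the point before it."""
    flagged = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev == 0:
            # No meaningful ratio against zero: flag any move away from it.
            if cur != 0:
                flagged.append(i)
            continue
        if abs(cur - prev) / abs(prev) > threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Made-up router counts: the drop to 900 is the kind of jump to look for.
    print(sudden_changes([2300, 2310, 2295, 900, 910]))
```

A fixed ratio like this is crude (it would also fire while the router is still populating its netDb after a restart), so treat it as a starting point rather than an alerting rule.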

I2PD Statistics Routers, Floodfills and LeaseSets

We can also get information about the state of tunnels over time (state being one of building, established, exploring, expiring or failed).

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "i2pd")
  |> filter(fn: (r) => r.host == v.host)
  |> filter(fn: (r) => r._field == "tunnel_count")
  |> filter(fn: (r) => r.direction == "inbound")
  |> keep(columns: ["_time","_value","host","tunnel_state"])

With the same graph for outbound tunnels simply being a case of changing the filter on r.direction to outbound.

I2PD Statistics Tunnels and their states


Conclusion

It's not perfect - the plugin is very rough and ready (I am intending to tidy it) and also has to rely on screen-scraping i2pd's webconsole, so a future release of i2pd might well break collection.

But, it does allow me, via Telegraf, to keep an eye on the state of i2pd to spot if the eepsite goes down.