Monitoring the Tor daemon with Telegraf
My services have been available via Tor .onion for around 7 years now, but my monitoring of their availability has always been relatively limited. I did previously have smokeping running reachability tests, but other than that there's been a reliance on me noticing that things weren't right (or perhaps receiving reports that an .onion was misbehaving).
Part of the reason for this is that there's never (to my knowledge) been a good centralised way to monitor the health of a Tor install. Nyx is a fantastic command line tool, but relies on the operator logging into their box: it's akin to relying on top to monitor CPU usage.
I've always figured that it should be possible to monitor the tor daemon more effectively, but never quite got around to doing anything about it.
This week, I decided to take a pop at it, and a quick scan over Tor's control port spec revealed how easy it should be to collect stats.
This documentation details how to use my new Tor Daemon Plugin for Telegraf to collect metrics from a Tor daemon.
The full list of statistics collected can be seen in the plugin's README, but they include:
- bytes_rx: total bytes received by Tor
- bytes_tx: total bytes transmitted by Tor
- uptime: Tor daemon uptime
- version_status: Tor's assessment of whether the installed version is OK to use
- Accounting information: is a quota set? If so, how much is left?
- Reachability test statuses
- Guard node states
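To give a feel for how little is needed to pull this kind of information from the control port, here is a minimal Python sketch. It is not the plugin's code: the function names, the default port of 9051, the use of password authentication and the choice of GETINFO keys (traffic/read, traffic/written, uptime, status/version/current) are my assumptions for illustration, and it only handles single-line replies.

```python
import socket


def read_reply(reader):
    """Read one control-port reply: lines up to and including the final
    status line, which has a space after the 3-digit code (e.g. "250 OK")."""
    lines = []
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("control port closed the connection")
        line = line.rstrip("\r\n")
        lines.append(line)
        if len(line) >= 4 and line[3] == " ":
            return lines


def parse_getinfo(lines):
    """Turn "250-key=value" lines into a dict of strings.
    Multi-line ("250+") values are not handled in this sketch."""
    values = {}
    for line in lines:
        if line.startswith("250-") and "=" in line:
            key, _, value = line[4:].partition("=")
            values[key] = value
    return values


def query_tor(keys, host="127.0.0.1", port=9051, password=""):
    """Authenticate, then issue a single GETINFO for the given keys.
    Assumes password auth and a password without quotes or backslashes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        reader = sock.makefile("r", encoding="ascii")
        sock.sendall(f'AUTHENTICATE "{password}"\r\n'.encode("ascii"))
        auth = read_reply(reader)
        if not auth[-1].startswith("250"):
            raise PermissionError(auth[-1])
        sock.sendall(("GETINFO " + " ".join(keys) + "\r\n").encode("ascii"))
        return parse_getinfo(read_reply(reader))


# Example call (needs a running Tor daemon with ControlPort enabled):
# query_tor(["traffic/read", "traffic/written", "uptime",
#            "status/version/current"], password="secret")
```

The values come back as strings, so a collector would still need to convert counters such as traffic/read to integers before writing them out as metrics.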
Although my main focus is on monitoring the availability of my onion services, the plugin can be used to monitor tor relays, bridges and exit nodes too.