Designing privacy friendly analytics

It was only two weeks ago that I wrote

but I've whittled the amount of javascript on the site right down, and don't really relish the thought of increasing it. Nor do I particularly like the idea of implementing probes that can track user movements across my sites, when all I currently need is aggregate data.

Unfortunately, that set my brain wandering off thinking about what scalable privacy friendly analytics might look like - which inevitably led to prototyping.


The system is supposed to be privacy friendly, so there are some fundamental rules I wanted to abide by:

  1. Actions and time are the primary focus, not the user - we don't need to record user identifiers
  2. The system should be lightweight and collect only that which is needed
  3. There should be no ability to track a user cross-site (even if the analytics is used on multiple sites)
  4. The default behaviour should make it all but impossible to track user movements within a site

The aim being to strike a balance where we can collect the data required to maintain/improve a site, with as little impact on the user's privacy as possible.

Whilst I trust me, there's no reason users should have to, and we should assume that at some point, someone less trustworthy will find a way to access the stored data: the less identifying data available, the less use it is to them.
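As a rough illustration of the first rule (actions and time, not users), here is a minimal sketch of what storing only aggregate counts might look like. The names (bucket_event, AggregateStore) and the hourly bucket size are assumptions made for the example, not the prototype's actual design.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_event(action: str, page: str, when: datetime, bucket_minutes: int = 60) -> tuple:
    """Reduce an event to (time bucket, action, page).

    No user identifier, IP address or session ID is accepted or kept.
    `when` is expected to be timezone-aware.
    """
    when = when.astimezone(timezone.utc)
    # Round the time of day down to the start of its bucket
    minutes = (when.hour * 60 + when.minute) // bucket_minutes * bucket_minutes
    bucket = when.replace(hour=minutes // 60, minute=minutes % 60, second=0, microsecond=0)
    return (bucket.isoformat(), action, page)


class AggregateStore:
    """Hypothetical in-memory store that only ever holds counts per bucket."""

    def __init__(self):
        self.counts = Counter()

    def record(self, action: str, page: str, when: datetime) -> None:
        self.counts[bucket_event(action, page, when)] += 1


if __name__ == "__main__":
    store = AggregateStore()
    store.record("pageview", "/posts/example", datetime.now(timezone.utc))
    for key, count in store.counts.items():
        print(key, count)
```

Because each record is reduced to a bucket before it is stored, anyone who later gains access to the data sees only how often an action happened in a given period, not who performed it.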


Attempting to control Youtube access on Android

It's a problem that our parents didn't really have to contend with - easy, unlimited access to a massive library causing massive amounts of screen time.

We used to get complaints about the amount of time spent watching TV (or on a gameboy), but the library available to us was quite limited, so there was a point where you just stopped watching and did other things for a while (assuming we weren't outright kicked out of the house and sent to the park).

The Problem

Now though, not only is content just a click away, but it's actively pushed to us and our kids.

It's not just the distribution mechanisms: social media is deliberately designed to be immersive and even addictive. In ye olden days, we'd get an episode of what we wanted, but then something completely unrelated would come on - nowadays the approach is much more "take this, this, this, oh and you might be interested in this". It's very easy to lose track of the time you've spent, even as an adult.

Littlun has, over time, developed something of a Youtube habit.

That's led to some good conversations about not over-trusting content creators, which seem to have been well absorbed (in the sense that the content being watched is more appropriate, even if I do think some of the streamers are complete wazzocks). We've also had conversations about the importance of talking to an adult if something unpleasant/inappropriate comes up.

So, my concern now isn't so much the content as the time spent on Youtube (given the chance).

That's a much harder issue to resolve through conversation alone, as it's easy to be unaware of time spent absorbed, and services like Youtube are designed to exploit that.

What this means is that as well as conversations, some technical measures are required, including:

  • a prompt/reminder about the time spent
  • a means to block access

The latter being the "big stick" that I can reserve use of, in order to encourage mini-me to pay a bit more attention to the former.

Blocking Youtube on the LAN is simple (Pihole to the rescue), but it's an incomplete solution. At some point, Littlun'll notice that Youtube works when out and about and realise that the block can be circumvented at home by turning wi-fi off.

This post details the ways I've looked at to help control/restrict Youtube access on Android in a way that doesn't simply disappear with a change in network connection.


Tracking My Website Performance and Delivery Stats in InfluxDB

Earlier this year, I moved from serving my site via my own bespoke Content Delivery Network (CDN) to serving it via BunnyCDN.

It was, and remains, the right decision. However, it did mean that I lost (convenient) access to a number of the stats that I use to ensure that things are working correctly - in particular service time, hit rate and request outcomes.

I recently found the time to look at addressing that, so this post details the process I've followed to regain access to this information, without increasing my own access to PII (so, without affecting my GDPR posture), by pulling information from multiple sources into InfluxDB.


Running multiple Tor daemons with Docker

Running a Tor relay helps give bandwidth back to the network. However, it's not uncommon for new relay operators to be surprised at Tor's performance profile.

Tor is largely single-threaded, so operators arrive with multi-core machines and find that only a single core is actually being worked. However, it is possible to maximise use of multiple CPU cores by running multiple instances of the tor daemon - this documentation details how to do it using Docker (although it's perfectly possible without containerisation too).


Collecting Nextcloud User Quota Information With Telegraf

Collecting system level stats from Nextcloud with Telegraf is well documented, and very well supported.

However, I wanted to extract some additional information - current storage quota allocations and usage. Nextcloud allows you to apply a storage quota to each individual user, so I thought it'd be useful to be able to monitor for accounts that are getting close to their quota.

The information is a bit more buried within Nextcloud's APIs than the system level stats, and so cannot be (as easily) consumed using inputs.http.

This post gives details of an exec plugin which can fetch quota usage, per user, and pass it into Telegraf in InfluxDB line protocol.
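To give a feel for the output side of such a plugin, here is a minimal sketch that turns per-user quota figures into line protocol. The measurement name, field names and the shape of the input dictionaries are assumptions made for illustration; fetching the data from Nextcloud is deliberately left out, and the plugin described in the full post may be structured differently.

```python
def escape_tag(value: str) -> str:
    """Escape the characters that are special in line protocol tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def quota_lines(users, measurement="nextcloud_quota"):
    """Build one line of line protocol per user.

    `users` is a hypothetical list of dicts with "id", "used" and "quota"
    (both in bytes). A non-positive quota is treated as "no limit set",
    so no percentage is emitted for it.
    """
    lines = []
    for user in users:
        used = int(user["used"])
        quota = int(user["quota"])
        fields = [f"used_bytes={used}i", f"quota_bytes={quota}i"]
        if quota > 0:
            fields.append(f"used_percent={round(100 * used / quota, 2)}")
        lines.append(f"{measurement},user={escape_tag(user['id'])} {','.join(fields)}")
    return lines


if __name__ == "__main__":
    # Sample data only; a real script would query Nextcloud here
    sample = [{"id": "alice", "used": 2500, "quota": 10000}]
    for line in quota_lines(sample):
        print(line)
```

No timestamp is written, so Telegraf assigns one at collection time. A script like this would be run from an inputs.exec block with data_format set to "influx".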
