Replacing My Adblock Lists

I started to curate my own adblocking scripts back in 2014, making them available in the directory /adblock/ on my site.

At the time of their creation the lists were poorly controlled and pretty sparsely documented:

Original Adblock list documentation

In 2018, I got my act together a bit: I moved the lists into a GitHub repo and implemented project management to track additions to the lists.

Whilst project management improved, the publishing of the lists continued to rely on some increasingly shonky bash scripts. Those scripts modified and compiled in third party lists, stripped duplicates and generated various output formats. They were never engineered so much as spawned.
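To give a flavour of the merge-and-deduplicate step, here's a minimal Python sketch. It is not the original bash: the function names are hypothetical, and the two input formats it handles (plain domain lists and hosts-file style lines) are assumptions rather than a record of what the real sources looked like.

```python
import re

# A plausible hostname: dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?\.)+[a-z]{2,63}$"
)


def extract_domain(line):
    """Return the domain found on a list line, or None if there isn't one.

    Handles plain domain lists and hosts-file style lines such as
    "0.0.0.0 example.com" (both formats are assumptions).
    """
    # Drop trailing comments, surrounding whitespace and case differences
    line = line.split("#", 1)[0].strip().lower()
    if not line:
        return None
    # In hosts-style lines the domain is the last field
    candidate = line.split()[-1].rstrip(".")
    return candidate if DOMAIN_RE.match(candidate) else None


def merge_lists(*sources):
    """Merge several iterables of lines into one de-duplicated domain list.

    The first occurrence wins, so earlier sources keep their ordering.
    """
    seen = set()
    merged = []
    for source in sources:
        for line in source:
            domain = extract_domain(line)
            if domain and domain not in seen:
                seen.add(domain)
                merged.append(domain)
    return merged
```

Passing my own entries first and a third-party list second would leave each domain in the output exactly once, regardless of how many sources listed it.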

Because the lists used third party sources, the compilation process needed to run at scheduled intervals: it couldn't simply be triggered when I made changes, because the lists were presented as near-complete alternatives to others.

Despite their awful hacky nature, those scripts managed to compile and update my block lists for nearly 8 years.

However, they've long been overdue for replacement.

This post serves as a record for the deprecation of my original adblock lists, as well as providing details of their replacement.


Deprecation of the old lists

The original lists offered quite a number of variations:

  • Full Autolist (Unbound format)
  • Manually blocked zones (Unbound format)
  • List of blocked domains/zones (ABP/uBlock compatible)
  • List of blocked domains/zones without Social Media tracker domains (ABP/uBlock compatible)
  • Modified version of EasyList (ABP/uBlock compatible)
  • Modified version of EasyList without Social Media tracker domains (ABP/uBlock compatible)
  • List of Social Media tracker domains (ABP/uBlock compatible)
  • Pi-Hole compatible blocklist
  • Pi-Hole compatible blocklist with Social Media tracking domains

All of these were compiled on a schedule by a cron job, ready for delivery via my CDN.

Originally the lists were managed via a small set of configuration files, though in May 2020 I changed this so that domains could be separated into dedicated files for easier management.

The compilation workflow was quite complex and involved a number of (poorly named) temporary files, which were used to build each of the lists above.

Over time, making changes/improvements to the build scripts became increasingly impractical as the complexity grew without the benefit of a sorely needed refactor.

Delivering the lists via my CDN was simple: the cron job ran on the origin and there were dedicated location blocks in my CDN's Nginx config to ensure that users got a relatively fresh version:

    location /adblock/ {
            expires 1d;
            proxy_cache_valid  200 302 301 45m;
            proxy_cache_valid  404      45m;
            proxy_set_header X-Forwarded-For $remote_addr;
            include 'domains.d/proxy_settings_bentasker.co.uk.inc';

    }

My 2021 move to delivering via a third-party CDN complicated things a little: the CDN supports specifying different edge rules, but they're not nearly as simple to define as they are in Nginx.

So, I decided that it was (finally) time to replace the lists.

The full rationale behind my decision can be found here but can be summarised as

  • The compilation scripts were too clunky and hard to update
  • I no longer wanted to dedicate infra to managing/maintaining the lists
  • Delivery was becoming a headache (especially given increased uptake of the lists)

Because I didn't want to run supporting infra for the lists, the new lists could no longer consume third-party lists like EasyList: there's no point presenting as a "complete" solution if rapidly-changing third-party sources are only going to be refreshed once in a while.


Abandon, don't change

One option available was to simply change the way that the lists get generated, so that the existing links continued to work and served "version 2" of my lists.

However, this wasn't something I felt that I could do:

The original lists were presented as a near-complete offering: providing a compilation of my own and others' lists.

The new lists, on the other hand, are not intended to provide the same breadth. They're a compilation of domains that I've added to the list as and when I've found them.

I couldn't, in good conscience, quietly replace a list containing tens of thousands of domains with one containing hundreds. Punching holes in people's tracking protection without their knowledge isn't a good faith act.

I would also have needed to serve the content from the exact same URLs. Whilst a browser will handle an HTTP 301 redirect, there are more than a few scripts out there which do a variation of

    curl -s "https://www.bentasker.co.uk/adblock/regex_blocks.txt"

Introducing a redirection would break all of those (curl doesn't follow redirects by default).
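The difference is easy to reproduce locally. The following is a hedged Python sketch, not anything taken from those scripts: it starts a throwaway server on localhost where a made-up path (/old.txt) answers with a 301 pointing at another made-up path (/new.txt), then fetches it twice. http.client stands in for a client that doesn't follow redirects (like plain curl -s), and urllib stands in for one that does (like curl -sL).

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Toy origin: /old.txt answers 301 to /new.txt (both paths are made up)."""

    def do_GET(self):
        if self.path == "/old.txt":
            self.send_response(301)
            self.send_header("Location", "/new.txt")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new.txt":
            body = b"||ads.example.com^\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_without_following(host, port, path):
    """Return the first response as-is, without chasing any redirect."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def fetch_following(host, port, path):
    """Fetch with urllib, which follows the 301 through to its target."""
    # An empty ProxyHandler stops any configured proxy intercepting localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}{path}", timeout=5) as resp:
        return resp.status, resp.read()


def demo():
    """Run both fetches against a temporary local server."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        return (
            fetch_without_following(host, port, "/old.txt"),
            fetch_following(host, port, "/old.txt"),
        )
    finally:
        server.shutdown()
        server.server_close()
```

The first fetch comes back with a 301 and an empty body, which is what a script piping plain curl output into a blocklist would end up ingesting; only the second receives the list content.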

So, the original lists will remain available (but will no longer be updated) at their original URLs.

This decision also freed me up to explore other alternatives, like delivering via GitHub instead of my own CDN and fixing names.


The New Lists

The new lists are now live and consist of the following files

  • adblock_plus.txt: Adblock Plus and uBlock Origin compatible format
  • unbound.txt: Unbound config compatible format
  • blockeddomains.txt: A simple list of Blocked domains
  • regexes.txt: A list of regex-based blocks
  • zones.txt: A list of zone wide blocks
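As an illustration of how one list of domains can be rendered into several of these formats, here's a minimal Python sketch. It isn't the repo's actual build code: the function names are hypothetical, and the exact lines written to each file (particularly the Unbound zone type) are assumptions.

```python
def to_adblock_plus(domains):
    """Render domains as Adblock Plus / uBlock Origin network filters."""
    lines = ["[Adblock Plus 2.0]"]
    # ||domain^ matches the domain and its subdomains
    lines += [f"||{d}^" for d in domains]
    return "\n".join(lines) + "\n"


def to_unbound(domains):
    """Render domains as Unbound local-zone statements.

    always_nxdomain is an assumption: the published file may use a
    different zone type (for example redirect plus local-data).
    """
    return "".join(f'local-zone: "{d}" always_nxdomain\n' for d in domains)


def to_plain(domains):
    """Render domains one per line, as a simple domain list."""
    return "".join(f"{d}\n" for d in domains)
```

Keeping each format as a small pure function means a new output format is one more function rather than another layer of temporary files.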

They can be added to adblockers/pihole etc via the following URLs

Project management is at projects.bentasker.co.uk

The page at /adblock/ has also been updated to provide links to the new lists:

Adblock home page

There's also a (limited) README at https://github.com/bentasker/adblock_lists_v2, meaning the new lists are already better documented than the original version.

Although there are, undoubtedly, improvements to be made, list compilation is now driven by some git commit hooks.


Restoring Missing Blocks

If you want to use the new lists, but also want the domains that would have been included in the originals, you can add the following domain lists to your Pi-Hole/Adblocker

And, of course, you can configure Adblock Plus or uBlock Origin to consume https://easylist.to/easylist/easylist.txt.