Disabling Webmention Backfeeds

Ben Tasker

2024-04-01 15:38

In December 2023, I added support for webmentions to my site. As part of the setup process, I also configured a backfeed of mentions from external sources such as GitHub, Mastodon and Twitter - the idea being that if you comment on my content on social media, it'd show up as a comment under the relevant post (hopefully adding something of value for other readers).

At the time, I noted that I did have some qualms about the ethics of collecting mentions from other sources:

I was reading a post by Terence Eden discussing the ethics of syndicating comments using WebMentions and found myself a little undecided on the question that Terence poses, at least as it applies to extracting comments from silos like Twitter

...

Jumping back to the introduction to this post, this is where I feel a little conflicted about the ethical posture of collecting and syndicating mentions.

However, I reasoned that it wasn't all that different in principle to users taking a screenshot of a social media post and sharing it elsewhere, hardly an uncommon occurrence.

Recently, my discomfort with the idea of pulling people's comments from Twitter/Mastodon/wherever and displaying them on my site was reignited after reading "Mastodon Webmentions and Privacy" by Robb Knight.

Although privacy is a strong motivator, it's not the only concern that I have with having a webmention backfeed active, so I've decided to disable the collection and display of webmentions on my site.

In this post, I'll talk a little more about why I've chosen to do so.

Webmention Backfeeds

Before getting started, I want to take a moment to differentiate between webmentions and backfeeds.

WebMention is a standard used to send interactions between sites and, at its simplest, is like the pingbacks of old: when I publish a post linking to someone else's, an automated call goes out to let them know (usually resulting in a link back to my post appearing under theirs).

I currently have no particular objection to those - if I link to one of your posts and you have a compatible endpoint, I'll continue to send a webmention. Similarly, I'll continue to receive webmentions from those explicitly sending them.
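For illustration, the notification itself is small: per the W3C Webmention Recommendation, the sender makes a form-encoded POST carrying a `source` (the page containing the link) and a `target` (the page being linked to) to the receiver's endpoint. The following is a minimal sketch in Python, assuming the endpoint has already been discovered. The URLs and the function name are placeholders, and this is not the code used by my Nikola plugin.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) the POST telling `endpoint` that `source` links to `target`."""
    body = urlencode({"source": source, "target": target}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder URLs, purely for illustration
req = build_webmention_request(
    "https://example.com/webmention",
    source="https://my-site.example/posts/new-post.html",
    target="https://example.com/their-post.html",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. The sketch stops short of that so that it can be run without touching the network.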

Backfeeds, though, are somewhat different. A webmention backfeed service (like Bridgy) passively picks up on references made to content elsewhere and then generates a webmention on that user's behalf.

Screenshot of the Brid.gy homepage showing the logos of social media services that it integrates against

When a reference is observed, Brid.gy generates a webmention and sends it on to the relevant endpoint, making the original user's action and avatar available outside of their chosen social network.

The key bit here is that Brid.gy is linked to my social media accounts and not to those of the users it is sending webmentions about - they likely have no idea that it's even happened.

Privacy

Whilst there isn't much in Robb's post that hadn't rolled through my mind when implementing, it did help to affirm that these were concerns that others shared as well as prompting me to give it some more thought.

Links and comments from his post also led to me finding other people expressing similar concerns elsewhere.

For example, Mario Hamann notes the importance of a post's context:

Posts live in the context of their platform, and even if it's public, that context always matters.

Wouter Groeneveld's long post about webmentions in general contains various good technical arguments, but also raises a good point about user expectations and privacy:

tweets of people mentioning or replying to your link via Twitter suddenly appear as a mention on your site. Great stuff, right? Except those people have no idea their avatar and text is being yanked.

Admittedly, my implementation was a little more forgiving than most in this respect, because I specifically didn't include avatars:

Screenshot of a webmention being displayed on my site

However, the user's text is there and is directly attributed to them. Plus, whether or not I'm consuming it, Bridgy will have archived a copy of the user's avatar and sent details to the user's browser:

Screenshot of a Bridgy response, the details have been blurred but show that it includes a link to the avatar as well as the author's profile

This is not something that the average social media user is likely to expect.

Security

Webmentions are a one-time process and so don't take the original user's future behaviour into account.

If a user later edits or deletes their post, the original would remain available on my site because there isn't a mechanism by which I could come to know that it had been updated.

This could be particularly problematic if the user accidentally posted some compromising information (a home address, the wrong photo etc).

Worse, the system is open to active abuse because it provides an easy means of amplification:

Alice posts Bob's home address on social media, linking to various sites that Alice knows use webmentions
A moderator deletes Alice's original post (and/or bans their account)
Bob's address is no longer available within the social network
However, Bob's home address is probably now marked on a range of websites (as well as being sat in Brid.gy's database) and will likely eventually start to appear in web search results

At best, Bob now has to try and make contact with the admins of multiple disparate websites to try and have the dox removed (assuming, of course, that Bob is even aware of the incident in the first place).

We're probably all familiar with the idea that, once published, it's very hard to fully remove anything from the internet. Whilst there's definitely some truth in that, I would argue that it still does not extend to justifying systems which (accidentally or otherwise) perpetuate that harm.

Webmention Quantity

Another issue is that there just aren't that many webmentions being sent. It's not that my content's not being interacted with, it's just that backfeed services simply can't catch the majority of interactions (a situation worsened by Twitter having closed off its APIs).

In the context of this post, it might sound like a strange thing to complain about: after all, I'm effectively now saying that Brid.gy should somehow be more privacy invasive.

But that is the problem, because there's an inherent paradox:

Diagram showing the paradox - Bridgy needs to capture more to even start to justify the privacy impact. But, if it captures more it will impact more users therefore not justifying its impact

I'd need more webmentions coming in to feel that it's starting to provide sufficient value, but in doing so, the potential for harm increases. Even more "value" is then needed to justify its existence.

Webmention Quality

Of the webmentions that I've received, the vast majority have been likes or boosts/re-posts:

Pie chart showing proportions of webmentions by type. More than half are likes, with about another quarter being retweets/boosts

Whilst those, obviously, are good for me to see (dopamine FTW!), there probably isn't a wider benefit to embedding them into pages - it's comments or textual posts that are likely to bring the most benefit to others and there are far fewer of those.

The issue isn't just with the volume of textual webmentions though: elsewhere, others have reported that the quality of received text webmentions can be very low:

I think I’m ready to wholly remove all indieweb functionality from my website. The only webmentions I receive at this point are spam from a Russian poker website that likes to link to 1 random blogpost of mine that has nothing to do with gambling, or gaming.

Whilst I've not (yet) had any particular issues with this, it is a risk that I do need to be aware of: because I run a static site, my options for webmention moderation are fairly limited, largely relying on me identifying spam and blocking the entry by ID. Tooling aside, I also don't really want to get into a position where I'm doing much moderation - I've done my time on that front.
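To give a sense of how limited that moderation is, blocking by ID on a static site boils down to filtering the received mentions against a hand-maintained list at build time. The sketch below is illustrative only: the `id` field name, the record layout and the function name are assumptions rather than the format of any particular webmention service.

```python
def filter_blocked(mentions, blocked_ids):
    """Return the mentions whose ID is not on the blocklist, preserving order."""
    blocked = set(blocked_ids)
    return [m for m in mentions if m.get("id") not in blocked]


# Hypothetical records; real payloads carry far more detail
received = [
    {"id": 101, "text": "Nice write-up"},
    {"id": 102, "text": "Play poker online now"},
    {"id": 103, "text": "Thanks, this helped"},
]

# Hand-maintained list of IDs that have been identified as spam
visible = filter_blocked(received, blocked_ids=[102])
print([m["id"] for m in visible])
```

Nothing here spots spam automatically: each ID only lands on the list after a human has seen the entry, which is exactly the moderation burden described above.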

Displaying Counters

One alternative that I wondered about was moving from embedding full comments to simply embedding counters:

10 favourites, 5 reposts, 3 replies
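Producing a summary line like that amounts to tallying mentions by type and discarding everything else. A rough sketch, assuming each mention carries a `type` field with the values `like`, `repost` and `reply` (hypothetical names; a real feed will label these differently):

```python
from collections import Counter

# Hypothetical mapping of mention type to the label shown on the page
LABELS = {"like": "favourites", "repost": "reposts", "reply": "replies"}


def summarise(mentions):
    """Reduce a list of mentions to a counter string, ignoring unknown types."""
    counts = Counter(m["type"] for m in mentions if m.get("type") in LABELS)
    return ", ".join(f"{counts[kind]} {label}" for kind, label in LABELS.items())


sample = (
    [{"type": "like"}] * 10
    + [{"type": "repost"}] * 5
    + [{"type": "reply", "text": "Great post"}] * 3
)
print(summarise(sample))
```

The sketch ignores pluralisation and, more importantly, only changes what is displayed: the full mention records still have to be fetched from somewhere before they can be counted.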

However, this idea quickly ran into some issues:

As above, there aren't enough mentions being sent - if I know my content is being interacted with (say) 100 times, why embed a counter that only reports 4 of them?
The mechanism would still need to pull from Brid.gy: even if we're not actively displaying it to the user, their browser is still being sent a JSON blob filled with profile information etc

The second issue is addressable, but helped me realise that it's not a problem that I actually feel overly invested in solving - I'm just as happy turning webmentions off.

Conclusion

There are various IndieWeb approaches and technologies that I quite like, but I've come to realise that having a webmention backfeed simply isn't one of them. The concept really doesn't sit well with my ideals around user control, privacy and agency.

If anything, I'm actually somewhat surprised that I enabled a backfeed in the first place. It definitely now feels like a mistake, and one that I should have thought a little harder about at the time.

The rationale that I used in reply to Terence Eden's blog on the subject was reasonable:

Screenshot of one of my own replies, embedded into a blog post. I opine that it's not much different to someone screenshotting a social media post and including that in their blog post

However, what I failed to properly consider was the difference between manually taking a screenshot (a relatively rare event) and automating the capture and storage of replies - whilst the operations are similar, the latter happens on a much larger scale with no real human oversight (although in Terence's case, oversight is added by manually vetting comments before publishing).

A lack of oversight translates into increased potential for harm.

Whilst there are, undoubtedly, other ways to spread unwanted content across the web, use of webmention backfeeds provides an easy means of ensuring that malicious social media posts leave a potentially indelible mark on the web.

Of course, human moderation can help reduce the likelihood of that - however, I don't really want to get back into moderating and, even if I did, the mentions let through would still ultimately be being published without the original author's knowledge or consent.

Over the years, I've written quite a lot about privacy and security and, on reflection, I'm not sure that it's actually possible for a webmention backfeed to ever be compatible with my views on those topics. GitHub issues like this one only really serve to further reinforce the feeling that the system is doing various people a disservice.

For me, the current state of affairs is:

I've deleted my Brid.gy account links, so there's no longer a backfeed active.
Other sites can currently continue to send me webmentions
Received webmentions will not currently be displayed (but I'll still get a notification from my monitoring)
When I publish a post, webmentions will still be sent, courtesy of my webmentions plugin for Nikola

Somewhere down the line, I'll review whether I'm actually going to continue receiving webmentions at all. One of the concerns that I have with it is that Brid.gy will still send interactions (those captured by other Brid.gy users), so turning off my account links has only really addressed part of the problem.
