Using BlueSky Features As Disinformation Tools

Recently, whilst working on implementing automatic posting into BlueSky, I ran into an issue with link-preview cards not being displayed.

Posts are submitted into BlueSky using ATProtocol, which places the onus on the sender to generate and provide the preview card, so that a rich preview can be displayed alongside the post text.

In my other post, I described the need to do this as being a pain in the arse. However, there's more to it than that: having the ability to submit arbitrary card content is problematic because it can be used to facilitate disinformation campaigns.

Bluesky also uses facets, which allow the sender to turn text into arbitrary hyperlinks, presenting its own set of issues.

In this post, I'll explain why giving the sender control over these items is potentially harmful.

Note: I did email Bluesky detailing my concerns, but given that:

  • The ability to do this is publicly documented
  • It turns out it's also something that BlueSky were already made aware of and have defended
  • Update: 2 weeks later, they've still not replied at all

there didn't seem to be any value in delaying disclosure: it's better to ensure there's awareness of the issue.


Preview Cards as Disinfo Tools

Let's start with the preview cards.

Unlike other social networks, a preview card will not be displayed if the sender does not provide it alongside the post:

Screenshot of text-only post in Bluesky. There's no rich preview, and the link's even truncated - it looks bloody awful

Instead, at time of sending, an external embed needs to be provided. The Python module can be used to define one as follows:

embed_external = models.AppBskyEmbedExternal.Main(
    external=models.AppBskyEmbedExternal.External(
        title="Google's home page",
        description="Search here to have all your data slurped",
        uri="https://www.google.com",
        thumb=blob.reference  # will cover this later
    )
)

Because, as the sender, we define the card, we're empowered to populate it with any content, whether or not it relates to the link destination.

The following script provides an example of doing exactly that:

import requests
import re

from atproto import Client, models
from datetime import datetime

# Credentials
BSKY_USER="<redacted>"
BSKY_PASS="<redacted>"


# The text of the post itself
text = (
    "This post should be accompanied by a link card "
    "using a picture that doesn't appear anywhere on the"
    " target site. It's possible because AT Protocol "
    "leaves it to the sender to define the rich preview."
)

# Now define what we're going to set the card to use
title = "Pictures of Ben Tasker"
real_link = "https://www.bentasker.co.uk/pages/about-me.html"
short_desc = (
    "I'm Ben Tasker, and this page only contains photos"
    " of me... honest"            
)

# The URL of a random image to use as the thumbnail
thumb_url = 'https://www.barbie-collectible.com/wp-content/uploads/2017/03/Barbie-2016-Holiday-Doll-3.jpg'

# Connect the client
BSKY_CLIENT = Client()
profile = BSKY_CLIENT.login(BSKY_USER, BSKY_PASS)

# Fetch the thumbnail and upload it
response = requests.get(thumb_url)
img_data = response.content
upload = BSKY_CLIENT.com.atproto.repo.upload_blob(img_data)

# create the card
embed_external = models.AppBskyEmbedExternal.Main(
    external=models.AppBskyEmbedExternal.External(
        title=title,
        description=short_desc,
        uri=real_link,
        thumb=upload.blob
    )
)

# Post it
BSKY_CLIENT.com.atproto.repo.create_record(
    models.ComAtprotoRepoCreateRecord.Data(
        repo=BSKY_CLIENT.me.did,
        collection='app.bsky.feed.post',
        record=models.AppBskyFeedPost.Main(
            createdAt=datetime.now().isoformat(),
            text=text,
            embed=embed_external
        ),
    )
)

The process for thumbnail attachment is a little odd: you need to read the thumbnail (either from disk, or by fetching it from a URL), then upload that to Bluesky, which'll respond with a blob reference. The blob reference is then finally passed into AppBskyEmbedExternal.External for use as the thumbnail.

But, anyway.. don't I look just lovely in my green dress?

Screenshot of post in Bluesky. The preview card carries the title Pictures of Ben Tasker and the description I'm Ben Tasker, and this page only contains photos of me... honest. The preview image shows a waist, hands and a green dress - it's actually a picture of a Barbie

Messing around aside, the ability to do this is extremely problematic.

A few years back, it was found that approximately 41% of people shared or re-tweeted links without actually having read them.

As an extreme example: The Science Post shared a post with the headline "Study: 70% of Facebook users only read the headline of science stories before commenting". The post itself only contained two real sentences, which repeated themselves before being followed by blocks of Lorem Ipsum:

Screenshot of the science post article. The first paragraph consists of 2 real sentences which are then repeated. Subsequent paragraphs are all lorem ipsum. It's now been shared nearly 200,000 times

Their initial social media share received nearly 46,000 re-shares as a result of users doing exactly what the headline said.

These examples indicate that, a lot of the time, the headline and subtext shown in preview cards are probably being treated as a reliable indicator of the linked story.

With the content of the card entirely under the poster's control, it is perfectly possible to publish a link to a genuine news-story but set the preview card so that it conveys an alternative narrative. As users re-share based on the story in the card, the misinformation spreads.

This is easier to achieve when the fake narrative can somehow be related to the underlying article, because, if the link is clicked (but the article not properly read), the real story can help lend credibility to the rich snippet, effectively weaponising light scrutiny.

For example, we can take this recent story and post it with a card populated with our own narrative:

Screenshot of post on Bluesky. The preview card links to the BBC news article linked above, but the headline in the preview is Trump pleads not guilty to document counterfeiting charges. The description reads Donald Trump pleads not guilty to charges of attempting to create a counterfeit copy of the US constitution. The preview image shows crayon scribbled on paper

The only thing in the preview card that's actually true or accurate is the link.

However, if a reader clicks the card, they'll see the real headline, which doesn't entirely contradict what was claimed in the preview:

Screenshot of BBC News Headline, reads: Donald Trump and Walt Nauta plead not guilty to latest charges in documents case

If (as is statistically likely) they read no further than this, they may leave convinced that the details in the preview card are accurate and that the BBC is, in fact, reporting that Donald Trump tried to create a new fake constitution using crayons. That may be quite useful if the intent is to discredit the BBC or, of course, further discredit Trump.

Incidentally, the credibility of such attempts is only heightened by the fact that news organisations (including the BBC) have developed a habit of using thumbnail images that aren't then visible in the story itself. As a result, users are unlikely to be surprised when a quick scan of the page doesn't reveal the image that they clicked on.
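
To give a feel for what checking a sender-supplied card might involve, here's a minimal sketch that compares a card's title and description against the OpenGraph tags published by the page itself. It's an illustration rather than anything Bluesky actually does: OpenGraphParser and card_matches_page are hypothetical names of my own, the page HTML is passed in as a string rather than fetched, and an exact string match is almost certainly too strict for real-world use.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* <meta> properties from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property
            self.properties.setdefault(prop, content)


def card_matches_page(card, page_html):
    """True if the card's title and description equal the page's own
    og:title and og:description (hypothetical, deliberately strict check)."""
    parser = OpenGraphParser()
    parser.feed(page_html)
    og = parser.properties
    return (
        card.get("title") == og.get("og:title")
        and card.get("description") == og.get("og:description")
    )


# Made-up page and cards, purely for illustration
page_html = (
    '<html><head>'
    '<meta property="og:title" content="Real headline">'
    '<meta property="og:description" content="Real summary">'
    '</head><body></body></html>'
)
honest_card = {"title": "Real headline", "description": "Real summary"}
forged_card = {"title": "Invented headline", "description": "Invented summary"}

print(card_matches_page(honest_card, page_html))
print(card_matches_page(forged_card, page_html))
```

A real implementation would also need to fetch the page safely and decide what to do when a site publishes no OpenGraph tags at all.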


Link Facets

Although the name is perhaps a bit abstract, facets are actually fairly straightforward: they point to a range of characters within the post text, define them as a type (in this case, a link) and provide attributes:

facet = {
        "index": {
            "byteStart": 2,
            "byteEnd": 3
        },
        "features": [{
            "$type": "app.bsky.richtext.facet#link",
            "uri": "https://www.example.com"
        }]
    }

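Because byteStart is zero-indexed and inclusive, whilst byteEnd is exclusive, this would turn just the third byte of the post text (for plain ASCII, the third character) into a link to www.example.com.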
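
As the field names suggest, the offsets count UTF-8 bytes rather than characters, so any non-ASCII text ahead of the link shifts them. As a rough sketch (link_facet is a hypothetical helper of mine, not part of the atproto module), the range for a given piece of anchor text could be worked out like this:

```python
def link_facet(text, anchor, uri):
    """Build a link facet covering the first occurrence of anchor in text."""
    char_start = text.index(anchor)  # raises ValueError if anchor is absent
    # Offsets are in UTF-8 bytes, so encode the prefix to measure it
    byte_start = len(text[:char_start].encode("utf-8"))
    byte_end = byte_start + len(anchor.encode("utf-8"))
    return {
        "index": {
            "byteStart": byte_start,
            "byteEnd": byte_end
        },
        "features": [{
            "$type": "app.bsky.richtext.facet#link",
            "uri": uri
        }]
    }


# The accented character takes two bytes, so the byte offsets sit one
# higher than the character offsets would
facet = link_facet("Caf\u00e9 menu: see here", "here", "https://www.example.com")
print(facet["index"])
```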

So, essentially, this feature enables us to control both the anchor text and the link destination as if we were writing the post in pure HTML, something that, outside of social media, is often abused in phishing emails.

We can demonstrate that the network doesn't protect against this by publishing a post with an entirely misleading link in it:

text = (
    "Please ignore this post. "
    "the link below does not point where it seems to claim to"
    " https://www.bbc.co.uk/news/articles/crgk4ky17lwo"
)

# Set up the client
BSKY_CLIENT = Client()
profile = BSKY_CLIENT.login(BSKY_USER, BSKY_PASS)

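# URL_PATTERN isn't defined in this excerpt: this simple pattern is an
# assumption which matches http(s) URLs up to the next whitespace
# (it needs "import re", as in the earlier script)
URL_PATTERN = re.compile(r'https?://\S+')

def generate_facets_from_links_in_text(text):
    ''' Based on logic in
        https://github.com/GanWeaving/social-cross-post/blob/main/helpers.py

        Generate atproto facets for each URL in the text
    '''
    facets = []
    for match in URL_PATTERN.finditer(text):
        # match.span() gives character offsets, which only equal byte
        # offsets because this post text is plain ASCII
        facets.append(gen_link(*match.span(), match.group(0)))
    return facets

def gen_link(start, end, uri):
    # The uri found in the text is deliberately ignored: the facet points
    # somewhere else entirely, which is the whole point of the demo
    return {
        "index": {
            "byteStart": start,
            "byteEnd": end
        },
        "features": [{
            "$type": "app.bsky.richtext.facet#link",
            "uri": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
        }]
    }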

# create the links 
facets = generate_facets_from_links_in_text(text)

# Post      
BSKY_CLIENT.com.atproto.repo.create_record(
    models.ComAtprotoRepoCreateRecord.Data(
        repo=BSKY_CLIENT.me.did,
        collection='app.bsky.feed.post',
        record=models.AppBskyFeedPost.Main(
            createdAt=datetime.now().isoformat(),
            text=text,
            facets=facets
        ),
    )
)

The resulting post looks like this

If you want, you can try it for yourself.

Of course, it's not just rick-rolling: having the ability to override a link's destination means that it's absolutely trivial to post something containing a link reading hsbc.co.uk but instead pointing to a phishing page at myevilsite.example.com.
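
For what it's worth, the most blatant version of this (anchor text that is itself a URL or domain, pointing somewhere else) isn't hard to spot. The following is only a sketch of the sort of check a client or server could run: facet_link_mismatches is a hypothetical function of my own, it assumes facets shaped like the dicts above, and it won't catch lookalike domains or misleading plain-text anchors.

```python
from urllib.parse import urlparse


def facet_link_mismatches(text, facets):
    """Return (anchor_text, uri) pairs where URL-like anchor text names a
    different host to the one the facet actually links to."""
    raw = text.encode("utf-8")
    mismatches = []
    for facet in facets:
        start = facet["index"]["byteStart"]
        end = facet["index"]["byteEnd"]
        anchor = raw[start:end].decode("utf-8", errors="replace").strip()
        # Anchors containing whitespace (e.g. "click here") aren't URL-like
        if not anchor or any(c.isspace() for c in anchor):
            continue
        for feature in facet["features"]:
            if feature.get("$type") != "app.bsky.richtext.facet#link":
                continue
            try:
                # Prefix bare domains with // so urlparse treats them as a host
                candidate = anchor if "://" in anchor else "//" + anchor
                shown_host = urlparse(candidate).hostname or ""
                real_host = urlparse(feature["uri"]).hostname or ""
            except ValueError:
                continue  # unparseable input: skipped in this sketch
            if "." not in shown_host:
                continue  # anchor doesn't look like a domain
            if shown_host.lower() != real_host.lower():
                mismatches.append((anchor, feature["uri"]))
    return mismatches


# Made-up post: the text reads hsbc.co.uk but the facet points elsewhere
post_text = "Log in at hsbc.co.uk today"
post_facets = [{
    "index": {"byteStart": 10, "byteEnd": 20},
    "features": [{
        "$type": "app.bsky.richtext.facet#link",
        "uri": "https://myevilsite.example.com/login"
    }]
}]
print(facet_link_mismatches(post_text, post_facets))
```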


Conclusion

BlueSky's choice to let the user define the preview card is particularly harmful because it's a step away from the behaviour that users are accustomed to: most social media networks automatically fetch and generate the cards themselves. As a result, most users are not going to expect that they need to treat the card content as anything but genuine, lending additional credibility to the story being presented.

Unfortunately, any real world misuse of preview cards will be much more plausible and subtle than my examples: it often doesn't actually take very much, precisely because people aren't clicking, let alone reading the stories themselves.

Really, BlueSky (and anyone else implementing the AT Protocol) should ignore provided preview cards and fetch their own so that it's not quite so easy for bad actors to try and fool their users. Unfortunately, although this has been suggested to them, they seem to have rejected the idea on philosophical grounds:

Our approach to misleading and confusing content is reporting and labeling. This is flexible to any kind of content, including misleading websites or screenshots, which wouldn't be "fixed" by having the PDS fetch embeds.

Although this seems to be in line with their approach to moderation, in my opinion, this analysis of the report is flawed:

There is a huge difference between linking to your own, unknown, site (which has very low inherent credibility) and linking to a well-known and/or well-respected site in order to piggy-back off the credibility that brand recognition provides. Linking to a "misleading website" simply isn't the same: the point is that you can link to legitimate websites in a misleading manner.

BlueSky aren't alone in letting users create custom hyperlinks (Facebook do it too), but the ability to do so does increase the risk of things like phishing. The ability to use a facet to make a URL link to another URL is particularly problematic in this respect, especially if then combined with a misleading preview card.

Without wanting to sound overly harsh about it, it's a little hard to escape the feeling that ATProtocol takes years of good practice and turns it entirely on its head, with users paying the price in safety and security.