Building an Archive of my Twitter Activity

Ben Tasker

2022-11-06 12:17 (updated 2022-11-14 08:51)

For better or worse, I've been a Twitter user since March 2010.

Whilst I don't claim to have tweeted much of real consequence in those twelve and a half years, it's still quite possible that I'll one day want to reference (if I don't already) some of that activity.

In the past, I've written about the need to screenshot rather than embed social media posts, in part to avoid being reliant on the continued good-will and existence of the relevant social network.

When I wrote that post, it didn't feel like there was any real possibility that Twitter might one day disappear.

But then, we probably all felt the same about platforms like Friendster, LiveJournal, Geocities and Myspace. Some of those still exist, but only as a tiny shadow of their former selves.

Over the past 24 hours or so, I've built an archive of my tweets. This post talks about how I did that, as well as a bit more on why.

Twitter's year (so far)

Things have definitely taken something of a negative turn for Twitter this year.

In April, Elon Musk made an unsolicited offer to buy Twitter for $44bn.
In July, he tried to back out, despite a term in the contract he created imposing a $1bn penalty for doing so.
He then faced legal action from Twitter under the terms of his own purchase agreement.
In October, Musk completed the $44bn purchase and took control of Twitter.

To help finance the buy-out, Musk took out $13bn worth of debt, which Twitter is now on the hook to repay. So, Musk needs Twitter (and its userbase) to start paying enough to cover $1bn a year in interest payments in order to service this debt.

Simply put, Musk needs the user-base to pay for the over-the-odds purchase price that he offered.

Things have only really gone downhill since the acquisition completed in October.

Statements from Musk appear to have driven advertisers away (advertising was 90% of revenue)
Layoffs and changes to verification increase the likelihood of Twitter being used to disseminate disinformation
Suggestions of changes to algorithms to suppress tweets from non-paying users
Even before changes to content moderation, Musk's public statements emboldened users to the point that use of the 'N' word on Twitter increased by 500%
Because of the speed at which employees were laid off, mistakes were made and Twitter is asking dozens of employees back after finding they were needed after all

Musk seems to have been taken somewhat by surprise by the idea that advertisers might not want their logos appearing above racist comments, and reports are that Twitter has sold almost none of its 2023 advertising space because of concerns in this area.

Musk has not responded to this particularly well:

Twitter has had a massive drop in revenue, due to activist groups pressuring advertisers, even though nothing has changed with content moderation and we did everything we could to appease the activists. Extremely messed up! They’re trying to destroy free speech in America.

One of those advertisers appeared and replied:

Elon, Great chat yesterday, As you heard overwhelmingly from senior advertisers on the call, the issue concerning us all is content moderation and its impact on BRAND SAFETY/SUITABILITY. You say you’re committed to moderation, but you just laid off 75% of the moderation team!

Advertisers are not being manipulated by activist groups, they are being compelled by established principles around the types of companies they can do business with. These principles include an assessment of the platforms commitment to brand safety and suitability.

That includes the trustworthiness of the leadership team and the behavior of the CEO.

You claimed yesterday that you are deeply committed to Content Moderation, yet today you’ve eliminated the vast majority of people who did that work for Twitter.

How do we reconcile these?

The response from the world's richest man, and new head of Twitter?

So for all the replies I received that content moderation = denial of freedom of speech (it doesn’t),what do you say about the fact that the “chief twit” just blocked me for exercising mine? Yesterday, @elonmusk solicited ?s from marketers, today he’s blocking those who ask them.

It'd be easy to give Musk the benefit of the doubt and write this off as an unusual interaction, but Musk later compounded this by posting a tweet threatening a "thermonuclear name & shame" of advertisers who back out of Twitter.

Thank you. A thermonuclear name & shame is exactly what will happen if this continues.

With this year's focus on Russia's illegal invasion of Ukraine, there's been quite a lot of talk about how Russia's military has revealed itself to be something of a paper tiger.

It looks like there's a very strong possibility that the world's richest man is about to have something similar happen to the general perception of his business acumen and (particularly) leadership skills.

Musk's other businesses (Tesla, SpaceX etc) operate in very different markets to Twitter, and don't involve selling advertising space or social interactions with the general public. So, it's entirely understandable that he might not be familiar with the concerns of advertising companies. Unfortunately, he seems unaware of the adage that you don't always know what you don't know, and has dived in head first.

His future reputation with financiers is unlikely to be helped by the fact that banks are saying they now don't feel able to sell the debt on to investors.

The Damage

It's reasonably clear that Musk doesn't actually understand who derives the most benefit from the existence of verification ticks, because it's not the verified users themselves. Verification:

helps ordinary users identify legitimate news sources, for example, reducing the success of Russian disinformation attempts around the war in Ukraine
helps ordinary users verify they're communicating with the company they think they are (reducing the likelihood of scams - such as those a while back pretending to be Elon Musk himself)
helps protect Twitter from liability - blue ticks came about after Kanye West and Tony La Russa complained about accounts being run by impersonators

The extension of blue ticks to anyone willing to pay $8/month (and significantly less in some regions) undermines all of these. Some might have naively hoped that Musk would be aware how easily disinformation spreads, given that he recently helped spread a fake conspiracy theory.

The manner in which the layoffs were handled displayed his clear disdain for employment law, and the use of Tesla workers to inspect Twitter code is unlikely to please TSLA shareholders or prospective employees.

There's a pervasive rumour going round that those laid off from Twitter were, in part, chosen by ordering people by number of lines of code committed. That's... just really not how code reviews work.

So, in the short time that he's owned Twitter, it looks as though Musk has managed to damage:

User trust and safety
Anti-Disinformation efforts
Twitter's shield against liability
The willingness of high-skilled employees to work with him
(Potentially) Shareholder good will
His own reputation

Inevitably, discussion on Twitter of Musk's changes has also taken a turn:

Elon Musk, Weird Nerds and Valid Criticism

The reality distortion field is in full effect, and some are very quick to fight Musk's corner with some truly odd arguments.

Reasons For The Archive

Needless to say, I'm not feeling overly optimistic about Twitter's future.

Twitter isn't going to magically disappear overnight, but it does seem to be set on a path towards being a haven for objectionable content, assuming it can somehow avoid oblivion (with $13bn in debt hung around its neck, it's hard to see how).

I sometimes link out to my own tweets for context and I'm not exactly excited at the prospect that those links may one day lead to a site full of misogyny, racism and misinformation.

I also have certain DataHoarder tendencies.

A while back, I put some effort into republishing some years-old posts, as well as eventually finding and restoring an archive of benscomputer.no-ip.org.

It's not that these old posts necessarily contribute anything to the wider world (in fact, the writing style in some of them is atrocious), but they are part of the journey that I've been on. It turns out that I like to preserve that stuff (I wish I'd cared about it more when I was younger: there's definitely content that's irrecoverable now). Plus, there's real pleasure in finding random stuff that you'd forgotten you'd done.

Creating the Archive

I could (and maybe should) have looked at creating tooling to consume Twitter's data export. However, I already had some tooling that I'd used previously to import tweets into InfluxDB in order to play around and do some analysis.

That tooling was built around a .NET tool called Twitter-Dump: I created a Dockerised version to avoid needing to install C# on my system.

The utility uses Twitter's search functionality in order to find and retrieve all of your tweets.

It relies on the APIs used by the web client, because Twitter's official API will only allow you to retrieve the most recent 3200 tweets.

Even if using the official API was an option, I have other objections to getting an API key, because it requires a verified phone number.

Usage is simple:

docker run --rm -it -v "$PWD":/output/ bentasker12/docker-twitter-dump bentasker

The tool then gives you some instructions so that it can authenticate with Twitter:

Steps to authenticate:
Step 1: With Chrome, authenticate with Twitter and then navigate to: https://twitter.com/search
Step 2: Open Chrome developer tools
Step 3: Open the Network tab on the developer tools
Step 4: Filter requests for "adaptive.json"
Step 5: Search for anything (doesn't matter)
Step 6: Scroll down until a network request for "adapative.json" is made
Step 7: Right click the request and click "Copy -> Copy as cURL"
Step 8: Paste the contents of your clipboard below

And, a little while later, you have a JSON file containing your tweets:

{
  "query": "(from:bentasker)",
  "tweets": [
    {
      "url": "https://twitter.com/bentasker/status/1588205974657048581",
      "id": 1588205974657048581,
      "created_at": "2022-11-03T16:26:26+00:00",
      "full_text": "@JimSycurity @IanColdwater The \"liquid\" part of this is *very* important.\n\nYes, you might see more growth in stocks/shares, but you're most likely to need that emergency fund while the market is down. It's a safety net, not an investment pot, and shouldn't be in capital-at-risk vehicles.",
      "user_id": 124810735
    },

    ],
 "users": [
    {
      "id": 166282004,
      "name": "Scott Helme",
      "screen_name": "Scott_Helme"
    },
    {
      "id": 124810735,
      "name": "Ben Tasker",
      "screen_name": "bentasker"
    },

    ]
}
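As a rough illustration of working with that output, the sketch below loads a dump and resolves each tweet's user_id against the users list. It is not taken from the build script: the function name load_dump and the sample data are my own, the sample is trimmed to the fields shown above, and it assumes every tweet carries created_at and user_id exactly as in the example.

```python
import json
from datetime import datetime


def load_dump(raw):
    """Parse a Twitter-Dump style JSON string into (tweets, users_by_id).

    Hypothetical helper: assumes the "tweets" and "users" keys shown above.
    """
    data = json.loads(raw)
    # Index users by numeric id so each tweet's user_id can be resolved
    users_by_id = {user["id"]: user for user in data.get("users", [])}

    tweets = []
    for original in data.get("tweets", []):
        tweet = dict(original)
        # created_at is ISO 8601 with an offset, e.g. 2022-11-03T16:26:26+00:00
        tweet["created"] = datetime.fromisoformat(original["created_at"])
        user = users_by_id.get(original["user_id"], {})
        tweet["screen_name"] = user.get("screen_name", "unknown")
        tweets.append(tweet)

    # Oldest first, which is convenient when building archive pages
    tweets.sort(key=lambda tw: tw["created"])
    return tweets, users_by_id


# Sample trimmed from the output above (tweet text replaced with a placeholder)
SAMPLE = json.dumps({
    "query": "(from:bentasker)",
    "tweets": [{
        "url": "https://twitter.com/bentasker/status/1588205974657048581",
        "id": 1588205974657048581,
        "created_at": "2022-11-03T16:26:26+00:00",
        "full_text": "(tweet text trimmed for brevity)",
        "user_id": 124810735,
    }],
    "users": [
        {"id": 166282004, "name": "Scott Helme", "screen_name": "Scott_Helme"},
        {"id": 124810735, "name": "Ben Tasker", "screen_name": "bentasker"},
    ],
})

tweets, users = load_dump(SAMPLE)
for tweet in tweets:
    print(tweet["created"].date(), "@" + tweet["screen_name"], tweet["url"])
```

With a real dump, the same function could be fed the contents of the JSON file instead of SAMPLE.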

I then took my earlier analysis script and adjusted it to build an HTML archive. A copy is available at https://github.com/bentasker/twitter_archive_build.

pip3 install dominate
./build_mirror.py bentasker.json

And with that, the archive is built.

Each tweet has its own page, which looks a little like this:

An example tweet

Tweets are also shown in order on per-year archive pages

Archive of tweets for 2022
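The grouping behind per-year pages can be sketched in a few lines. This is only an illustration of the idea, not the code in build_mirror.py: group_by_year is a hypothetical name, the sample tweets use made-up ids, and the only field it relies on is created_at in the format shown in the JSON earlier.

```python
from collections import defaultdict
from datetime import datetime


def group_by_year(tweets):
    """Return {year: [tweets, oldest first]} based on each tweet's created_at."""
    by_year = defaultdict(list)
    for tweet in tweets:
        created = datetime.fromisoformat(tweet["created_at"])
        by_year[created.year].append((created, tweet))
    # Sort within each year by timestamp, then drop the helper datetime
    return {
        year: [tweet for _, tweet in sorted(items, key=lambda pair: pair[0])]
        for year, items in by_year.items()
    }


# Made-up sample data: only the fields needed for grouping
SAMPLE_TWEETS = [
    {"id": 3, "created_at": "2022-11-03T16:26:26+00:00"},
    {"id": 1, "created_at": "2021-05-01T09:00:00+00:00"},
    {"id": 2, "created_at": "2022-01-15T12:30:00+00:00"},
]

for year, tweets in sorted(group_by_year(SAMPLE_TWEETS).items()):
    print(year, [t["id"] for t in tweets])
```

Each key of the returned dict would then correspond to one archive page.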

The script does have some known limitations.

There's no threading, so it's not immediately apparent that a tweet is a reply (the base JSON doesn't have anything which can be used to infer this).
There are no re-tweets: there doesn't seem to be a good way, in search, to identify retweets (because from is set to whichever user wrote the tweet you're re-tweeting).

The biggest limitation, though, is around media: because Twitter heavily obfuscates media URLs in the front-end, it's not (easily) possible to extract image URLs for local mirroring.

As a result, only Tweet text is mirrored.

This also affects avatars. However, because tweets look really weird without an associated avatar, I built some basic support in:

A directory called avatar will be created
If you put a JPG in there, using the user's handle as the filename (e.g. bentasker.jpg), then it will be displayed in the archive (there's an onload event on the anchor which checks whether the image loaded and, if it did, removes a display:none).
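The hide-until-loaded trick can be sketched roughly as below. This is an assumption-laden illustration rather than the markup the script actually emits: avatar_tag is a hypothetical helper, and for simplicity it attaches the handler directly to the img element. The image starts hidden and only becomes visible if the browser manages to load avatar/<handle>.jpg.

```python
from html import escape

# Inline handler: un-hide the image once the browser has loaded it
ONLOAD = "this.style.removeProperty('display')"


def avatar_tag(screen_name, avatar_dir="avatar"):
    """Build an <img> tag that stays hidden unless the avatar file loads.

    Hypothetical helper; the directory name follows the post, the rest is assumed.
    """
    src = f"{avatar_dir}/{screen_name}.jpg"
    return (
        f'<img class="avatar" src="{escape(src, quote=True)}" alt="" '
        f'style="display:none" onload="{ONLOAD}">'
    )


print(avatar_tag("bentasker"))
```

If the JPG is missing, the onload handler never fires, so the page simply renders without an avatar rather than showing a broken image.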

The archive itself is available at https://twitter-archive.bentasker.co.uk/

Archive Stats as at 06 Nov 22

My intention is to periodically refresh it in order to keep it up to date, although with all the changes going on, I expect the mirror script will stop working at some point.

In the near future, I'll update pages so that links to tweets point to the archive rather than direct to Twitter.
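A rough sketch of how that rewrite might work is below. The archive hostname comes from this post, but the per-tweet path layout (/status/<id>.html) is purely a placeholder assumption, as are the names ARCHIVE_PATH and rewrite_links, so treat this as an outline rather than a working migration.

```python
import re

ARCHIVE_BASE = "https://twitter-archive.bentasker.co.uk"
# ASSUMPTION: hypothetical path layout, adjust to match the real archive
ARCHIVE_PATH = "/status/{tweet_id}.html"

# Matches links to my own tweets only; other users' tweets are left alone
TWEET_URL = re.compile(
    r"https?://(?:www\.|mobile\.)?twitter\.com/bentasker/status/(\d+)",
    re.IGNORECASE,
)


def rewrite_links(text):
    """Point links to my own tweets at the archive instead of Twitter."""
    def to_archive(match):
        return ARCHIVE_BASE + ARCHIVE_PATH.format(tweet_id=match.group(1))

    return TWEET_URL.sub(to_archive, text)


page = '<a href="https://twitter.com/bentasker/status/1588205974657048581">tweet</a>'
print(rewrite_links(page))
```

Restricting the pattern to a single handle matters here, because the archive only contains my own tweets.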

The Future

Lots of people seem to be moving over to using Mastodon (230k in the last week apparently).

My account is @bentasker@mastodon.social (UPDATE: now ben@mastodon.bentasker.co.uk), but although I'm using Mastodon, I intend to keep using Twitter for the time being.

Someone else put it better than I could

@ambernoelle: I feel like Gen X will never leave Twitter because we’re used to every online space we inhabit eventually burning down, and we’ve developed a morbid and detached curiosity by now, like people who go to funerals for fun

I am shifting to primarily using Mastodon, but will also stick around on Twitter in order to watch it burn.

It really is a shame: as much as it's a hellsite at times, I've enjoyed my time on Twitter. I've met people that I otherwise wouldn't have, and there are posts on my site that wouldn't exist without Twitter prompting them.

After all, on what other site would you get the opportunity to give a sitting POTUS reading lessons?

Teaching Trump to Read