The Importance of Human Oversight

A few days ago, news broke that $34,000,000 had become forever irretrievable as the result of a bug in a smart contract created by AkuDreams.

It's a good example of an issue that's pervasive across multiple industries, rather than being something that's cryptocurrency-specific (though I'm no fan of cryptocurrencies).

The reason that $34m in Ethereum has been lost isn't simply that someone made a mistake in a smart contract: mistakes like that are the inevitable result of an underlying naivety in the smart contract model itself. As we shall see, that same naivety is all too common in the wider technology industry too.


Understanding the mistake

First, let's take a look at AkuDreams' mistake.

If we look at the contract, the first function we're interested in is processRefunds():

function processRefunds() external {
      require(block.timestamp > expiresAt, "Auction still in progress");
      uint256 _refundProgress = refundProgress;
      uint256 _bidIndex = bidIndex;
      require(_refundProgress < _bidIndex, "Refunds already processed");

      uint256 gasUsed;
      uint256 gasLeft = gasleft();
      uint256 price = getPrice();

      for (uint256 i=_refundProgress; gasUsed < 5000000 && i < _bidIndex; i++) {
          bids memory bidData = allBids[i];
          if (bidData.finalProcess == 0) {
            uint256 refund = (bidData.price - price) * bidData.bidsPlaced;
            uint256 passes = mintPassOwner[bidData.bidder];
            if (passes > 0) {
                refund += mintPassDiscount * (bidData.bidsPlaced < passes ? bidData.bidsPlaced : passes);
            }
            allBids[i].finalProcess = 1;
            if (refund > 0) {
                (bool sent, ) = bidData.bidder.call{value: refund}("");
                require(sent, "Failed to refund bidder");
            }
          }

          gasUsed += gasLeft - gasleft();
          gasLeft = gasleft();
          _refundProgress++;
      }

      refundProgress = _refundProgress;
}

At first glance this may look like it should be OK, but the mistake lies here:

allBids[i].finalProcess = 1;

The value of that attribute is checked in the function emergencyWithdraw():

require(bidData.finalProcess == 0, "Refund already processed");

emergencyWithdraw() reverts if bidData.finalProcess is not 0, so once processRefunds() has marked a bid, its bidder is left unable to withdraw their money through that route.
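To make the interaction concrete, here's a minimal sketch of the two halves of that lock-out. Only the names are borrowed from the contract - the code itself is a simplified, hypothetical reconstruction, not the real thing:

contract LockoutSketch {
    struct bids { address bidder; uint64 finalProcess; }
    mapping(uint256 => bids) public allBids;

    // processRefunds() marks each bid as handled as it goes...
    function markProcessed(uint256 i) external {
        allBids[i].finalProcess = 1;
    }

    // ...so the emergency path reverts for that bid ever afterwards.
    function emergencyWithdraw(uint256 i) external {
        bids memory bidData = allBids[i];
        require(bidData.finalProcess == 0, "Refund already processed");
        // ... the refund itself would happen here ...
    }
}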

But what about the AkuDreams team?

Computer says no

They're locked out because the function claimProjectFunds() checks something else:

require(refundProgress >= totalBids, "Refunds not yet processed");

This requires that refundProgress be at least as high as totalBids. Unfortunately, when they wrote the contract, the developers failed to consider that some buyers might mint multiple NFTs within a single bid - as a result, totalBids is stuck at a higher number than refundProgress can ever reach.
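The arithmetic of that deadlock is easier to see in isolation. Below is a minimal, hypothetical sketch - not the real contract, only the variable names are borrowed from it:

contract CountingSketch {
    uint256 public bidIndex;       // one entry per bid transaction
    uint256 public totalBids;      // one per NFT bid for
    uint256 public refundProgress; // advances one entry at a time

    function bid(uint256 quantity) external payable {
        bidIndex += 1;          // a single entry is recorded...
        totalBids += quantity;  // ...but totalBids grows by the quantity
    }

    function processRefunds() external {
        // refundProgress can never exceed bidIndex...
        while (refundProgress < bidIndex) {
            refundProgress++;
        }
    }

    function claimProjectFunds() external view {
        // ...so if anyone ever bid for more than one NFT in a single
        // transaction, bidIndex < totalBids and this can never pass.
        require(refundProgress >= totalBids, "Refunds not yet processed");
    }
}

A single bidder calling bid(3) is enough: bidIndex (and with it refundProgress) tops out at 1 while totalBids sits at 3, forever.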

It was avoidable, but it's also easy to see how it could have been missed.

It's extremely unfortunate for AkuDreams, and their bidders, but it's not the true root cause of the issue.


The real problem with smart contracts

There's a phrase often associated with smart contracts:

Code is law

What this means is that the code implementation of the contract is the authority on what the contract is supposed to do.

It doesn't matter what the parties to the contract intended or even agreed to; what matters is what the code does: whatever bugs exist in the code become part of the contract.

It's been suggested, by some, that smart contracts might even be able to replace parts of the (civil) legal system.

The problem with that idea lies in the statement below:

all code has bugs; no software is ever perfect, nor is it ever likely to be.

If this seems unnecessarily pessimistic, or you feel it doesn't apply to your own golden code, consider that to be bug-free means that there are no mistakes, oversights or ... ahem ... undocumented features in:

  • the code you wrote
  • any library you explicitly rely on
  • any library that those libraries rely on
  • the compiler you used (or the interpreter the code is running in) and its dependencies
  • the client that the user is using
  • the OS that the code is running upon
  • the hardware the code is running upon
  • any hardware that the code interacts with

This is a pretty tall order for any software.

A system that deliberately excludes the possibility of a human intervening is one destined to have serious issues.


Not an isolated incident

With cryptocurrency being a form of financial system, you might think that mistakes like this are rare, but mistakes in and around cryptocurrency are no more unusual than in other software.

In fact, AkuDreams' issue came only about a week after a $500K NFT was lost because setApprovalForAll could be triggered by JavaScript embedded within an SVG.

Back in December, $31 million was lost as the result of a bug being exploited in software used to write smart contracts.

Of course, it could be argued that these are the result of inexperienced companies dabbling with cryptocurrency when they're not sufficiently familiar with its mechanisms.

Which would be fine, until we consider what happened with "The DAO".


The DAO

Back in 2016, a group of developers for the cryptocurrency/blockchain Ethereum built The DAO: a decentralised autonomous organisation for crowdfunding via smart contract.

Users would exchange Ether (ETH) for tokens which enabled them to vote on which projects should be funded - The DAO was itself a smart contract.

The terms and conditions of The DAO made it clear that the smart contracts were the ultimate arbiter (code is law).

The DAO quickly raised around $150m of ETH and was initially lauded as a great success.

Within weeks of launch though, someone found a way to siphon funds out of the DAO and took around $70m (3.6 million ETH). The attacker could have taken more, but stopped their attack (for whatever reason).

Although it's tempting to characterise this as theft, it was achieved by using legitimate (i.e. valid within The DAO's rules) smart contracts and therefore was legal under The DAO's T&Cs.

The attacker was able to exploit a couple of mistakes that The DAO's developers had made:

  • The developers did not account for the possibility of a recursive call
  • The contract sent funds and only updated the balance afterwards (allowing the attacker time to use the same tokens over and over)

This, combined with the complicated way in which "refunds" were processed, allowed the attacker to draw funds out of the DAO into a child DAO that they controlled.
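The shape of that mistake - what we'd now call a reentrancy bug - is easy to sketch. This isn't The DAO's actual code, just a minimal illustration of the send-then-update pattern:

contract VulnerableFund {
    mapping(address => uint256) public balances;

    function deposit() external payable {
        balances[msg.sender] += msg.value;
    }

    function withdraw() external {
        uint256 amount = balances[msg.sender];
        // Mistake: the funds are sent *before* the balance is zeroed.
        // A malicious contract's receive() function can call withdraw()
        // again at this point, recursively, and be paid the same
        // balance each time until the fund is drained.
        (bool sent, ) = msg.sender.call{value: amount}("");
        require(sent, "Failed to send");
        balances[msg.sender] = 0; // the state update arrives too late
    }
}

The now-standard defence is the "checks-effects-interactions" pattern: zero the balance before making the external call, so that a recursive re-entry finds nothing left to take.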

Because of the "terms" in the smart contract used for the DAO, the stolen funds couldn't be withdrawn by the attacker for 28 days.

The community decided that action needed to be taken in order to stop the attacker from being able to actually withdraw the money, so a fork of the blockchain was proposed.

Initially, the plan was a soft fork, but serious bugs were found in the smart contract which would have implemented it.

So, although it was contested by some, a hard fork of the Ethereum blockchain occurred on 20th Jul 2016.

The entire Ethereum blockchain had to be forked because of bugs in a single (albeit large) smart contract.

That's bad enough, but imagine the harm that could have been done had bugs not been identified in the soft-fork contract before it went live. Many things have been learnt and changed in the years since that fork, but smart contracts continue to be immutable.


Human oversight should always be possible

Whilst the concept of a smart contract is pretty mature (we tend to associate it with cryptocurrencies, but the idea goes back to the 90s), smart contracts tend to rely on novel code: code that's been crafted specially to implement the intent of that specific contract.

The idea that any novel code should be given total authority over the transfer of large sums is naive.

At its core, a smart contract is just code, and all code has bugs. The higher the sums involved, the more motivated someone will be to find issues - and that's before you consider the risk of locking yourself and your users out.

Creating code which does something irrevocable (blockchain forks notwithstanding) involves putting a lot of trust in programmers.

Trusting a computer programmer

At just 634 lines long, AkuDreams' smart contract is relatively simple, but that was more than enough to introduce a mistake which has locked millions away.

It's not just cryptocurrencies that we ought to be concerned about, though; we should also consider the length and complexity of other code entrusted with much more complex and important decisions.

How many obscure bugs are lurking in the codebase that will one day drive your car while you watch TV, haul freight, or perform surgery on you?

Sadly, bugs in self-driving cars have already led to deaths, with suggestions that "glitches" were also involved in other fatalities.

Those deaths occurred with vehicles where a human had the ability to take over, but was unprepared to do so (whether out of a false sense of security, or general distraction). In car terms, a smart contract is more like a car with the controls removed - it takes you where the software says to take you, and if you disagree, tough.

It's not just cars that are being "enhanced" with AI: autonomous military drones may have made their first fully automated kills. These drones do everything from target identification and acquisition to making the kill/no-kill decision - effectively a smart contract with explosives on board.

Whether it's cryptocurrency, transport, weapons or surgery, code is already being entrusted with the responsibility to fully oversee something that has serious consequences, up to and including death.


Humans Make Bugs Happen

At the risk of repeating myself: it doesn't matter what your budget is, who your customer is or what language you've used - all code of any complexity has bugs. Sometimes those bugs are small and unimportant, sometimes they're big and important.

It's a fact: developers make mistakes. Sometimes the mistake is simply categorising a bug as small and unimportant (and so deprioritising the fix) because they didn't see how it could be triggered to cause something much more serious.

It doesn't matter how big your budget is, or how well you pay, bugs will still happen.

Consider the F-35 Joint Strike Fighter.

It's a stunning piece of kit with mountains of funding behind its development, but it was still found to be a buggy mess.

A year after that report, another was written noting that the project's software development processes were still problematic:

The program routinely underestimated the amount of work needed to develop Block 4 capabilities, which has resulted in delays, and has not reflected historical performance into its remaining work schedule.

Another year later, reports emerged that the software driving the F-35's radar became unstable in flight.

The F-35 program had billions of dollars behind it, but still struggled to develop and deliver working code, even after measures were put in place to try and improve practices.

It's not just developers that make mistakes: project/product managers, accountants and upper management do it too.

USS Independence - LCS-2

One of my favourite examples is the USS Independence from the Littoral Combat Ship class.

The Independence was less than two years old when she had to be laid up in dry dock to have chunks of her hull replaced because of galvanic corrosion between the aluminium hull and the steel propshafts.

That corrosion occurred because, in order to reduce costs, the Cathodic Protection System had been axed from the ship's bill of materials. The result was that the US Navy sent a warship out to sea (you know, that big, famously salty body of water), in effect, without corrosion prevention.

It's easy, even tempting, to shrug these off as being typical of public sector projects, but the private sector suffers from similar issues.

Self-Driving Cars

Elon Musk, the man spending $43 billion buying Twitter, has complained that LIDAR sensors are an expensive crutch, and Tesla still doesn't include LIDAR in their cars, despite theirs being among the only self-driving systems to lack it.

Tesla, of course, have long aimed to be first to market with "Full Self Drive" capabilities, based around a reliance on cameras/vision and a heavy dose of AI. The result is cars on the road which lack certain sensors, in part because of cost.

The Uber self-driving car death had its own story around cost-cutting and commercial decisions too:

  • Uber had set an internal goal of allowing customers in Phoenix to ride in driverless cars by year end
  • In pursuit of that goal, they apparently cut some corners and under-invested in their software simulation program
  • The car's software identified 49-year-old Elaine Herzberg 6 seconds before hitting (and killing) her, but didn't have access to the Volvo's emergency brake because Uber wanted to ensure a smoother ride
  • Management were warned that the cars were routinely involved in accidents, including shortly before the incident

Apparently Uber's prototypes had been more prone to using the emergency brake than their competitors, giving an erratic and uncomfortable ride. Rather than identify and address the cause of this, they simply prevented the software from being able to trigger an emergency stop.

It should be clear, then, that although we talk about bugs (and blame developers), mistakes actually happen across the management chain. An old boss used to describe true root cause analysis quite well:

  • The application crashed, why?
  • There was a bug in the code. Why?
  • A developer made a mistake. Why?
  • He was overloaded and didn't check properly. Why?
  • Because the product needed to be delivered quickly and he was the only team member assigned. Why?
  • Because sales had promised the customer it'd be delivered this week and cheaply. Why?
  • Because a bid had been made to outcompete a competitor, and they'd underestimated the work. Why?
  • Because management had told them it was a must-win. Why?
  • Because the board wanted to see increased revenues. Why?
  • Because shareholders wanted to see growth

If you start from the bottom, you can fix any single item in this list and the ones above it can still happen - the art of preventing recurrences is in identifying where issues actually occurred, rather than simply blaming an overworked developer.

Mistakes made at any level have the potential to impact the final product - the USS Independence was impacted by someone moving items they didn't understand on a spreadsheet to reduce cost, whilst the F-35 project suffered from mistakes made across the board.

Any project with humans involved is ultimately going to need other humans to correct mistakes made by the earlier group. Even once we reach the point that AI is able to write complex software for us, mistakes in the AI's implementation will likely bubble up into its output periodically.


Conclusion

I've mentioned a few very different projects in this post, but they've all got a few things in common. Each one has:

  • been surrounded by hype: Elon Musk (and others) claimed that our roads would have robot taxis and autonomous cars driving on them within a couple of years, the F-35 was heavily hyped, and people continue to claim that cryptocurrencies and smart contracts are the future.
  • relied on software making decisions, usually significantly reducing (if not outright removing) human oversight at the same time
  • suffered from bugs in their software or its dependencies

Bugs in software aren't the big issue here. They might be annoying, but they are a fact of modern life. What makes them truly dangerous is pretending that software should ever be treated as infallible.

Serious issues occur when we claim that bugs don't happen, designing and building systems which routinely cut humans out of the decision making loop. This isn't an argument against automation (I'm quite happy to play around with Home Automation), but one against excessive autonomy. It's one thing to delegate routine operations to a machine, but quite another to cede all control to it.

It's not that humans are incapable of making flawed decisions; in fact, it's quite the opposite. Software developers and project managers are human, and so are perfectly capable of making decisions which fatally undermine your "infallible" product, resulting in real-world consequences. Deaths have already occurred because a human made a mistake in software, or prioritised the commercial imperative.

There are techniques that can be used to try and reduce the likelihood of bugs - pair programming, peer review - but they're far from a panacea. Any measure that could come close would be so expensive or inconvenient that either no-one would use it when drafting a smart contract or (if it were required) no-one would use smart contracts.

AkuDreams made a rudimentary mistake when creating a smart contract, but the entire concept of a smart contract is built around the mistaken idea that a machine can safely be the ultimate arbiter despite running code written by an extremely fallible human.

It's an expensive mistake, but one that pales in comparison to the potential consequences of building fully automated vehicles and weapons around the same flawed hypothesis.

Rendering a human unable to intervene should be considered a bug, not a feature.


2022-05 Update

The American Automobile Association (AAA) ran a test of three models of autonomous cars and found that they hit a third of cyclists, and 100% of oncoming cars during tests.

We're still a very, very, very long way from full-self drive.