Writing (and backdooring) a ChaCha20 based CSPRNG

Recently I've been playing around with the generation of random numbers.

Although it's not quite ready yet, once of the things I've built is a source of (hopefully) random data. The writeup on that will come later.

But, as an interesting distraction (and in some ways, the natural extension) is to then create a Psuedo Random Number Generator (PRNG) seeded by data from that random source.

I wanted it to be (in principle) Cryptographically Secure (i.e. so we're creating a CSPRNG). In practice it isn't really (we'll explore why later in this post). I also wanted to implement what Bernstein calls "Fast Key Erasure" along with some techniques discussed by Amazon in relation to their S2N implementation.

In this post I'll be detailing how my RNG works, as well as at looking at what each of those techniques do to the numbers being generated.

I'm not a cryptographer, so I'm going to try and keep this relatively light-touch, if only to try and avoid highlighting my own ignorance too much. Although this post (as a whole) has turned out to be quite long, hopefully the individual sections are relatively easy to follow

 


 

The Underlying Mechanism

The mechanism itself is actually relatively simple, the basic principle of the PRNG works by:

  1. taking 512 bits of random data from our source
  2. deriving a 256 bit (i.e. 32 byte) key and a seed input from that data (seed input is the full 64 bytes)
  3. Using ChaCha20 to encrypt the seed input with the key and a nonce of 000000000001
  4. Pushing the output to a buffer
  5. Re-encrypting that output with the same key, using a nonce of 000000000002
  6. Pushing the output to a buffer
  7. Repeating steps 5 & 6 until we've iterated 48 times, increasing the nonce by 1 each time

The iteration section of the mechanism looks like this

# 48 iterations
for i in range(1,itercount):
    
    # Use counter-mode to generate our nonce for each encryption
    #
    # When this iteration loop is next called, the key will have changed
    nonce=format(i,'012').encode('utf-8')
    
    # Trigger the encryption
    plaintext = ChaChaMe(key,nonce,plaintext)
    
    buffer1.append(plaintext)

Because our input was 64 bytes, buffer1 now contains (48 blocks*64 bytes) 3072 bytes of random data.

We take the first 2 blocks - 128 bytes of data - and

  • XOR them together
  • Take the first 32 bytes to create the key for the next iteration.
  • Save the remaining 32 bytes for later (this will be used for Back-Tracking protection)

With the exception of the very final block, the remaining bytes within our buffer are then concatenated and pushed onto the output queue as our random data

    key,spare=select_key_from_bytes(buffer1[0],buffer1[1])
    
    # Clear some space on the queue if necessary
    if data_queue.full():
        d = data_queue.get()
        del d
    
    # use the rest of the chain as our bytes
    # we did 48 iterations, and are using 2 for a key, leaving
    # 46 * 64bytes being pushed into the queue 
    data_queue.put(b"".join(buffer1[2:-1]))
    plaintext=buffer1[-1]

The very final block becomes input for our next iteration.

Within this iteration loop, a time based counter is checked to see when we last re-seeded by reading random bytes from our random source. If that interval has elapsed and there are bytes available (they're read in by a different thread to avoid blocking) then we'll re-seed with that

    if (time.time() - start) > reseed_interval and seed_queue.qsize() > 0:
        try:
            newseed = seed_queue.get(True,0.1)
            if newseed:
                key,plaintext = split_seed(newseed)
                start = time.time()
        except:
                print("{} unable to read a seed".format(time.time()))
                pass

We carry on using the previous bytes if we cannot fetch a new seed - the whole point in this PRNG, after all, is that it's not supposed to block.

This process is relatively fast, though the script actually launches a couple of threads (using different seeds) to speed it up a bit further. /dev/urandom is still much faster - that's partly because I haven't put sufficient thought into how to have the pipe handle larger/long reads, but we'll come back to that later.

 


 

Fast Key Erasure

Bernstein wrote about Fast Key Erasure back in 2017.

The aim being to try and address situations like the following

  • You start your CSPRNG
  • You use the output to generate a key, then encrypt something interesting with it
  • You securely remove the key and plaintext from the computer
  • An attacker gets access to your machine

In principle, you'd hope that the attacker would not be able to figure out the key used to encrypt the file. However, they now have access to your machine, and by extension, the RAM.

Because the key used as input to your CSPRNG is still in memory, they can use this and current output to "back-track". Essentially reading a number back from your RNG, decrypting that with the key, then decrypting that etc.

Using this attack, they may be able to discern the numbers used to generate your key, and ultimately decrypt your encrypted file.

There are two terms that apply here - "Forward Secrecy" and "Backtracking resistance". You normally hear the former in relation to keys and the latter in relation to Random Number Generators, but not always.

So, the approach I've implemented, roughly based on that described by Bernstein (he uses AES-256-CTR in his examples) is

  • Start with 256 bit key k and 512-bits input i
  • Generate 48 blocks (b[0]-b[47]) using k incrementing the nonce by 1 each time
  • Overwrite k with a combination of the first 2 blocks (b[0] ^ b[1])
  • Use blocks b[2] - b[46] as RNG output, erasing each as soon as it is consumed
  • Overwrite i with b[47]
  • Start over using the new values of k and i

This means that every 48 iterations we'll create a new key. The numbers used for subsequent keys/input are never output.

 


Prediction Resistance

This is an approach implemented and discussed by Amazon in relation to their TLS implementation S2N.

Although it might, at first, seem a little counter-intuitive, PRNGs are deterministic. That is, if you run the same PRNG repeatedly with the same seed, it'll generate exactly the same output on each subsequent run.

If you read twice from the same PRNG you will, of course, get different numbers because you're reading later in the chain.

To understand Amazon's approach, and why it's needed, we first need to look at PRNGs and determinism

PRNG's are Deterministic

You might have heard the other name for a PRNG - a Deterministic Random Bit Generator (DRBG).

It might seem a little odd that a Psuedo-Random Number Generator would generate predictable and repeatable outcomes, but it's not without good reason.

The aim of PRNG is to take a small amount of (hopefully) high quality randomness and "stretch" it so that you can get lots and lots of randomness out of it. Because of the trust placed in the output, that obviously needs to be done with some care (seems like a good opportunity to say - don't use my CSPRNG for anything you care about).

The system already generates (hopefully) high-quality random bytes by doing clever things by looking at interrupt timings etc, so whilst it might be tempting to try and make CSPRNG output non-deterministic by doing things like:

s=time.time()
do_some_op()
d=time.time()-s
# then mix d into our data

At best, there's very little value in doing so because you're using something the kernel is already using.

However, at worst, you may actually have made your output waeker and more predictable. It might be that your choice of input has:

  • Increased the risk of collision - your RNG may periodically repeat bytes.
  • Opened your RNG to manipulation or observation via a side-channel (in the example above, can an attacker influence the time required for do_some_op to make d have known value?)
  • Done something else

The underlying problem, essentially, is that by making your output non-deterministic based on (potentially) low-quality sources it's extremely difficult to assess where weaknesses in your implementation might lie. You may also be slowing your PRNG down, because high quality sources of randomness tend to be quite slow.

After all, if you had a sufficiently fast source of high-quality randomness (like a hardware TRNG), wouldn't you just be using that instead of a CSPRNG?

 

Amazon's Issue

PRNGs being deterministic can bring an issue though - as Amazon found.

As we know, two PRNGs started with the same seed will generate the same numbers. Now, hopefully, your PRNG reads in a seed at startup, so that shouldn't happen, right?

The problem is, if the process is later forked with fork() an exact copy of that process is created in memory - excellent for scaling up quickly and reliably. But, now you've got two PRNGs using the same seed, and outputting identical chains of bytes.

If the numbers from one are being used publicly - perhaps as bytes in an ICMP packet, or in a TLS Hello - but the number from one are being used privately - to generate a key - then you've got a fairly serious problem. 

As Amazon notes, this can partially be mitigated by having your PRNG have two calls, public() and secret(),using different seeds for each.

But, that still doesn't resolve the underlying issue of numbers being repeated. Amazon's post is actually about the kernel level fix for this - implementing the WIPEONFORK option - but I'm (obviously) more interested in the RNG level defences that are implemented.

So, the approach that's used is to mix in some bytes from a reasonably fast good quality entropy source - the processor's RDRAND instruction (assuming it's got one). 

This means that even with the same seed, two PRNGs should output completely different bytes.

This is implemented within my code as follows

# 48 iterations
for i in range(1,itercount):
    
    # Use counter-mode to generate our nonce for each encryption
    #
    # When this iteration loop is next called, the key will have changed
    nonce=format(i,'012').encode('utf-8')
    
    if prediction_resistant:
        plaintext = mix_with_rand(plaintext)        
    
    # Trigger the encryption
    plaintext = ChaChaMe(key,nonce,plaintext)
    
    buffer1.append(plaintext)

Where mix_with_rand() simply fetches 32 bytes of data and XORs it against plaintext.

Now, you may be thinking that seems contrary to the reasoning given in "PRNG's are Deterministic", and in some ways, it is.

The important factor here is that we're using (what should be) a fast high-quality source of entropy - RDRAND. The value is also mixed in before we apply ChaCha, so even if either plaintext or the RDRAND output were biased, it should still get randomised.

Actually, though, my implementation provides for an insecure fallback if RDRAND isn't available. There's no good reason for this other than that I was doing quite a lot of testing on a Raspberry Pi which doesn't have RDRAND.

 


 

Back-Tracking Protection

When talking about Fast Key Erasure, I made reference to NISTs terminology for Forward Secrecy - backtracking resistance.

To recap, the idea of backtracking protection is to ensure that an attacker who knows your current state cannot take the output from your PRNG and calculate all the numbers that will have been output previously.

Because I implemented Fast Key Erasure for this, the protection was actually already in place - an attacker shouldn't be able to backtrack more than 46 blocks even if they know the current state.

However, the Linux Kernel uses a slightly different approach, so I wanted to build something in based upon that too.

The kernel checks to see whether it has any unused bytes left from the previous output, and if it does, uses those in order to mutate the key for future iterations.

Because each iteration of my script generates 512 bits, and we only need 256 bits for the key, we have the luxury of having 256 bits "spare" - they'd otherwise just be discarded.

Any given key is used 48 times within the script (because we iterate 48 times), so I decided we'd mutate the key half way through:

# To help reduce the efficiency of backtracking, we'll mutate the key 1/2 way through
mutate_point = int(itercount/2)

# 48 iterations
for i in range(1,itercount):
    
    # Use counter-mode to generate our nonce for each encryption
    #
    # When this iteration loop is next called, the key will have changed
    nonce=format(i,'012').encode('utf-8')
    
    if prediction_resistant:
        plaintext = mix_with_rand(plaintext)        
    
    # Trigger the encryption
    plaintext = ChaChaMe(key,nonce,plaintext)
    
    if i == mutate_point and spare:
        # Mutate the key using some of the "spare" data from the last key generation round
        newkey = xor_bytes(key,spare)
        del key
        del spare
        key = newkey
        del newkey
    
    buffer1.append(plaintext)

If you recall the specifics in "Fast Key Erasure", there's another change here. The bytes in buffer1[0], buffer1[1] and buffer1[47] are used to formulate the next key and input. 

However, they now sit on opposite sides of the mutate boundary. Where before buffer1[47] was (indirectly) derived from buffer1[1] (which was derived from buffer1[0]), that's no longer true. I'm not sure though that there's actually any tangible benefit from this dis-association.

What it does mean, though, is that if you can view the current state, you should now only be able to back-track the latter half (i.e 24) values.

 


Re-Seeds

It should go without saying (it almost did, I've had to come back and edit the post to add a section on this), that our CSPRNG should regularly read in new random bytes and re-seed itself.

Doing this means that if an adversary were somehow able to influence the random bytes we receive, they'd have to regularly (or even continuously) do so in order to be able to continue predicting our output.

If, for some reason, we can't read a new seed, we continue using the old one - a PRNG is not supposed to block after all

if (time.time() - start) > reseed_interval and seed_queue.qsize() > 0:
    try:
        newseed = seed_queue.get(True,0.1)
        if newseed:
            key,plaintext = split_seed(newseed)
            start = time.time()
    except:
            print("{} unable to read a seed".format(time.time()))
            pass

 


Tearing it Apart

So with the script now written and functional - you can find a copy of it in  csprng.py, it's time to assess how much cop it actually is.

 

How Random Is it?

I've written previously about the difficulties in assessing how random as set of "random" data actually is.

It's very difficult to say for sure, as the tests are really only indicative

However, ent reports

Entropy = 7.999980 bits per byte.

Optimum compression would reduce the size
of this 9956544 byte file by 0 percent.

Chi square distribution for 9956544 samples is 277.10, and randomly
would exceed this value 16.33 percent of the times.

Arithmetic mean value of data bytes is 127.4985 (127.5 = random).
Monte Carlo value for Pi is 3.141518985 (error 0.00 percent).
Serial correlation coefficient is -0.000076 (totally uncorrelated = 0.0).

so very nearly 8 bits of entropy per byte,

and rngtest reports

rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 79652352
rngtest: FIPS 140-2 successes: 3980
rngtest: FIPS 140-2 failures: 2
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 2
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=24.675; avg=1475.199; max=9536.743)Mibits/s
rngtest: FIPS tests speed: (min=1.026; avg=16.291; max=16.746)Mibits/s
rngtest: Program run time: 4718285 microseconds

dieharder's report is much, much longer so I won't reproduce it here, but it looks suitable hopeful.

 


Is it actually Cryptographically Secure Though?

This is where things can start to get fun.

In order to qualify as Cryptograhically Secure and attain that all important CS prefix, a PRNG needs to meet some basic requirements:

  • An adversary not knowing the seed should have a negligible advantage in discerning output from a stream of random bytes
  • Must pass a test requirement that it's currently impossible to actually test

The latter is generally addressed by basing the CSPRNG on a problem that's assumed to be computationally difficult, such as factoring an integer etc.

My script uses a stream cipher - ChaCha20 - in order to achieve this. Arguably, most PRNGs should be able to meet the former.

I've somewhat oversimplified the requirements, but on the face of it, my PRNG is potentially Cryptographically Secure.

 

So it's secure?

Errr.... no.

So, in this post I've written quite a lot about providing Forward Secrecy/Backtracking Protection. These protections rely on the idea that you get keys (and output bytes) out of memory as soon as you're done with them, so that they're not available for a later adversary. 

The keener eyed of you, though, may have noticed that I've written this in Python. Python is a high-level language with a Garbage Collector.

As Bitfi found out, it's not possible to be sure that your material is gone from memory in such a language - it may even be that your system's RAM contains a complete history of the keys used. Worse, if pages have been swapped to disk, your disk might even be full of months/weeks of key history.

In the snippets above you'll see I've used the del keyword to attempt to delete bytes from memory.

However, all this actually does is remove that reference to it - if something else is still referring to that area of memory (and honestly, I've not been nearly careful enough in that respect), the Garbage Collector simply won't touch it.

Some readers may be wondering why I haven't tried to "overwrite" the values I want to scrub. They're stored as objects of type bytes, which is an immutable type in Python - you don't overwrite them in memory, so much as make a copy of it.

To understand this, take a string (str) - also immutable in Python:

a="Hello"
b="World"

a=a+b
''' Now you have 3 strings in memory

Hello
World
Hello World
'''

a=b
'''
Assuming the GC didn't run yet, You still have 3 strings in memory

Hello
World
Hello World

But the variable 'b' now references the memory area containing "Hello" instead
'''

If the garbage collector were to run at the end of this, it'd probably delete World as nothing is now referencing it. 

A search on the net for "secure erase memory python" might fill you with a sense of false hope, but the reality is that Python just does not have low enough control over memory for this. Amongst the search results will be advice suggesting use of mutable types like bytearray - however this is also not sufficient as I detailed in the Bitfi article.

 

Stored bytes

This issue doesn't just affect the keys used in generation, but may very well result in our output bytes remaining in memory for far, far longer than they're supposed to.

They should be erased from memory as soon as they've been written out to pipe, but because we're using Python we can't guarantee that - effectively betraying the trust of our downstream consumer. Not to mention that my haphazard approach to writing to the pipe means we'll block waiting for a reader, so the next bytes we output may have been in memory quite some time.

 

As we'll find out in detail later, there was also a pretty significant bug, meaning that an attacker could predict the input key approximately 50% of the time.

 

I used Python because I was interested in quickly exploring the techniques, but it really isn't the language to be creating a CSPRNG in.

  


Speed

Running with 2 RNG threads and with prediction resistance disabled, the script writes bytes out at around 300kB/s

dd if=/tmp/csprng of=~/rand count=20K
^C3394+351 records in
3569+0 records out
1827328 bytes (1.8 MB, 1.7 MiB) copied, 5.36358 s, 341 kB/s

Which isn't exactly lighting fast compare to the kernel's /dev/urandom (4mB/s).

This is partly a product of how many RNG threads are running (i.e. how quickly we can write bytes into the queue), how big the block is (defined by how many iterations we do per key), and my hatchet job in handling writing to the pipe.

It could certainly be sped up, but in any case is still 10x faster than reading from /dev/random (or indeed my custom source) on the same box, and doesn't block.

 


 

Let's break it

Although I hadn't really intended to, I've put some effort into building this, so it'd be completely remiss of me not to have a swing at backdooring it to see whether it's possible to enable back-tracking whilst still having the back-tracking protection in place (if nothing else it means I can use the phrase "my backdoor's output" whilst trying to keep a straight face).

My main requirement for this was that it shouldn't be possible for an observer watching the output stream that it's been backdoored - so we shouldn't output any additional bytes per block etc.

So, the requirements can be summarised as

  • It should be possible for an attacker consuming bytes to be able to calculate earlier output
  • There should be no discernable change in output - no trailing bytes etc
  • Although I'm not going to try too hard to hide the code changes, they should be as minimal/subtle as possible

For those interested, you can find a copy of the backdoored version and my attack script in the branch silly-backdoor, but we're going to look over the design and changes in this section.

 

The Scheme

The technique I decided to use was to halve the number of byte generation iterations, and make up the deficit with predictable (to me) bytes derived from earlier output.

So, in psuedo-code terms rather than

for i in range(0,48):
    output.put(generate_bytes())

We'd now be doing

for i in range(0,48):
    b = generate_bytes()
    output.put(b)
    output.put(hide_key(b,k))

So that every other block of bytes would tell us what key was used with ChaCha to generate the previous block.

To make up the 64 bytes required for a block, the 32 byte key is concatenated to itself.

It'd be a little bit obvious if every second line were the same though, so to mask it, the key would be XORd against the bytes in the previous block.

Once the attacker has recovered the key, they can "just" decrypt the previous block in order to discern what the bytes before that were (because the input is the previous block).

Put simply, this means that if we read 4 blocks

foo
bar
sed
zoo

Then we can do zoo ^ sed to retrieve the key to decrpypt sed and hopefully arrive at foo.

However, we have no way to know where in the stream we're reading, so the final bytes may be the first part of a pair - so we may also need to try sed ^ bar, and then decrypt bar to get an earlier value (or even, use it to decrypt zoo and get bar).

 


Nonces

Now, it's not actually quite that straight forward, because we need to tell ChaCha20 what the nonce used during encryption was. However, because we know it's derived from the iteration count, it's a tiny keyspace so we can just bruteforce it.

Reverting again to psuedo-code we can essentially do this:

k=(zoo ^ sed)
for nonce in range(1..24)
    if decrypt(sed,k,nonce) == foo
        key=k
        found=True
        break

if not found
    k=(sed ^ bar)
    for nonce in range(1..24)
        if decrypt(sed,k,nonce) == foo
            key=k
            found=True
            break
    
print(k,nonce)

 

This should allow us to walk back to the most recent key mutation (i.e. the bit added in Backtracking). It'd be nice if we could get past that though.

Fortunately the answer is relatively simple - our keys are 32 bytes, but a block is 64 bytes. Rather than concatenating the key to itself, we instead concat it to the previous key.

Once our key stops working, we then try and continue the decryption using the previous key - carrying us all the way back to the proper key rotation.

 


 

The Changes

Obviously it took a bit of playing about to get the changes quite right, but ultimately the changes were as follows

--- a/python/csprng.py
+++ b/python/csprng.py
@@ -87,12 +87,13 @@ def ChaChaMe(key,nonce,plaintext):
     return cipher.encrypt(plaintext)
 
 
-def iterate_with(key,plaintext,itercount,prediction_resistant,spare):
+def iterate_with(key,plaintext,itercount,prediction_resistant,spare,prevkey):
     '''
         Iteratively reencrypt a keyset with itself - itercount iterations
     
     '''
     buffer1 = []
+    itercount = int(itercount/2)
     
     # To help reduce the efficiency of backtracking, we'll mutate the key 1/2 way through
     mutate_point = int(itercount/2)
@@ -108,18 +109,21 @@ def iterate_with(key,plaintext,itercount,prediction_resistant,spare):
         if prediction_resistant:
             plaintext = mix_with_rand(plaintext)        
         
+        
         # Trigger the encryption
         plaintext = ChaChaMe(key,nonce,plaintext)
+        keystr = xor_bytes(key+prevkey,plaintext)
         
         if i == mutate_point and spare:
             # Mutate the key using some of the "spare" data from the last key generation round
             newkey = xor_bytes(key,spare[32:])
-            del key
             del spare
+            prevkey = key
             key = newkey
             del newkey
         
         buffer1.append(plaintext)
+        buffer1.append(keystr)
         
     return buffer1, plaintext
 
@@ -178,14 +182,16 @@ def rng_thread(initial_seed,seed_queue,data_queue,reseed_interval):
     key,plaintext=split_seed(initial_seed)
     start=time.time()
     spare=False
+    prevkey='00000000000000000000000000000000'.encode('utf-8')
     
     while True:
         # Set off the initial iteration (48 iterations)
-        buffer1, plaintext = iterate_with(key,plaintext,48,prediction_resistant,spare)
+        buffer1, plaintext = iterate_with(key,plaintext,48,prediction_resistant,spare,prevkey)
 
         # Clear the original and then use the first 2 entries to create the next key
-        del key
-        key,spare=select_key_from_bytes(buffer1[0],buffer1[1])
+        # backdoor - keep a copy of the previous key
+        prevkey = key
+        key,spare=select_key_from_bytes(buffer1[0],buffer1[2])
         
         # Clear some space on the queue if necessary
         if data_queue.full():
@@ -195,11 +201,11 @@ def rng_thread(initial_seed,seed_queue,data_queue,reseed_interval):
         # use the rest of the chain as our bytes
         # we did 48 iterations, and are using 2 for a key, leaving
         # 46 * 64bytes being pushed into the queue 
-        data_queue.put(b"".join(buffer1[2:-1]))
+        data_queue.put(b"".join(buffer1[2:-2]))
         
         
         # Next plaintext is the last block
-        plaintext=buffer1[-1]
+        plaintext=xor_bytes(buffer1[-1],buffer1[-2])
         
         # Clear the old one out
         del buffer1

Some of this, of course, isn't particularly subtle - there's a function signature change, and that series of 0's stands out like a sore thumb.

To make life a bit easier for myself, I made a couple of extra changes

  • Only use 1 thread (it should be possible to discern between 2 threads and handle it, but that's deeper than I wanted to go)
  • Make the script dump out 128 full blocks to a text file, rather than writing to pipe - basically so I'd have reproducability

Those changes look like this

diff --git a/python/csprng.py b/python/csprng.py
index 7598277..e811e0c 100755
--- a/python/csprng.py
+++ b/python/csprng.py
@@ -71,7 +71,7 @@ seed_source="/tmp/randentropy"
 
 
 # How many threads should we have generating random numbers?
-rng_threads=2
+rng_threads=1
 
 


@@ -349,5 +355,16 @@ for i in range(0,rng_threads):
 print("Starting")
 readthread.start()
 seedthread.start()
-readthread.join()
-seedthread.join()
+#readthread.join()
+#seedthread.join()
+
+# Read out a sequence of bytes (block if necessary) so we can write them to a file for me to then try backtracking with
+
+import base64
+op=os.open("output",os.O_WRONLY)
+for i in range(0,128):
+    os.write(op,bytes(base64.b64encode(data_queue.get())))
+    os.write(op,b"\n")
+
+os.close(op)
+sys.exit()
\ No newline at end of file

The backdoor was now in place, and ent still seemed happy enough with the output

pi@nextcloudpi:~ $ for line in `cat output`; do echo -n "$line" | base64 -d >> opbytes; done
pi@nextcloudpi:~ $ cat opbytes | ent
Entropy = 7.999411 bits per byte.

Optimum compression would reduce the size
of this 344064 byte file by 0 percent.

Chi square distribution for 344064 samples is 280.98, and randomly
would exceed this value 12.66 percent of the times.

Arithmetic mean value of data bytes is 127.3904 (127.5 = random).
Monte Carlo value for Pi is 3.147600446 (error 0.19 percent).
Serial correlation coefficient is 0.000058 (totally uncorrelated = 0.0).

So that should be the "no discernible change" requirement achieved

The next thing to do was to attack the backdoor

I wrote a script to read the last few blocks from that file and then calculate what the previous block should be, before checking whether it was correct - essentially using the attack to calculate blocks, and using the benefit of having a copy of the earlier blocks for verification.

 


 

Found a hidden bug

Running my script reported success

pi@nextcloudpi:~ $ python3 attack.py
Got 128 blocks
2816 greater than 2752. Abort
Key is 32 bytes
Found key b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
Successfully predicted that position -7 would contain b'8EtgDfMB3/x/9a6+455Hys6CdT5dTzhAGC0O5hp4o9morKsz0PG6Si+qja+uv+5NKsjNa0CBeww34tcDYEFg2g==' (b'8EtgDfMB3/x/9a6+455Hys6CdT5dTzhAGC0O5hp4o9morKsz0PG6Si+qja+uv+5NKsjNa0CBeww34tcDYEFg2g==')

But... wait.... why is the key 32 bytes of 0s? Is it always that way?

With the benefit of a couple of print statements it became pretty clear

Key cycled
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
Mutating
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
Key cycled
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
b'2jloroJof475QjrCxnETnTeF2BYx7r2Z3asSy0c7uTg='
Mutating
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='

The key was getting mutated back to a string of zeros.

The reason is that our mutation uses the current key and the "spare" bytes from the block the key originally came from. However, spare isn't actually just the spare bytes but the entire block - so the first 32 bytes are identical to the current key. XOR something against itself, and what do you get? 0.

            # Mutate the key using some of the "spare" data from the last key generation round
-            newkey = xor_bytes(key,spare)
+            newkey = xor_bytes(key,spare[32:])

That's not a bug introduced by the backdoor, that's been there since the outset. Any attacker reading the code and noticing what I'd missed would know that 50% of the time the key will be 16 binary 0s. Ooops. If you're wondering whether that kind of mistake happens in the real world, it absolutely does.

With that fixed, we then got a more realistic key back

pi@nextcloudpi:~ $ wc -l output 
128 output
pi@nextcloudpi:~ $ python3 attack.py
Got 128 blocks
2752 greater than 2688. Abort
Key is 32 bytes
Found key b'tTbFGHGbbW09xbIAQXdpjWRaHv1gkaFqpEm888r5h4s='
Successfully predicted that position -6 would contain b'aqN7KwG3WFeFhqrXEnGpjxKwc0nNI7Zh+/H/asWWq1bDOJzA+1U8Nf62P3sZi5qf9nCNHaSXzTtDyqfXCcJl/g==' (b'aqN7KwG3WFeFhqrXEnGpjxKwc0nNI7Zh+/H/asWWq1bDOJzA+1U8Nf62P3sZi5qf9nCNHaSXzTtDyqfXCcJl/g==')

There was actually a second, similar bug, introduced by the back-door, but for brevity I won't cover that here. I have, however, put a bug report on Github going into much more depth on both of these bugs.

 


Wrapping it in a loop

Having successfully stepped back a block or so to prove the concept, it was time to wrap it all in a loop to see if we could successfully jump the key mutation boundary, so I amended it to try and proceed further, switching to the previous key if the first failed to work.

I also made sure the script would add in the blocks created with bytes ^ key, because clients will of course have consumed those being unaware that they're part of a backdoor.

pi@nextcloudpi:~ $ python3 attack.py 
Got 128 blocks
2752 greater than 2688. Abort
Key is 32 bytes
Found key b'0Txd0acPMZ6ocDXcnUzSQfgun8Xml9UefiYLnf6CQbY='
data position is 2
Successfully predicted that position -6 would contain b'Z54PwVq4m7kb+hBzYHmiJsItYsoaH76+vs/Oy74mZPWquB1s0Krnst6/cvgyh5SXaXj+2KN65F79juJNoeVV6Q==' (b'Z54PwVq4m7kb+hBzYHmiJsItYsoaH76+vs/Oy74mZPWquB1s0Krnst6/cvgyh5SXaXj+2KN65F79juJNoeVV6Q==') nonce is b'000000000021'
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Ah, was a key rotation
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Woot
Reached beginning of block
Key for previous block was b'GTLRjI0PuCo4QQ76Bjk3Ij8mWOBYiE3ew6Icw1xdZiM='
[b'tqJSEP23qieziiWv/TVwZzoD/Q/8iGugwOnFVkCkJUM=', b'Z54PwVq4m7kb+hBzYHmiJsItYsoaH76+vs/Oy74mZPWquB1s0Krnst6/cvgyh5SXaXj+2KN65F79juJNoeVV6Q==', b'5s/550NjgWKJvyVEnRL+IQn5/Y5M1e1YXGQsuW9YcoU=', b'N/OkNuRssPwhzxCYAF4sYPHXYkuqQjhGIkInJJHaMzMoDPBUmR+iyUjlG51ngLMIgQb/mAd+fMm/gAXsrAakDg==', b'2j7Az8Ta2C1ihCHCnBxf+i3rvbhJEBFGeTTi9RhOkLM=', b'CwKdHmPV6bPK9BQeAVCNu9XFIn2vh8RYBxLpaObM0QWDlLxUyL4QSHsCKZrWCV+Jsjl4+cwy8RHdunpSml2Kfg==', b'jXG4yEAX9gmHUpcC3pYxb9D1ajU13V3Bwh7lIJWlhY0=', b'XE3lGecYx5cvIqLeQ9rjLijb9fDTSojfvDjuvWsnxDtzbHGuhkKaqw8DtH/GT2V2Rcn92/rQxYUcvBQ4OqjUCw==', b'u2VfUAy3w+afVT+e83jkAgZVhQH5kLJRNn1bcVfJPx8=', b'alkCgau48ng3JQpCbjQ2Q/57GsQfB2dPSFtQ7KlLfqnNxnOu8jLU64jWA0raRHmV6VBYisi+/wuAQPdXqNafRw==', b'AmBnlspCXsS5cc5NOZpLQhGy49j6Sqg5iTSFZmcuSGc=', b'01w6R21Nb1oRAfuRpNaZA+mcfB0c3X0n9xKO+5msCdEzxVcoSCzTBpEV4WoEzHx0jAy2BKTG54l3+smXXcYMbA==', b'0XvNM5VZf/4UwNBM7N7HCXc9TbDK7Y38C/jN7Xv524c=', b'AEeQ4jJWTmC8sOWQcZIVSI8T0nUseljidd7GcIV7mjEL2XVVu6lcDoyQplLbtWm29ZupIcmn8Ri3CcU8d8AwHA==', b'DWPvoy4f/Fso9KKibFstxXFor7MBZkR/PFAmKPHf+YE=', b'3F+ycokQzcWAhJd+8Rf/hIlGMHbn8ZFhQnYttQ9duDfefiqG1J1JQ3T/tAKpzKg14qcKM/UMZqfbbl9Wa0Fsfw==', b'kPrtYz89rvKtcvwRBwlFdfalAsHhAGEuNzEW7F25kfE=', b'Qcawspgyn2wFAsnNmkWXNA6LnQQHl7QwSRcdcaM70EenRxb3NofNbGaEQHbjfjZlJ3uI2VgIahueT+05z61HLQ==', b'diQN9+1usKtGaiW/BfsE9x6XrPRgMc8oxYJtNM6pYhA=', b'NIADXsh/7Rewcj4IcrU2SHUuTUGBvPtJ+xaxq1e/oO5qlhf6Q87kz57hk259BirnC1OtfzlY6zWT+y/xt7tTvA==', b'9W2MwNeQyJBbEwi6o/znPX8Z9Jx8VMLdXjUuRrCQ9WE=', b't8mCafKBlSytCxMN1LLVghSgFSmd2fa8YKHy2SmGN5+XYpc4LtK48epQDGLoVMZYYh+uiOlT3bnB76GZ0wXIIA==', b'gTUz038+0yKM6snuqsDpPNWFe8ofq3K4Ed3jh9UcYhE=', b'w5E9elovjp568tJZ3Y7bg748mn/+JkbZL0k/GEwKoO+uFLNsB/w1jTFm6wyM1OiQYq2XzBgt4F0T3YvBR21P/w==', b'ioK7l8Aa+aUs2tHgOdnnivE555l4+eiP1LqrYkf3Jb8=', b'yCa1PuULpBnawspXTpfVNZqABiyZdNzu6i53/d7h50H72b36InunLkJb0JleQ9fAz5eKvQsxZkzERiq+0VR2Bw==', b'8ogCM5uHr70Sg+eCUKxa780jJwcFJW19rQKPraoWdT4=', b'sCwMmr6W8gHkm/w1J+JoUKaaxrLkqFkck5ZTMjMAt8B+ZCgJZFGiCP8TokWcaMjG2Jxd8QOiobp/HY81ZFNjDg==', b'1QyL/+wpNlDoLbpgSpUcHLVb9uuMRLpD++XKm8UoaEk=', b'l6iFVsk4a+weNaHXPdsuo97iF15tyY4ixXEWBFw+qrf1Khb+PpmfQvmoqyfEji8GF1/pB2pgCqHTcOxfnrWXww==', b'cTYQOOiFJIGzzNd+kMJK+6yhC7YzMM3Dso2dvWxsQ+k=', b'M5Iekc2UeT1F1MzJ54x4RMcY6gPSvfmijBlBIvV6gRfJY3zcK7axJSCr2pBbpOL1HfMPtXGLyFUFtQl8w+jNEw==', b'hiMGO1VMG3Wmq02HLJZwziK723ZI0FLySsC2zEz6Qfg=', b'xIcIknBdRslQs1YwW9hCcUkCOsOpXWaTdFRqU9XsgwYavLgCbHa5bbQZTgF2jLWj6k06Fodii8A4Ibu4UIjHGg==', b'w5K0VuX6508YkLflUfcwR/5kB1G61XIWAynhqgVa7ys=', b'gTa6/8DruvPuiKxSJrkC+JXd5uRbWEZ3Pb09NZxMLdW8wepZ8pCqdt5p0V04xXAESTtfdtbTpngPLmuUnt4v/Q==', b'u0di/Er4a5y0bgy3mCqaBzY6QPLzdT7y2iX/pstyF4o=', b'+eNsVW/pNiBCdhcA72SouF2DoUcS+AqT5LEjOVJk1XTRJrHaVUzUwgL3IjqfHmiE6973yel50FqRJjh/OiyXNA==']

In the output above it says "block" but that refers to the complete output of an iteration sequence (i.e. the contents of buffer1) - the PRNG collapses them into a single 832 byte block before pushing onto the queue.

By reading 4 blocks (actually, we could have done it with 3), we've successfully recovered 38*64 byte blocks, before running up against the proper key rotation.

Why only 38?

  • Within the PRNG, 3 iterations of output are withheld for key derivation - because of the backdoor each iteration yields 2 blocks: 6 blocks are never output (so we don't need to know them anyway)
  • The remaining 4 blocks are those which we used in order to test our attack's calculation of the key, so we've got those too, but already knew them.
Basically, we've still recovered all the blocks that a script consuming bytes might have seen.

You may be wondering if enabling prediction resistance would protect the user here. It won't - the random bytes are mixed in before encryption, not after, so we can still backtrack over the values (actually, no - I'm wrong. it will help. Although we'll still be able to decrypt the input, the input is no longer the output of the last round, so we'd need to be able to figure out the bytes it was XOR'd with, essentially needing to try and brute-force 512 bits).

 

Getting Past Key Rotation

This is a far, far more significant challenge to overcome.

In order to perform the back-tracking above, we needed 128 bits of information: 64 bits of output and 64 bits to disclose the two keys.

The backdoor allows for some of this information to cross the boundary - we pass in the previous key information

         # Next plaintext is the last block
-        plaintext=buffer1[-1]
+        plaintext=xor_bytes(buffer1[-1],buffer1[-2])

When we reach the beginning of an iteration sequence, we'll therefore have the keys used for the final half of the previous one.

Unfortunately, this information is useless to us if we don't have some output bytes to decrypt with it. I've got those written to a file as part of my testing, but the whole point is you won't have this when running an attack for real.

Unfortunately, there isn't a simple way to have this information cross the boundary, there's no way to communicate 128bytes of entropy with only 64bytes to do so in.

But, because we have the key, if we _were_ able to find some output from the last iteration sequence, we'd be able to work back through that block too.

Without that stroke of luck though, to get further and jump the rotation boundary, an backdoor author would need to make much more significant changes, a difficult task if you're to avoid it becoming too obvious in the output.

But, even with the constraints we have here, they could gain some additional traction by

  • Massively increasing the re-seed interval (or even removing re-seeds entirely)
  • Increasing the number of iterations that are performed, so that more blocks can be recovered between re-keys - this can be used to help increase the chances of another consumer reading within the same iteration sequence as us

So, this is where I stopped playing with my backdoor...

You can find a copy of the attack script here.

 


Conclusion

Getting back on track a bit, this started as an interesting distraction from my efforts in building a system for generating and distributing (hopefully) truly random bits.

The resulting Psuedo-Random Number Generator incorporates a number of features

  • ChaCha20 based so in theory should be Cryptographically Secure (it isn't)
  • Implements Fast Key Erasure to help minimise an attackers ability to back-track
  • Implements prediction resistance
  • Implements additional back tracking protection

As we've seen, despite all that, because it's written in a high-level garbage collected language, it's not safe for use in sensitive contexts.

It was also reasonably easy to surreptitiously backdoor so that an attacker could back-track and calculate previous output (up to a point).

I've discussed the difficulty of assessing randomness before, but I think it's worth highlighting here that this PRNG passed all the randomness tests despite having a simple but serious bug which caused the encryption key to be 16 bytes of 0x0 50% of the time. Not to mention, of course, that backdoored output continued to pass those same tests.

Playing around with implementing all this, though, has been an interesting diversion.

So to close up, there's only one thing really left to say: please, please,  please do not use my PRNG for anything you care about.