Understanding the Difficulty of Assessing True Randomness

I've had to explain, more than a few times, quite why it's so hard to assess whether a Random Number Generator (RNG) is compromised unless you have access to how the specific implementation works. Just because the data appears to be random, does not necessarily mean that it is actually unpredictable.

In this short piece of documentation, I'll be attempting to demonstrate exactly how a compromised RNG can appear to be generating random data, based on the tests that are available to us.

 

To best demonstrate this, it seems best to work backwards (start with the test and then show how the 'random' data isn't as compromised as first thought). If you want to follow along, you can grab the dataset here. It's a pretty small sample but should be sufficient to demonstrate the issue

Let's start by testing our dataset 

cat randomdata.txt | rngtest

Which should give us

rngtest 2-unofficial-mt.14
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source exhausted!
rngtest: bits received from input: 2311168
rngtest: FIPS 140-2 successes: 115
rngtest: FIPS 140-2 failures: 0
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=19.364; avg=263.319; max=1467.191)Mibits/s
rngtest: FIPS tests speed: (min=4.659; avg=6.107; max=6.239)Mibits/s
rngtest: Program run time: 377113 microseconds

A bigger dataset generated in the same manner, yields a few failures, but nowhere near enough to be of any concern

Let's ask ent what it thinks

ent randomdata.txt

Which gives us

Entropy = 7.999943 bits per byte.
Optimum compression would reduce the size of this 3388896 byte file by 0 percent.

Chi square distribution for 3388896 samples is 268.84, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.5107 (127.5 = random).
Monte Carlo value for Pi is 3.140371378 (error 0.04 percent).
Serial correlation coefficient is 0.000636 (totally uncorrelated = 0.0).

We've got almost 8 bits of entropy per byte, so ent believes the data is essentially random

Perfect, so surely it should be safe to generate Crypto keys with? Wrong... The data is absolutely and utterly predictable, as random as it might appear to be.

 

Easily Fooled

Although the dataset has performed well in the tests, the method of generation/compromise used was actually incredibly simplistic.

The dataset we've been testing above was generated with the following command

for i in {1..50000};do echo $i; done | openssl enc -aes-256-cbc -k "IKnowThePassword" -nosalt -e > randomdata.txt

If you want, you can verify this by running the following command

openssl enc -aes-256-cbc -k "IKnowThePassword" -nosalt -d -in randomdata.txt

You should see one number per line, incrementing from 1 to 50,000. All we've done to fool the RNG tests is take a predictable source and encrypt it with a known symmetric key so that it appears to be random. As the attacker, I know exactly how to reproduce your 'random' stream of bytes and so could use that to calculate your encryption keys.

 

Conclusion

This was an incredibly simple example of how you can generate seemingly random data, which will happily pass many of the tests available, whilst remaining utterly predictable.

Where our over simplified example would have failed, of course, was when you compared the output on two completely different machines and found them to be giving identical streams. Even within the limitations of the examples above, though, even that it quite easily worked around.

The sort of people who are likely to be compromising RNG's are also highly unlikely to be using such a basic technique. Hopefully, though, the example above serves to explain why you cannot put your sole trust into a RNG unless you know exactly how it works.

 

Side Note: For those wondering about the effect of a compromised RNG on a Linux box - the RNG sources are XOR'd by the kernel, so you only need one of your sources to be unpredictable to be safe.