 PDA

View Full Version : super-tiny PRNG

mike
May 6th, 2002, 01:31
Adi Shamir proposed this three-op PRNG:

x+=(x*x) | 5;

x is a k-bit integer, so there's an implicit mod 2**k. For example, if k==16, then in C you'd have

unsigned short x;

or if k==64, you'd have to use

unsigned __int64 x;

Anyway, the high-order bit is all you get from each iteration. Assuming x is too big to brute-force, how do you find the current seed? A three-op PRNG has to have some kind of weakness...

cyberheg
May 6th, 2002, 18:24
Without beeing an expert like many others here I'd like to ask a few questions to get this down to human level Why OR with 5? I can understand that this would give noise on the lowest bits to make it harder to find the initial seed but why OR instead of like XOR and why 5 instead of another small number like 7?

I remember a some time ago program using a multiplication like above:

result (128 bit) = var1 (64 bit) * var2 (64 bit).
The upper 64 bit of the result would be cut off so the end result would also be 64 bits. The task was then to find a var2 where var1 could not be changed and to make the valid result (which should be 0x0000000000000001 (the upper 64 bits are cut off and not visible).
The method to solve it was to do trial multiplications (not simple bruteforcing) to find which bits were best to set for a given result.

Comparing these two things makes me think it's similar (please correct me if I am wrong). Var1 and Var2 is the same in the PRNG case and assuming we start with x beeing 0 (at the start) then it should also be possible to find the number which would give this, no? Ofcourse the OR would give some noise which would make more possible cases since OR will make us miss those original bits which were originally there (just by the multiplication).
5 in binary is 101 so only very few bits are damaged which shouldn't make too much extra work to test for each of those cases where the bits are set/not set.

I also assume this works best with a ODD initial seed so the lowest bits are also set.

mike
May 7th, 2002, 16:40
When you OR with 5 (or any other number congruent to 5 mod 8) you get a single long cycle. If you OR with an odd number, you get a permutation, but not necessarily one cycle.

Try splitting it up into cases according to what the "noisy" bits are, i.e.

x=8y+c

and look at what happens to c and y when it gets updated.

ancev
May 17th, 2002, 01:17
hi,

i converted the c random generator to assembler.

Code:
```
;x+=(x*x) | 5;
rnd:
mov eax,12345678h
seed equ \$-4
imul eax
or al, 5
ret

```

when i use XOR instead of OR, i get a output sequence that 'looks' more random.

How i can test this routine for patterns and like? There's a standard for testing random generators?

ancev

tE!
May 24th, 2002, 09:10
There's no standard for testing prng's, but there're a lot of different algorithms/ways to test randomness of the output of your prng. One application that makes use of some of them(Entropy, Optimum compression test, Chi square distribution, Arithmetic mean value, Monte Carlo value, and Serial correlation) is 'ENT' - available here: http://www.fourmilab.ch/random/

I used it in the past to 'test' 10MB sequences of pseudo random data - the output of some of my selfmade generators. I'd say that such statistical tests, as used in ENT, don't say everything about a sequence's randomness, but I guess they're a good point to start with.

cyberheg
May 31st, 2002, 07:04
I now implemented this as a 32 bit RNG and ran a test on it on a 20 mb file:

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>

int main()
{
FILE *fp;
int i;
DWORD dwRandom;

if ((fp = fopen("random.txt", "wb" ) == NULL)
{
printf("File could not be opened\n" ;
return 1;
}

dwRandom = 0x87654321;

for (i = 0; i < 0x500000; i++)
{
dwRandom += (dwRandom * dwRandom) | 5;
fwrite(&dwRandom, 1, sizeof(DWORD), fp);
}

fclose(fp);

return 0;
}

This gave the following results:

Entropy = 7.999997 bits per byte.

Optimum compression would reduce the size
of this 20971520 byte file by 0 percent.

Chi square distribution for 20971520 samples is 99.31, and randomly
would exceed this value 99.99 percent of the times.

Arithmetic mean value of data bytes is 127.4840 (127.5 = random).
Monte Carlo value for Pi is 3.140386118 (error 0.04 percent).
Serial correlation coefficient is 0.000080 (totally uncorrelated = 0.0).

The only test which really stands out is the Chi square test.

Additionally I took a 10 mb file from www.random.org and ran it through the same program which gave the following results:

Entropy = 7.999983 bits per byte.

Optimum compression would reduce the size
of this 10485760 byte file by 0 percent.

Chi square distribution for 10485760 samples is 239.88, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4760 (127.5 = random).
Monte Carlo value for Pi is 3.141733987 (error 0.00 percent).
Serial correlation coefficient is -0.000196 (totally uncorrelated = 0.0).

Here you see the Chi square test give totally different numbers. Now the question is whats good and whats not. From what I understood reading the discription less then 1% or higher then 99% would be nonrandom data. They give a example of from a radio active source which gives the value 75.00% but I can't really determine which is better between the random file from random.org and this radio active source.

I concluded that the 3 op PRNG is much better then rand() in VC++ which I also ran through the same test. Rand16 from flexlm 7.1x is only half as good as the rand() and unix rand() is better then any of those two but still weaker then the 3 op PRNG.

Maybe someone can comment on this Chi square test.

mike
May 31st, 2002, 15:32
You got it almost right. Don't write out the whole word, or it's trivial to predict the next one. (x'=(x*x)|5, of course.) In fact, it's easy to break if you output any of the bits other than just the top one. Only write out the top bit, i.e. instead of

for (i = 0; i < 0x500000; i++)
{
dwRandom += (dwRandom * dwRandom) | 5;
fwrite(&dwRandom, 1, sizeof(DWORD), fp);
}

do this

unsigned char j,k;

for (i = 0,j=0; i < 0x500000; i++,j=(j+1)&7)
{
dwRandom += (dwRandom * dwRandom) | 5;
k=(k<<1)|(dwRandom>>31);
if (j==7) fwrite(k, 1, sizeof(char), fp);
}

I'm not sure why your randomness test didn't detect this, but the low bits repeat every 2^n iterations.

cyberheg
June 1st, 2002, 09:11
Now I realize I didn't read your initial post fully which is also why the first post I made doesn't make sense if one understands that it's only the highest order bit which should be saved.

Anyway I fixed the code and come with new test results:

#include <stdio.h>
#include <windows.h>

int main()
{
FILE *fp;
int i;
unsigned char j, k;
DWORD dwRandom;

if ((fp = fopen("random.txt", "wb" ) == NULL)
{
printf("File could not be opened\n" ;
return 1;
}

dwRandom = 0x87654321;

k = 0;
for (i = 0, j = 0; i < 0x5000000; i++, j = (j+1) & 7)
{
dwRandom += (dwRandom * dwRandom) | 5;
k = (k << 1) | (unsigned char) (dwRandom >> 31);
if (j == 7)
fwrite(&k, 1, sizeof(unsigned char), fp);
}

fclose(fp);

return 0;
}

The new results are:

Entropy = 7.999981 bits per byte.

Optimum compression would reduce the size
of this 10485760 byte file by 0 percent.

Chi square distribution for 10485760 samples is 280.64, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.4376 (127.5 = random).
Monte Carlo value for Pi is 3.143244607 (error 0.05 percent).
Serial correlation coefficient is 0.000299 (totally uncorrelated = 0.0).

Now it seems the Monte Carlo Pi test is a bit worse while the Chi square test is much better. I still wonder those which value is best for the Chi square test. Now I've seen 3 "possible good value" 25%, 50% and 75%.

Thanks for giving me the tip for fixing the code Mike.