A cryptanalysis of a bitstream cipher, in answer to a challenge proposed to the +HCU.

Chaos is definitely not randomness

Written by +Spath.

ABOUT THE BEST CHALLENGE THE BEST CIPHER I) FROM CHAOS TO ORDER II) THE CHOSEN-PLAINTEXT ATTACK III) THE FIRST BYTE IV) ALL FOR ONE V) A REAL-LIFE ATTACKABOUT THE BEST CHALLENGEA few words before we start : as it is described on Fravia+'s site, the BEST challenge is absurd. As pointed out on Fravia+'s board by Ghiribizzo and others, no serious evaluation can be done with only a few plaintext-ciphertext pairs and without any description of the algorithm. With the rules that Jeremy chose, even if nobody breaks his challenge in 100 years, it does not prove a single thing about the strength of his algorithm. On the contrary, a cryptanalysis can much more easily expose weaknesses of the algorithm and describe various possible attacks. Fortunately, all the informations needed for a cryptanalysis were available on Jeremy's homepage, including a description of the algorithm and some C source code.THE BEST CIPHERTo fully understand this paper, I highly recommend to get a copy of the original definition of the BEST cipher on Jeremy's homepage. However, here's a description of the BEST encryption algorithm, where : Ix is the bit number x of the input file (plaintext). Ky is the bit number y of the key file. Ox is the bit number x of the output file (ciphertext). f(n) = 3.56999999*n*(1-n), the chaotic function. maxy is the size of the key file (in bits). ^ is the XOR operator. ~ is the NOT operator (bit inverse). % is the modulus operator. Norm is a normalizing function that keep the decimal part of a number, i.e. between 0 and 1. A0 = Encryption algorithm 1. n = seeder value 2. r = 3.56999999 3. For each x 4. Ox = Ix ^ Ky 5. Ky = Ox 6. If n > 0.5 then Ox = ~Ox 7. If Ix = 0 then n = n + 0.33333333 8. Else n = n + 0.66666666 9. n = Norm(f(f(n))) 10. If Ox = 1 then 11. y = (y +1) % maxy 12. If y % 8 = 0 then 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 } 14. End of for loop The seeder value is computed by looping through the key file for the target file length. Later, I will use the notation Kx(y) when I refer to bit y of key byte x (x and y lowest values are 0). I will use the same notation for the plaintext bytes (I) and the ciphertext bytes (O). Now let's write the dependency equations of this algorithm : Ox = F(Ix, Ky, n) (A) Ky = F(Ox, Ky-8) (B) n(k+1) = F(Ix, n(k)) (C) seed = F(K, size(I)) (D) I think that equations B, C and D show weaknesses of the algorithm, I will use B and C for my attack.PART ONE : FROM CHAOS TO ORDERLet's make a little experiment : fire Hiew and create a plaintext file filled of '00' bytes ; choose any file on your hard disk as key file and encrypt the plaintext with BestWin. Now look at the ciphertext : it's almost completely filled by '00' bytes. What happened ? If we submit an 'all 0' input file, we can simplify steps 4 and 7 and remove steps 5 and 8 from the algorithm, so that we have : 3. For each x 4. Ox = Ky 6. If n > 0.5 then Ox = ~Ox 7. n = n + 0.33333333 9. n = f(f(n)) 10. If Ox = 1 then 11. y = (y +1) % maxy 12. If y % 8 = 0 then 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 } 14. End of for loop Now if we study Xn+1 = f(f(Xn)), we see that something very strange happens : whatever initial value you chose, after some time, the normalized chaotic function get stuck and converges to 2 values. Look at these succesive values of n after step 7 ; the left column lists the values for even bits and the right column the values for odd bits : even bits odd bits 0.8923637046805643 0.8377088795512472 0.8923637046808309 0.8377088795507856 0.8923637046808787 0.8377088795507024 0.8923637046808873 0.8377088795506879 0.8923637046808889 0.8377088795506854 0.8923637046808890 0.8377088795506848 0.8923637046808892 0.8377088795506848 0.8923637046808892 0.8377088795506848 ... ... To have a good idea of what happens, you can draw f(f(n)) and g(n)=n. The f(f(n)) curve has a symetry axis at 0.5 (since f(1-n)=f(n)), it has 1 minimum and 2 maxima, one of them being (0.8320, 0.8923), which explains the convergence. We see also that the second extremum is very close from the curve of g(n), and that the distance between the two maxima is very close from 0.33 (modulo 1). So after enough '00', the chaotic function will just give out two values, 0.8923637046808892 for even bits and 0.8377088795506848 for odd bits (these values may be switched for other keys). Both normalized values are greater than 0.5, so we can simplify step 6, and after enough rounds we will have this algorithm : 3. For each x 4+6. Ox = Ky^1 = ~Ky 7. n = n + 0.33333333 9. n = f(f(n)) 10. If Ox = 1 then 11. y = (y +1) % maxy 12. If y % 8 = 0 then 13. { Ky = Ky ^ Ky-8, . . ., Ky+7 = Ky+7 ^ Ky-1 } 14. End of for loop Now suppose we reach a '1' bit of the key, i.e. Ky = 1, we have therefore Ox = 0, the y index is not incremented and since n will remain stuck, y will not change any more : all following bits will be assigned to this ~Ky null bit and the rest of the output file will be all "00". This is particularly striking with a 'all 00' file, but this also happens with shorter patterns. So the first conclusion is : depending on the key, any long enough "00" pattern in the plaintext will create a "00" pattern in the ciphertext (or in other words, we can guess sequences of null bytes of the plaintext).PART TWO : THE CHOSEN-PLAINTEXT ATTACKTo illustrate this chosen-plaintext attack, I will take as example a simple 25 bytes long key ; here's its hex dump : 54 65 73 74 6f 6E 73 20 75 6E 20 6E 6F 75 76 65 6C 6C 65 20 66 6F 69 73 20 The plaintext that will be encrypted is a file defined as follows : In = 01h if n=16+8*k, k being any integer In = 00h elsewhere (in other words, all the plaintext bytes are null, except I16, I24, I32, ... that are equal to 01h). All the "00" bytes are used to unbalance the algorithm in the previously described state. The file also contains a repetitive pattern of 8 bytes containing only one set bit : this bit is used to get next sequence of '1' bits of the key, as you will see later. For this example, I took a 8016 bytes long plaintext ; however, this method should work for any plaintext that is at least 30 times bigger than the key. Whatever key you chose, some patterns appear in the output file, i.e. after enough rounds we can identify a periodic byte sequence. Jeremy tried to prevent patterns from appearing in the output file by modifying the key at each used byte (at step 13). But since the original key is restored after all its bytes have been used, an input file with repetitive patterns and fixed n values will also cause patterns to appear in the output file. Since we know that the key index y is incremented only if the output bit is '1' (at step 10), the conclusion is : the size of the key is equal to the number of '1' bits in a period of the output file. With our example, the smallest pattern is 66 bytes long, starting at offset 21, and it contains 200 bits with the value '1', so we know that the key is 200 bits long (25 bytes).PART THREE : THE FIRST BYTELet's focus now on the first bytes of the ciphertext ("00" bytes have been skipped, the bits that have an underscore on them were already '1' in the plaintext) : byte nb : 0 16 24 value : 07h 05h _ 3Fh _ 0000 0111b 0000 0101b 0011 1111b Before we try to recover some bits of the key, we must be aware of an undocumented "feature" of the algorithm : the initial n value (seed) is calculated from the bytes of the key, but the pointer in the keyfile is not reset after this job. Therefore, the first byte of the plaintext is (most of the time) not encrypted by the first byte of the key. So our first job is to find which byte of the key was used to compute these 3 ciphertext bytes. We know that the seed calculation is done by looping through the key file for the target file length, and at this moment we know both file sizes, so we can find this information. In our testcase, I encrypted a 8016 bytes plaintext and I found out that the key is 25 bytes long so the first key byte used for encryption will be : (16016 % 25) = 16. So the first plaintext byte (I0) is encrypted by K16 (the 17th byte of the key file, since we start from K0). Now let's come back to equation (C) : n(k+1) = F(Ix, n(k)) That means that if I know the input file (this is the case here) and one value of n, then I also know all future values of n. Hey, but we do know some value of n, since after some '00' patterns, n is equal to one of the two values we saw before. So when the first '1' bit of the plaintext is processed (that is I16(0)), n is equal to 0.8923637046 or 0.8377088795. Let's assume for now that the value of n when the first bit of I16 is processed is the second one. The first three bits of K16 are unknown, since they are used to encrypt the first byte of the key, and for this one we don't know anything about n. The fourth key bit is necessarily a '1', since it's the requested value to unbalance the chaotic function in the state where it continuously gives out '0' ; so K16(3) = 1. Now we can recover all other bits of K16, since we can calculate all future values of n ; that's what we're doing now : Let's focus on byte 16, we have : For bit 0 : 4. O16(0) = I16(0) ^ K16(3) = 1 ^ 1 = 0 6. n = 0.8377 => O16(0) = ~O16(0) = 1 8. n = 0.8377 + 0.666666666 9. n = Norm(f(f(n))) = 0.3427 11. y = y +1 For bit 1 : 4. O16(1) = K16(4) = 0 ===> K16(4) = 0 7. n = n + 0.33333333 9. n = Norm(f(f(n))) = 0.6088 (note that this ciphertext bit is always a pure key bit) For bit 2 : 4. O16(2) = K16(4) = 0 6. n = 0.6088 => O16(2) = ~O16(2) = 1 7. n = n + 0.333333 9. n = Norm(f(f(n))) = 0.5590 11. y = y + 1 No other output bit is at '1', so we are sure that the next key bit is '1' : so K16(5) = 1. Now we can focus on byte 24 : for bit 0 and bit 1, it's the same story as for byte 16, and bit 1 is also a pure key bit : K16(6) = O24(1) = 1 For bit 2 : 4. O24(2) = K16(7) 6. n = 0.6088 => O24(2) = ~O24(2) = 1 ===> K16(7) = 0 7. n = n + 0.333333 9. n = Norm(f(f(n))) = 0.5590 Let's summarize what we found : the first key byte used to encrypt the plaintext is K16, and its binary value is 0110_1??? ; therefore, there are only 8 candidates for this 17th key byte (from 68h to 6Fh).PART FOUR : ALL FOR ONEI will now show you how to recover the complete key if you know one of its bytes. As example, I will recover K17 from K16, any other byte of the key can be found from the previous one with the same method. Let's suppose we test 6Ch as candidate for K16 (which is the correct value) ; so we know the plaintext, the corresponding ciphertext, the first key byte and some values of n. The first thing to remember is that each key bit is modified after it has been used at step 5 (Ky = Ox). Therefore, when we're about to use K17, K16 has been modified into another value : this new value is the first information to find. Let's see how this happens : 4. Ox = Ix ^ Ky ; here we know Ix and Ky 5. Ky = Ox ; here the key bit can be modified 6. If n > 0.5 then Ox = ~Ox ; we also know Ox after this step We know Ix and Ky at step 4, so we can calculate Ox at step 4, and therefore Ky at step 5. We also know Ox after step 6, it is the real output value, so we can deduce if n was greater than 0.5 at step 6. Here's a summary of what we have : in the next table, the two first columns and the last one are known, the other ones can be deduced from the 3 previous equations : plaintext K16 temp. output new key value n real output (0h, 1h, 1h) (6Ch) (step 4) (step 5) (step 6) (7h, 5h, 3Fh) --------------------------------------------------------------------------- I0(0)=0 K16(0)=0 => O0(0)=0 => K16(0)=0 n>0.5 => O0(0)=1 I0(1)=0 K16(1)=0 => O0(1)=0 => K16(1)=0 n>0.5 => O0(1)=1 I0(2)=0 K16(2)=1 => O0(2)=1 => K16(2)=1 n<0.5 => O0(2)=1 I0(3)=0 K16(3)=1 => O0(3)=1 => K16(3)=1 n>0.5 => O0(3)=0 I16(0)=1 K16(3)=1 => O16(0)=0 => K16(3)=0 n>0.5 => O16(0)=1 I16(1)=0 K16(4)=0 => O16(1)=0 => K16(4)=0 n<0.5 => O16(1)=0 I16(2)=0 K16(4)=0 => O16(2)=0 => K16(4)=0 n>0.5 => O16(2)=1 I16(3)=0 K16(5)=1 => O16(3)=1 => K16(5)=1 n>0.5 => O16(3)=0 I24(0)=1 K16(5)=1 => O24(0)=0 => K16(5)=0 n>0.5 => O24(0)=1 I24(1)=0 K16(6)=1 => O24(1)=1 => K16(6)=1 n<0.5 => O24(1)=1 I24(2)=0 K16(7)=0 => O24(2)=0 => K16(7)=0 n>0.5 => O24(2)=1 Before we used K16 for our encryption, its value was 6Ch. From what we obtain in the fourth column, after we used K16, its new value is 44h. Therefore, before being used, K17 will be XORed with 44h at step 13. So we just have to retrieve K17 by the same method we used in part III, and at the end XOR its value with 44h to obtain the real K17 value. Let's do it : byte nb : 24 32 40 value : 3Fh _ 05h _ 0Dh _ 0011 1111b 0000 0101b 0000 1101b So for byte 24, we have : bit 3: 4. O24(3) = K17(0) 6. n = 0.5590 => O24(3) = ~O24(3) = 1 ===> K17(0) = 0 7. n = n + 0.333333 9. n = Norm(f(f(n))) = 0.8043 11. y = y + 1 bit 4: 4. O24(4) = K17(1) 6. n = 0.8043 => O24(4) = ~O24(4) = 1 ===> K17(1) = 0 7. n = n + 0.333333 9. n = Norm(f(f(n))) = 0.8718 11. y = y + 1 bit 5: 4. O24(5) = K17(2) 6. n = 0.8718 => O24(5) = ~O24(5) = 1 ===> K17(2) = 0 7. n = n + 0.333333 9. n = Norm(f(f(n))) = 0.8684 11. y = y + 1 ... and so on until we find K17(7). At last, we obtain K17 = 28h, so that the real K17 value is 28h ^ 44h = 6Ch. Now we could find K18 from K17 with the same method, and then, one by one, all other bytes of the key. Since we had only 8 candidates for K16 and 2 candidates for the initial n value, we have only 16 possible keys. The correct key among these 16 candidates will appear by itself, because wrong keys will produce inconsistencies as we will decrypt more and more known plaintext. So this attack requires at worst 2*2^8= 512 attempts to retrieve the complete key, which takes a few minutes on a PC.PART FIVE : A REAL LIFE ATTACKYou may think that the previous attack is just theoretical, since in real life I will never be able to choose what will be encrypted. Well, you're quite right, at least for a correct software implementation. But the same weaknesses can also be exploited for a ciphertext-only attack, as I will demonstrate now. Before we start, let's think about what we used for the chosen-plaintext attack : the known plaintext-ciphertext pair, the key length and one known value of n. But the key length is not that necessary, and we could have just assumed that the key was the same size as the plaintext : then, some repetitive patterns would have appeared in this 'unfolded' key, and a period of these patterns would be the real key. Also, the plaintext could have been anything, as long as we knew one value of n. So what we actually need for this attack is a plaintext-ciphertext pair and a known n value. Now let's say we found a encrypted .BST file somewhere and we want to decrypt it. First, we can look for '00' patterns, because we know that these bytes are also in the plaintext. Some file formats contains lots of '00' bytes at defined places, and are therefore easy to identify (.EXE, .TAR, .BMP, etc). From there, an attack can be prepared. As an example, let's take a encrypted PE executable file. I have easily identified this format by looking at the '00' patterns in the file, in the header and also some big spaces that must be used as internal buffers. Here's the encrypted header of the file : 0000: 07 72 5d 00 97 93 01 00 34 00 00 00 e7 67 0d 00 .r].....4....g.. 0010: d8 1a 00 00 00 00 00 00 40 05 00 00 00 00 00 00 ........@....... 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0030: 00 00 00 00 00 00 00 00 00 00 00 00 80 0e 00 00 ................ 0040: 1a 8d 93 0b 00 a4 60 fb 6f d8 0f 04 d9 a3 86 cf ......`.o....... 0050: 1f 60 67 50 45 ce 25 f0 2e 47 aa ed 15 03 2c 87 .`gPE.%..G....,. 0060: 4e e0 23 df a7 50 31 f9 61 d4 6a a1 ea f6 11 6b N.#..P1.a.j....k 0070: 66 cd 74 24 6c 42 69 02 d4 00 00 00 00 00 00 00 f.t$lBi......... 0080: 70 43 07 00 3c 6d 3a 00 57 2a b2 ae 3a 93 07 00 pC...m:.W*..:... If we look at these bytes from a 'PE header' point of view, we can easily guess what these bytes actually are : - bytes 00h-1Fh : the MZ header. - bytes 20h-3Fh : a sequence of '00' bytes, between the header itself and the MZ entry point. - bytes 40h-78h : the MZ stub (the small piece of code that displays the string and quit) and the string itself ("This program needs Win32 blabla"). - bytes 80h- : the real PE header Now if we look at these bytes from a 'BEST cryptanalysis' point of view, we have a sequence of '00' long enough to give us a known value for n, so if we knew the bytes starting at 40h, then we could start a known-plaintext attack. Hey, but all MZ code and data bytes (from 40h to 7Fh) are just prepended to the assembled code by the linker during compilation. That means that if we know which compiler has been used, then we have 64 bytes of known plaintext, and therefore we can recover 64 bytes of the key. Actually, we also have only 3 possibilities for the 6 first bytes of the PE header, so that we have 70 known bytes. So in a few tens attempts, we can retrieve 70 bytes of the key. From here we can continue in two directions, finding more bytes of the plaintext and finding more bytes of the key. If the key is shorter than 70 bytes, a pattern will appear in the 'unfolded' key we found ; in this case, our job is done. If the key is longer, we first could try to find its actual length. To test a candidate for the key length, we calculate which pieces of the ciphertext will be decrypted by these 70 bytes if the key length we try is the correct one. Then, we calculate the according n values and decrypt these pieces of ciphertext ; we obtain one or more pieces of 70 bytes long plaintext : if all these bytes does not contain any recognizable data resources (strings, lookup tables, ...) or some code, then we can assume that the key length we tested was not correct. Unless the program is packed or crypted, this method can give us some possible key length and some possible pieces of plaintext. The same kind of investigations can be performed on various file formats, and this way some informations can be recovered just from the ciphertexts.

Let's summarize our results : with a single text chosen-plaintext attack, we can recover the complete key (as a comparaison, the same attack against DES requires 2^47 plaintexts). In the worst case, this attack requires 512 attempts and can be completed in a few minutes on a PC. Moreover, the same weaknesses can be used for ciphertext-only attacks, not only to identify the format of the original file, but also to retrieve pieces of the key. I conclude that the BEST encryption algorithm is not secure and should not be used. The whole strength of the algorithm was based on the assumption that a chaotic function cannot be predicted ; when it proved to be false, the full algorithm collapsed. Yet, the risk of chaotic functions and the difference between chaos and randomness had already been very clearly explained by Scut in an old essay (Fravia.org/crymaco.htm). Nowadays, many cryptographers are suspicious of chaos, and prefer not to use it when designing a new cipher. Maybe Chaos is just too difficult to control to be a reliable tool ? +Spath. (spath at iname dot com) -- Interested in cryptanalysis ? read Casimir's essays on the great homepage of Joe Peschel, at http://members.aol.com/jpeschel/index.htm

reality cracking how to search javascript wars

tools anonymity academy cocktails antismut CGI-scripts mail_Fravia

Is reverse engineering legal?