Results 1 to 15 of 15

Thread: Any books on breaking simple codes found in malware?

  1. #1

    Any books on breaking simple codes found in malware?

    I'm interested in simple cryptanalysis techniques that can be used to decode rot, vigenere, xor or whatever else malware might use. I know Perl, but math is not my area at all. I'm just looking for simple material on breaking codes used in malware.

    The below books look interesting, but I'm not sure how relevant they are to current methods malware uses for encoding.

    Codebreaker: The History of Codes and Ciphers

    Between Silk and Cyanide: A Codemaker's War, 1941-1945

  2. #2
    I've not read the books you have link above,I do know a book that talks about cryptovirlogy(it assumes you know cryptography)

    There is another book you might wanna read "The Cryptoclub Using Mathematics to Make and Break Secret Codes" is a book an introduction to codes and ciphers for beginners or for middle schools

    Reverse the code,Reverse Your Minds First

  3. #3
    Wow, those books seem to be just what I was looking for. Thanks esther!

  4. #4
    Teach, Not Flame Kayaker's Avatar
    Join Date
    Oct 2000
    Blog Entries

    Here's a small example of a BlowFish algo as found in a malware. Not too sure how useful it might be, but..


  5. #5
    Thanks, I also found National.Geographic.Code.Breakers online which was pretty interesting if anyone wants to check it out.

  6. #6


    Hi Sunk,

    Just to share some quick wins handling encoded or encrypted malicious data that I have seen thus far.

    Single Byte XOR - use brute force or WinHex histogram binning feature. Normally, most of the bytes in a MZ file are zero. If it is XOR encoded, the zeros will become that XOR key. i.e. key XOR zero equals key. This is especially useful in identifying encoded MZ file to be dumped by a dropper PDF or doc.

    XOR Chaining - need to RE to identify the start of the encoded string. Sometimes it does forward or reverse chaining. E.g. For i = 2 to length, Data [i] ^= Data [i -1]

    B64 - to recognize it, it looks like a non-language long string made of alphabets and numbers (sometimes with = or == appended). It is normally used when transmitting data to CnC. Do take note, the encoding scheme may be customised. You can look out for special strings in MZ used as encoding schemes such as "ABC ... YZabc ...y0123...9+/" or "abc ...yABC ... YZ0123...9+/." see wiki for standard string.

    RC4 - some RE is required. It is very easy to implement, and sbox is not required. Take note of loops that fill an array with 0,1,2,3, ... ,254,255. And, in another loop, it swaps and XOR at modulus 256 ring. See for implementation.

    MD5 - it usually appear in the length of 32 hexascii (16 bytes). If you RE, you will see 3 functions supporting this - init, update and final. See java example.

    Feistal Cipher and AES - examples of Feistal cipher are 3DES, DES, BlowFish... look out for length of input and output. You need to RE for the key. The tell tail signs are the SBOXes unless they customize, which is rare but can be easily done.

    Hope this helps.
    Last edited by eiploader; June 2nd, 2011 at 19:17.
    I promise that I have read the FAQ and tried to use the Search to answer my question.

  7. #7
    @eiploader NICE! Your post helps a lot, thanks!

    Just out of curiosity, how did you learn how to deal with encoded/encrypted malware? Is there a single source you can recommend, or was it knowledge you just picked up here and there over a period of time?

  8. #8


    Hi Sunk,

    Glad to hear that the post is helpful. I used to do quite a fair bit of cyptography and these accumulated experience has probably help in my current work.

    "Malware Analyst's Cookbook and DVD" talks a little bit about obfuscation and encoding found in malware. Other chapters are quite useful too.

    "Applied Cryptography" is what I used during school days and previous job. This book is quite in-depth into algorithm implementation, which is helpful if you are doing RE.
    I promise that I have read the FAQ and tried to use the Search to answer my question.

  9. #9
    Awesome, I'll check out those books, thanks again!

  10. #10
    Sorry to interrupt another thread so late. Personally i would recommend reading some of Eric Filiol's 2010 conference papers. It's probably related to this thread.

    @epiloader: But just a quick question. How do you determine that the S-Box is indeed for Blowfish and not for other Block Cipher provided they are customised?
    As for Block Cipher, it will usually pad it to a block as the name suggested whereas for stream cipher. You give it 12bytes, you get back 12bytes for output.

    Correct me if i am wrong anywhere.
    I love learning.
    This is an indeed interesting topic.

    [ Gunther ]

  11. #11
    Hi Gunther,

    Thanks for posting such a great question.
    1. In the BlowFish example, it is actually using Feistel Cipher, which of course can be easily customized through modifying the F function. In cryptography, encryption primarily involves permutation (rearranging) and substitution (mapping), and one can modify the P or S of known function.

    To help in recognizing the type of encryption used, I targeted in examining the SBOX, the mapping table used. In BlowFish, there are 4 SBOXes with 256 entries (32 bits per entry) in each box. And with that known tables, I can use it to juxtapose against the hardcoded values found in Malware. As mentioned, this is only a tell tail sign. One can customize other part of Round Function as long as function's properties are fulfiled. However, we could be confirmed by 2 following ways:
    ** *A) reverse the entire function and check the implementation of permutation and substitution. But not all reversers are crypto-trained to interpret the crypto functions;
    ** *B) reverse to get the key and a cipher. Try that out using readily available scripts.

    Just an observation, there are 4G (2^32) choices to fit into 4x256 entries. Hence, the likelihood to use the same SBOX is low, unless the block cipher is customized using BlowFish (in this example).

    2. The input and output length that I refers to is the I/O size of function of each block. E.g. AES takes in 2 inputs, key and data block. The key can be 128,192,256,... While the data block is 128 bits. The output is also 128bits. From here, there could be a chance that AES is used.
    DES takes in 8 bytes key (but only 56 bits used) and data block size is 64 bits.
    MD5, SHA1 and SHA2 has output length of 128, 160 and 256 bits respectively.
    With that we can quickly guess the type of cipher algorithm used, but with no 100% guarantee. This is only one way to probe a black box.

    The reason why SBOX values and length of I/O block are used is because it is a quick win to find out the type of algo used in malware.

    Hope this clarifies my rationale.

    Appreciate your question.
    I promise that I have read the FAQ and tried to use the Search to answer my question.

  12. #12
    Sorry to reply so late as i just came back minutes ago. Correct me if i am wrong as i'm not a crypto-trained reverser.
    But Blowfish is not the only one with 8bits input and 32bits output.
    Snefru, Blowfish, CAST, and SQUARE all use 8bits input with 32bits output S-boxes.

    For point 2, maybe it's easier for others to know that An S-box with p input bits
    and q output bits is denoted p → q
    S-boxes are designed for software implementation on 8-bit processors. The block ciphers with 8 → 8 S-boxes are SAFER, SHARK, and AES.
    DES uses eight 6 → 4 S_boxes.

    However, the way a crypto function is implemented is certainly different for every cipher.
    I usually extract out the decryption algorithm and convert it into pseudo code and check against my crypto book to see which matches the closet.

    But this tool is great for some instances though. Not sure whether you have used it before.

    My pleasure to exchange pointers.
    There are better crypto people out there and they are probably laughing at my knowledge. :P

    [ Gunther ]

  13. #13
    Hi Guther,

    Thanks for sharing your thoughts and insights.

    Regarding the statement that you have made earlier, "Snefru, BlowFish, CAST, and SQUARE all use 8bits input with 32bits output S-boxes."

    I agree that it is not uncommon to see different ciphers using 8 bits input with 32 bits output S-Box, however, it is uncommon to see different cipher sharing the same SBox. Therefore, the point that I am driving is that the hard-coded values in the mapping table (SBox values) are unlikely to be the same in value and position. Hence, we could use it for clues.

    Why do I say the chance is low to have exactly the same SBox?
    For example in a case where the a SBox has 256 entries with 32 bits values each, the permutation is 2^32 x (2^32-1) x ... x (2^32 - 255) ways. To have the same permutation, the chance is approximated to be 1 in 2^(32x256).

    For example, the first SBox value in BlowFish is D1310BA6, as compared to the first SBox value in CAST-128 is 30fb40d4.

    Simply "googling" these hard coded values, you can easily find implementation that uses these values.
    See example:

    Guess what is the algorithm used if you see 0x63, 0x7C, 0x77, 0x7B in a SBOX?

    And that's the reason why I say that they are tell tail sign to identifying. At least, it gets you started somewhere.


    Probably, the best way to confirm it to try it against a suspected implementation. Checking against textbook is viable, but not always easy. Some implementations are easy to reverse into pseudo code, but very difficult to map it to crypto book due to the optimization of the implementation.

    My pleasure to exchange thoughts too.
    I promise that I have read the FAQ and tried to use the Search to answer my question.

  14. #14
    Yes, this is the answer i'm waiting for.

    [ Gunther ]

  15. #15
    Thanks for sharing your thoughts Gunther!
    Appreciate that.

    I promise that I have read the FAQ and tried to use the Search to answer my question.

Similar Threads

  1. looking for tuts/books/useful links on RE under linux
    By jcyang in forum The Newbie Forum
    Replies: 5
    Last Post: November 7th, 2009, 10:51
  2. fun remote DLL loading found in malware :)
    By evaluator in forum Malware Analysis and Unpacking Forum
    Replies: 17
    Last Post: January 7th, 2009, 03:42
  3. E-books
    By trainingday in forum The Newbie Forum
    Replies: 5
    Last Post: November 4th, 2005, 20:07
  4. New interesting Kris Kaspersky books...
    By gadget in forum Off Topic
    Replies: 2
    Last Post: February 8th, 2003, 13:21
  5. Get some e-books here
    By Solomon in forum Advanced Reversing and Programming
    Replies: 0
    Last Post: April 8th, 2001, 04:58


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts