
Thread: Entyzer v0.1 [Advanced Entropy Analyzer]

  1. #1

    Entyzer v0.1 [Advanced Entropy Analyzer]

    [Entyzer+ v0.1 Alpha Build:080410]
    Mohammed Fadel Mokbel
    http://www.themutable.com

    Description: Entyzer+ is an Advanced Entropy Analyzer.
    - Calculates the Entropy, Redundancy, A. Mean and StdDev for any file.
    - Calculates the Entropy and Redundancy for a specific range.
    - Generates an HTML Graph Visualization.
    - Calculates the Entropy, Redundancy and StdDev. for each section of an ELF binary file.
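The statistics listed above can be sketched in a few lines of Python. This is only an illustration: the tool's exact formulas for Redundancy and "A. Mean" are not documented here, so the definitions below (redundancy as the unused fraction of the 8-bit maximum, population standard deviation) are assumptions.

```python
import math

def byte_stats(data: bytes):
    """Shannon entropy (bits/byte), redundancy, arithmetic mean and
    standard deviation of a byte sequence. Redundancy here is defined
    as 1 - H/8 (an assumption; Entyzer's exact formula may differ)."""
    n = len(data)
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    redundancy = 1.0 - entropy / 8.0      # unused fraction of the 8-bit maximum
    mean = sum(data) / n                  # arithmetic mean of byte values
    stddev = math.sqrt(sum((b - mean) ** 2 for b in data) / n)
    return entropy, redundancy, mean, stddev
```

A file containing every byte value once has maximal entropy (8 bits/byte, zero redundancy); a file of one repeated byte has zero entropy.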

    - Description: Entropy Analyzer

    + Syntax: Entyzer -f <filename>

    - To get the Entropy, Redundancy, A. Mean and StdDev. for any file.

    + Syntax: Entyzer -f <filename> -range <start address> <end address>

    - To get the Entropy and Redundancy for a specific range.

    + Syntax: Entyzer -f <filename> -graph <IsValue> <Color Template>

    - To generate an HTML graphical visualization of the supplied file.

    - IsValue takes either 0 or 1. 1 for having the frequency of each character displayed, 0 otherwise.

    - Color Template takes a value between 1 and 7 for different templates:

    1:= Gray I, 2:= Gray II, 3:= Tan, 4:= Olive Green, 5:= Blue,

    6:= Green + Green + Yellow, 7:= Orange + Orange + Yellow

    + Syntax: Entyzer -elf <filename>

    - To get the Entropy, Redundancy and StdDev. for each section of an ELF binary file.

    Stay tuned for the paper.

    All your suggestions, comments and feedback are welcome.
    Last edited by tHE mUTABLE; April 11th, 2010 at 21:09.
    The only reason for time is so that everything doesn't happen at once. [A. Einstein]

  2. #2
    Administrator dELTA's Avatar
    Join Date
    Oct 2000
    Location
    Ring -1
    Posts
    4,204
    Blog Entries
    5
    Thanks (and sorry for late reply).

    CRCETL:
    http://www.woodmann.com/collaborative/tools/index.php/Entyzer
    "Give a man a quote from the FAQ, and he'll ignore it. Print the FAQ, shove it up his ass, kick him in the balls, DDoS his ass and kick/ban him, and the point usually gets through eventually."

  3. #3

    Entyzer+ Goes Fermions Build v0.2 + Paper Release

    Entyzer+ - Revision History
    ===========================

    Note {Non Functional Changes}
    ----

    [?] A new build was released on December 26, 2010 with static linking only.


    Version 0.2 {Fermions Build:221210}
    -----------

    [?] Released on (December 22, 2010).
    [?] Major update with lots of new fine-grained options.
    [+] Added PE file format parsing for reporting the Entropy and
    other statistical information.
    [+] Added block selection option wherever applicable.
    [+] Added XML report generation for reporting general and Entropy
    information, percentage and frequency of every hex value.
    [+] Added fine-grained options (-select) for parsing ELF binaries.
    [+] Added "Symbiotic Differential Comparison Algorithm" (-SDCAlg)
    [+] Added Kullback-Leibler Divergence (KLD) measure. The
    implementation also reports the Resistor Average (RA) distance
    which symmetrizes KLD.
    [+] Added various mathematical hex transformations options (-h:hex).
    [+] Added simple Encryption/Decryption module.
    [+] Added the capability to generate an unsigned C/C++ hex char
    byte array.
    [+] Added icon to the executable file.


    Version 0.1 {Alpha Build:080410}
    -----------

    [?] First Public Release (April 08, 2010).
    [?] Supported reporting the entropy for every section of an ELF
    binary file, along with other statistical analysis, and HTML
    Graph output using a matrix representation.


    General Info.
    -------------

    - This tool is part of "An Unobtrusive Entropy Based Compiler Optimization
    Comparator" paper.
    - Please refer to the paper for more information about Entropy, Matrix Graphical
    Representation, the Symbiotic Differential Comparison Algorithm, the Kullback-Leibler
    Divergence measure and the Resistor-Average distance.

    ----------------------------------------------------------------------------------

    _____________________________________________________________________

    [Entyzer+ v0.2 - Fermions Build:221210]
    [Advanced Entropy Analyzer]
    <All Rights Reserved (C) 2010>
    _____________________________________________________________________

    - Description: Entropy Analyzer+ with Hex Editing Capabilities (-h:hex)

    + Syntax: Entyzer -f <filename> [ -b <start_offset> <size> ]

    - To get the Entropy, Redundancy, A. Mean and StdDev. for any file
    or for a specific block.

    + Syntax: Entyzer -f <filename> -graph <IsValue> <Color Template>

    - To generate an HTML graphical visualization of the supplied file.

    - IsValue takes either 0 or 1. 1 for having the frequency of each
    character displayed, 0 otherwise.

    - Color Template takes a value between 1 and 7 for different templates:
    1:= Gray I, 2:= Gray II, 3:= Tan, 4:= Olive Green, 5:= Blue,
    6:= Green + Green + Yellow, 7:= Orange + Orange + Yellow

    + Syntax: Entyzer -f <filename> -xml

    - To generate an XML report: general and Entropy information, percentage
    and frequency of every hex value.

    + Syntax: Entyzer -pe <filename>

    - To get the Entropy, Redundancy and StdDev. for every section of a
    PE binary file.

    + Syntax: Entyzer -elf -section -<option> <filename>

    - <option> = list, To list all the section names of an ELF binary file.

    - <option> = all, To get the Entropy, Redundancy and StdDev.
    for every section of an ELF binary file.

    - <option> = select, Followed by a <section_name>, to get the
    Entropy, Redundancy and StdDev. for a selected section of an
    ELF binary file. (e.g. section_name = .text)

    + Syntax: Entyzer -elf -SDCAlg <filename * 5>

    - To apply the Symbiotic Differential Comparison Algorithm on a reference
    ELF binary file and 4 files compiled at increasing levels of
    optimization. Only the .text section is considered.

    + Syntax: Entyzer -elf -section -select <section_name> -KLD <filename * 2>

    - To apply the Kullback-Leibler Divergence (KLD) measure on two ELF files
    for a selected section. The implementation also reports the Resistor
    Average (RA) distance which symmetrizes KLD.

    + Syntax: Entyzer -f -KLD <filename * 2>

    - To apply KLD and RA on any file.
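For reference, KLD and the Resistor Average can be sketched over two byte-frequency distributions. This is a textbook sketch, not Entyzer's implementation: the epsilon floor used to avoid division by zero is my assumption, and the inputs are assumed to be already-normalized probability vectors.

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence D(p||q) in bits between two
    probability distributions. Zero-probability bins on the p side
    are skipped; a tiny floor keeps q positive (an assumed smoothing
    choice, not necessarily what the tool does)."""
    eps = 1e-12
    return sum(pi * math.log2(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

def resistor_average(p, q):
    """Resistor-average distance (so named by analogy with parallel
    resistors): 1/RA = 1/D(p||q) + 1/D(q||p), which symmetrizes KLD."""
    d_pq, d_qp = kld(p, q), kld(q, p)
    if d_pq == 0 or d_qp == 0:
        return 0.0
    return 1.0 / (1.0 / d_pq + 1.0 / d_qp)
```

Unlike raw KLD, the resistor average is symmetric in its arguments and never exceeds either one-sided divergence.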

    [?] To list the hex transformation options, use the sub-option -h:hex

    + Syntax: Entyzer -f <filename> -hext: <operation> <operand>
                      [ -b <start_offset> <end_offset> ]

    - To apply various mathematical hex transformations (operations) on
    a specific file. All the operations work at the byte level. If the
    block (-b) option is specified, the transformation operates only on
    the range given by the start and end offsets (SO, EO); otherwise the
    whole file is taken. <operand> accepts a decimal value between 0 and 255.

    - The <operation> can take any of the following transformations:

    + {mod, neg, div, mult, sub, add} (neg takes no operand)

    + Binary operations: {xor, or, and, inv} (inv takes no operand)

    + {sleft, sright, rotl, rotr} => Shift/Rotate Left/Right

    # ex. [... -hext: xor 4 -b 10 20]

    + {rand} (Randomize takes two operand values: Min and Max)

    + {t1e} (The (t1e) encryption/decryption template module)

    # Takes 3 operand values: 'x', 'y' and 'z'
    # t1e := {add x, xor y, sub z} - t1d := {add z, xor y, sub x}
    # ex. To encrypt: [... -hext: t1e x y z]
    # To decrypt: [... -hext: t1e z y x]
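A minimal sketch of how such byte-level transformations might work. The operation names mirror the option list above, but the implementation details (inclusive offsets, modulo-256 wrap-around, shift counts of 1-7) are assumptions, not the tool's documented behavior.

```python
def hext(data: bytes, operation: str, operand: int = 0, block=None) -> bytes:
    """Apply a byte-level transformation, optionally restricted to an
    inclusive [start_offset, end_offset] block, mirroring the
    '-hext: <operation> <operand> [-b SO EO]' syntax (a sketch only)."""
    ops = {
        "add":  lambda b: (b + operand) & 0xFF,   # wrap modulo 256 (assumed)
        "sub":  lambda b: (b - operand) & 0xFF,
        "xor":  lambda b: b ^ operand,
        "neg":  lambda b: (-b) & 0xFF,            # takes no operand
        "inv":  lambda b: ~b & 0xFF,              # takes no operand
        "rotl": lambda b: ((b << operand) | (b >> (8 - operand))) & 0xFF,
    }
    f = ops[operation]
    out = bytearray(data)
    start, end = block if block else (0, len(data) - 1)
    for i in range(start, end + 1):
        out[i] = f(out[i])
    return bytes(out)
```

For example, `hext(data, "xor", 4, (10, 20))` corresponds to the documented example `[... -hext: xor 4 -b 10 20]`; applying it twice restores the original bytes.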

    + Syntax: Entyzer -f <filename> -cpp [ -b <start_offset> <end_offset> ]

    - To generate an unsigned C/C++ hex char byte array.
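Such an array generator might look like the sketch below. The variable name, the bytes-per-line count and the bracket layout are all assumptions; Entyzer's actual `-cpp` output format may differ.

```python
def to_cpp_array(data: bytes, name: str = "buf", per_line: int = 12) -> str:
    """Render a byte string as an unsigned C/C++ hex char byte array
    (layout is an assumption about the -cpp option's output)."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = ", ".join(f"0x{b:02X}" for b in data[i:i + per_line])
        lines.append("    " + chunk + ",")
    body = "\n".join(lines).rstrip(",")   # drop the trailing comma
    return f"unsigned char {name}[{len(data)}] = {{\n{body}\n}};"
```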


    [----------------------------------------------------]

    + Entyzer.exe Signature:

    - 32-Bit: MD5 DBF13E1D00D396DD4A8A2A27C28191CE
    - 64-Bit: MD5 8170A5D78173993EC00ED33ADDB33BE4

    + Libraries used:

    - ELFIO library by Serge Lamikhov
    - MD5 Library by Benjamin Grüdelbach

    [----------------------------------------------------]


    Entyzer+_v0.2_Fermions_Build_221210.rar

    The paper "An Unobtrusive Entropy Based Compiler Optimization Comparator" is available at:
    http://themutable.com/Pubs/Mokbel_CASCON_10_V0.5.pdf
    Last edited by tHE mUTABLE; December 26th, 2010 at 12:10. Reason: New build of Entyzer with static linking

  4. #4
    I updated the entry concerning your tool on the CRCETL.

    http://www.woodmann.com/collaborative/tools/index.php/Entyzer

    Cheers tHE mUTABLE
    Please consider donating to help Woodmann.com staying online (here is why).
    Any amount greatly appreciated. Thank you.

  5. #5
    I tested your tool on a Win7 64bits, fully patched, it's spitting an error.

    MSVCP100.dll is missing. I guess it's the Visual C compiler redistributable package =/

  6. #6
    Quote Originally Posted by Silkut View Post
    I tested your tool on a Win7 64bits, fully patched, it's spitting an error.

    MSVCP100.dll is missing. I guess it's the Visual C compiler redistributable package =/
    Thanks Silkut! Could you please try to place MSVCP100.dll in the same folder where Entyzer is and see if the error disappears!

  7. #7
    Hello Mohammed,

    It's good to compile with /MT option, I think increase in size will be trivial

  8. #8
    Quote Originally Posted by GamingMasteR View Post
    Hello Mohammed,

    It's good to compile with /MT option, I think increase in size will be trivial
    Thanks! A new build with static linking is up (the CRCETL version has been updated as well)!

  9. #9
    Yay,

    Sorry for not trying with the DLL, family stuff you know..
    Works great now, thanks for the fixup

  10. #10
    First of all, congratulations! It is a very nice tool and the description of the tool speaks for itself

    Took a quick look at the paper that is linked in the description of the tool. Curious to know what type of statistical model you apply and whether or not you make inference based on that model (Kullback-Leibler is sometimes used for this). However, it appears that with your blackbox approach it is very convenient (and practical) to treat the code section as a pile of bytes, with the result that the statistical model is very simple. Considering the fact that code contains a lot of structure, it is interesting how much you can infer about the binary based on the simple model you have chosen. As such your method/work could just as well be used to examine other data such as images, audio and/or video that undergo severe compression.


    Anyway, I have a few questions.

    - if a compiler performs optimizations (speed-wise) by unrolling, would you expect the entropy of the underlying random variable to increase, decrease or stay the same?

    The reason I'm asking is that, as far as I understand, you assume that compiler optimizations lead to increased entropy. Sorry, just found that on page 2 end of paragraph 2 you write "..., a loop unrolling will yield bigger entropy..". Intuitively, I would have thought it would decrease. Can you give an explanation of this statement?

    In the description of table 2 it is written "We notice how the entropy value is increasing as the level of optimization increases. That's not only due to the increase in file size but....". You seem to assume that an increase in file size implies higher entropy. From a theoretical point of view ( yawn ) I would say that they have nothing to do with each other, but what you are seeing is maybe caused by the 'small sample-size effect'. If you consider a Gaussian random variable, its entropy stays the same no matter how many samples you draw from it. However, if you try to estimate its entropy you will have fluctuating results (high variance) in your estimates when your sample size is small.
    The same comment goes for the statements "...less instructions and the entropy value decreased to reflect this alteration..." and "...hence the decrease in the entropy value.." in the paragraph below table 3.
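The 'small sample-size effect' described above is easy to demonstrate with a plug-in (maximum-likelihood) entropy estimate. This is purely an illustration and not part of Entyzer: a uniform byte source has true entropy of exactly 8 bits, yet a 200-byte sample cannot even produce an estimate above log2(200) ≈ 7.64 bits.

```python
import math, random

def entropy_estimate(sample):
    """Plug-in (maximum-likelihood) Shannon entropy estimate in bits."""
    n = len(sample)
    counts = {}
    for s in sample:
        counts[s] = counts.get(s, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
# Both samples come from the same uniform source with true entropy 8 bits.
small = [random.randrange(256) for _ in range(200)]
large = [random.randrange(256) for _ in range(200_000)]
# The small-sample estimate is biased well below 8 bits;
# the large-sample estimate lands very close to 8.
```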

    Does your tool in any way take into consideration the size of the code section?

    Hope to try out the tool soon

  11. #11
    @niaren. Thank you for your interest and your insightful comments!

    As such your method/work could just as well be used to examine other data such as images, audio and/or video that undergo severe compression.
    Yes, it can be used to examine different data types as well because of the high level of abstraction the model embodies.

    if a compiler performs optimizations (speed-wise) by unrolling, would you expect the entropy of the underlying random variable to increase, decrease or stay the same?
    Perhaps I should have elaborated on this specific statement more (when I update the paper!). Actually, this case is more complicated than it seems. For a perfect homogeneous unrolling the Entropy would stay the same. However, it is not always the case, since the level of noise in the actual transformation exhibits different code fluctuations from one optimization level to another, hence, in our case, the Entropy increases (non-homogeneous). On the other hand, it is also possible for the Entropy to decrease in cases where we have an almost perfect homogeneous distribution (with very few repetitive anomalies), considering that the 'reference' is the perfect homogeneous distribution.

    Please make sure to go over the paper completely since that might answer all/some of your questions! I have mentioned that "The entropy value is bounded to the actual optimization properties", and in many other places in the paper I've emphasized this point.

    In the description of table 2 it is written "We notice how the entropy value is increasing as the level of optimization increases. That's not only due to the increase in file size but....". You seem to assume that an increase in file size implies higher entropy. From a theoretical point of view ( yawn ) I would say that they have nothing to do with each other, but what you are seeing is maybe caused by the 'small sample-size effect'. If you consider a Gaussian random variable, its entropy stays the same no matter how many samples you draw from it. However, if you try to estimate its entropy you will have fluctuating results (high variance) in your estimates when your sample size is small.
    Sure, the file size plays a crucial role in the entropy; it is an inherent mathematical characteristic of the equation (it depends on the distribution). The correlation is obvious since the size appears in the denominator. I'm not generalizing this observation, since it is different for every distribution.

    The size of the binaries of the benchmarks could reach up to 11MB, so definitely we're not talking about "small-size". Note that the probability distribution is discrete while in the case of Gaussian it is continuous!

    The same comment goes for the statements "...less instructions and the entropy value decreased to reflect this alteration..." and "...hence the decrease in the entropy value.." in the paragraph below table 3.
    It is true by definition (depends on the homogeneity of the distribution).

    Does your tool in any way take into consideration the size of the code section?
    Well, the analysis is based solely on the code section. As for the tool, you can choose whatever section you want to get the Entropy for (via the '-select' sub-option) or get the Entropy for the file as a whole.

    I hope that answers your questions.
    Last edited by tHE mUTABLE; December 29th, 2010 at 18:28.

  12. #12
    Thanks for the explanation
    I'm not sure I understand it or understand every detail of it. Confused on a higher-level I would say

    Quote Originally Posted by tHE mUTABLE View Post
    However, it is not always the case, since the level of noise in the actual transformation exhibits different code fluctuations from one optimization level to another, hence, in our case, the Entropy increases (non-homogeneous). On the other hand, it is also possible for the Entropy to decrease in cases where we have an almost perfect homogeneous distribution (with very few repetitive anomalies), considering that the 'reference' is the perfect homogeneous distribution.
    I don't understand your coupling between homogeneous/non-homogeneous distribution and the increase/decrease in entropy. Maybe it is because I don't understand or have heard of a homogeneous distribution before. Looked here
    http://en.wikipedia.org/wiki/Homogeneous_distribution
    but I still have a hard time seeing the relevance to your results.
    Is a Gaussian distribution homogeneous?
    Is a mixture distribution of two Gaussians homogeneous?
    but most importantly what does it mean for the output of your tool?


    However, thinking twice about what you're actually doing I now see that it makes sense, at least to me, to think of your tool as being a classifier. Your 'classifier' is a little special in that the feature extraction stage reduces the (high-dimensional) input to only one scalar, and as such the actual classification stage is reduced to thresholding of that scalar.
    Out of curiosity I made a small experiment. I took an exe file 7z.exe (http://www.7-zip.org/) and made a UPX-packed version as well. Then the contents of those two files, from offset 0 to EOF, were interpreted as a time series, x(t), where each sample is 32 bit and considered as a Q31 fixed-point decimal number. Then scatter plots were made, plotting x(t) vs x(t-1). In the image below, the left plot shows the scatter plot for the raw 7z.exe and the right plot shows the scatter plot for the UPX-packed version.





    As another experiment the below plot shows the autocorrelation function for lags 1 to 1000. Blue line is for raw exe and red line is for packed version.
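The autocorrelation part of this experiment can be reproduced in outline with a short sketch. To keep it self-contained, synthetic stand-ins replace the actual 7z.exe bytes: a periodic sequence plays the role of structured code, and pseudorandom bytes play the role of well-packed data (reading the real files is left out).

```python
import random

def autocorr(xs, lag):
    """Sample autocorrelation at a given lag: mean-removed covariance
    normalized by the variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[t] - mean) * (xs[t - lag] - mean)
              for t in range(lag, n)) / n
    return cov / var

random.seed(1)
structured = [i % 16 for i in range(4000)]             # stand-in for raw code bytes
packed = [random.randrange(256) for _ in range(4000)]  # stand-in for packed bytes
# Structured data shows a strong peak at its period (lag 16);
# packed/random data shows essentially no correlation at any lag.
```

This mirrors what the blue/red autocorrelation plot shows: packing destroys the lag structure that real code exhibits.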



    Have you tried other 'features' than the entropy one?

  13. #13
    In the context of the paper, the word "Homogeneous" as defined mathematically (from the wiki link you provided) has nothing to do with what I was referring to. I simply meant the uniformity in the structure of the distribution ("composed of similar or identical parts or elements": as defined in the dictionary).

    However, thinking twice about what you're actually doing I now see that it makes sense, at least to me, to think of your tool as being a classifier. Your 'classifier' is a little special in that the feature extraction stage reduces the (high-dimensional) input to only one scalar, and as such the actual classification stage is reduced to thresholding of that scalar.
    You perfectly nailed it!

    Have you tried other 'features' than the entropy one?
    Well, there are two other methods mentioned in the paper: "A Symbiotic Differential Comparison (SDC) Algorithm" and "The Complete Juxtaposition of All Optimization Levels Using Kullback-Leibler Divergence (relative Entropy)".

    Nonetheless, there are still other mathematical formulations which can be used, based on clustering and classification. However, because of the one-dimensional scalar (that is, the byte representation only), it becomes very hard to draw any meaningful generalizations without making lots of exceptions. I've tried!

    If I were to take the semantic meaning of the distribution, that is the assembly instructions, or to work on the actual generated listing instead of the byte distribution, then a lot of things could be done. We've brainstormed some potential ideas in this direction!

  14. #14

    Entyzer+ Goes Geometry Build v0.3

    Entyzer+ - Revision History
    ===========================

    Version 0.3 {Geometry Build:030711}
    -----------

    [?] Released on (July 03, 2011).
    [?] Major update with 7 new generic features related to distance metrics.
    [+] Added 'Rolling XOR' (... -hext: -rxor) for performing xor with
    various key sizes.
    [+] Added various mathematical distance metrics (-h:stat):
    (check 'Help.html' for more info.)
    [+] Simpson's Index
    [+] Canberra's Distance
    [+] Sorensen's Distance
    [+] Minkowski's Distance of Order, Lambda = 3
    [+] Manhattan's Distance, Lambda = 1
    [+] Pearson's Test-Statistic (Chi-Square Test).
    [!] Fixed the Entropy range option miscalculation in the size to be covered.
    [!] Fixed a bug in the case of the '-hext' '-b' option. When (End Offset <
    Start Offset) it was throwing an uncaught exception. Instead, an error
    message is now issued!
    [*] Some other minor architectural improvements and feature clarifications.
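Two of the newly listed metrics can be sketched from their textbook definitions. These are standard formulas, not Entyzer's code; whether the tool normalizes its byte histograms before applying them is not stated here, so the inputs below are taken as plain numeric vectors.

```python
def minkowski(p, q, lam):
    """Minkowski distance of order lambda between two equal-length
    vectors; lam = 1 is the Manhattan distance, lam = 3 matches the
    order the changelog mentions."""
    return sum(abs(a - b) ** lam for a, b in zip(p, q)) ** (1.0 / lam)

def canberra(p, q):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over all
    components, skipping 0/0 terms by convention."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(p, q) if a or b)
```

Canberra weights each component by its magnitude, which makes it sensitive to differences in rarely-occurring byte values, whereas Minkowski/Manhattan treats all components alike.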


    Entyzer+_v0.3_Geometry_Build_030711.rar

