
Thread: Question about why a compiler does this sometimes

  1. #1
    Technomancer
    Guest

    I was trying to understand the dead listing of a program yesterday and I saw something like this:

    Code:
    :423864 88442404		mov byte ptr [esp+04], al
    :423868 8D442404		lea eax, dword ptr [esp+04]
    :42386C 50		push eax ; first argument (ESP -= 4)
    :42386D BB05000000	mov ebx, 00000005 ; second argument
    :423872 66894C2409	mov word ptr [esp+09], cx
    :423877 668954240B	mov word ptr [esp+0B], dx 
    :42387C E81CCF0000	call 43079D
    After analysing it, I realised this could be reduced to something as simple as:
    Code:
    mov eax, someptr ; where someptr is an address pointing to what's in AL, CX and DX
    push eax ; first argument
    mov ebx, 00000005 ; second argument
    call 43079D
    Why did the compiler go to so much trouble to store the values in AL, CX and DX at consecutive stack addresses when it could have used a much simpler solution like mine? I thought... compilers are supposed to be perfect?!!

  2. #2
    I guess this is the MinGW C compiler; I have seen similar code generated by it.

  3. #3
    Naides is Nobody
    I am sure LLXX will give you a more expert answer to your question, but the short answer is that compilers do some things that look crazy to human eyes.
    If you have three or four days to spare, take a look at the book "Hacker Disassembling Uncovered" by Kris Kaspersky. It has a lot of examples of why compilers do what they do.

    In this particular example, I guess that if you looked at the high-level source code, [ESP+04], [ESP+09] and [ESP+0B] would each represent a different local variable.
    While it is obvious to your eye that they end up contiguous in memory, the compiler may not be "smart" enough to notice that, because this is not a general enough case. In other situations the compiler has to handle, those variables may end up scattered, so a single pointer in EAX would not cover all of the local variables involved. Remember also that this compiler tracks the stack through ESP, which keeps changing whenever you call, push or pop, so it may need extra bookkeeping to keep track of where each variable is.
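    A small sketch of that bookkeeping, assuming 32-bit code where each push lowers ESP by 4. It replays only the stack effect of the listed instructions and rewrites every [esp+disp] store relative to ESP0, a name introduced here for the value ESP held before the first instruction. The struct, table and function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stack effect of the listing. Offsets are relative to ESP0, the value ESP
       held before "mov byte ptr [esp+04], al" ran. Register contents are ignored. */
    struct insn {
        const char *text; /* instruction text as shown in the listing */
        int esp_delta;    /* change to ESP once this instruction has executed */
        int disp;         /* displacement of an [esp+disp] store, or -1 if none */
        int size;         /* number of bytes that store writes */
    };

    static const struct insn listing[] = {
        { "mov byte ptr [esp+04], al",   0, 0x04, 1 },
        { "lea eax, dword ptr [esp+04]", 0, -1,   0 }, /* computes an address, no store */
        { "push eax",                   -4, -1,   0 }, /* ESP -= 4 (pushed value not tracked) */
        { "mov ebx, 00000005",           0, -1,   0 },
        { "mov word ptr [esp+09], cx",   0, 0x09, 2 },
        { "mov word ptr [esp+0B], dx",   0, 0x0B, 2 },
    };
    #define LISTING_LEN (sizeof listing / sizeof listing[0])

    /* Writes the ESP0-relative offset of every [esp+disp] store into out[]
       and returns how many stores were found. */
    static size_t replay(const struct insn *seq, size_t n, int *out) {
        int esp = 0;      /* current ESP minus ESP0 */
        size_t count = 0;
        for (size_t i = 0; i < n; i++) {
            if (seq[i].disp >= 0)
                out[count++] = esp + seq[i].disp; /* [esp+disp] seen from ESP0 */
            esp += seq[i].esp_delta;
        }
        return count;
    }

    int main(void) {
        int offs[LISTING_LEN];
        size_t stores = replay(listing, LISTING_LEN, offs);
        int next = offs[0]; /* where the next store must start if there is no gap */
        size_t k = 0;
        for (size_t i = 0; i < LISTING_LEN; i++) {
            if (listing[i].disp < 0)
                continue;
            printf("%-27s -> [ESP0+%d], %d byte(s)\n",
                   listing[i].text, offs[k], listing[i].size);
            assert(offs[k] == next); /* contiguous: no gap and no overlap */
            next += listing[i].size;
            k++;
        }
        assert(k == stores);
        printf("stores form one %d-byte block starting at [ESP0+%d]\n",
               next - offs[0], offs[0]);
        return 0;
    }
    ```

    It reports [ESP0+4], [ESP0+5] and [ESP0+7]. Because the push moved ESP down by 4 between the first store and the last two, the displacements 09 and 0B in the listing land directly after the byte at +4, forming one 5-byte block. That block starts exactly where the lea pointed EAX, since the lea ran before the push.
    
    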

    If you had the source code (do you?), things might look a little clearer.
    Last edited by naides; June 5th, 2006 at 11:14.

  4. #4
    Quote: "compilers are supposed to be perfect?!!"
    There is something you should know:
    There is NO perfect optimiser...
    In reality, compilers use very complicated algorithms to optimise the intermediate code they generate, but this optimisation is never perfect.
    In my experience, the MS VC++ optimiser works better than others, but it is not perfect either.
    It is obvious that you can generate better-optimised code with your brain, because it surely works better than a computer on problems like these.


  5. #5
    It's just initialising some local variables at 423864, 423872 and 423877. You have not "simplified" the code; you have just omitted those initialisations and focused on the function call.

    Code:
    mov eax, someptr ; where someptr is an address pointing to what's in AL, CX and DX
    Remember that "someptr" changes with the state of the stack, so an LEA instruction is required to compute the correct address. (As an aside, many LEAs used for address calculations, together with many [esp+xxxx] operands, are a sure sign of compiler-generated code.)
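    One plausible source-level shape, offered only as a guess: a single local record holding a byte and two 16-bit words (the operand sizes of the three stores), packed with no padding because the words sit at +1 and +3 from the address in EAX. `struct rec`, `callee` and `caller` are hypothetical names, and `#pragma pack` is a compiler extension (accepted by MSVC, GCC/MinGW and Clang), not standard C.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical record: a byte at +0 and words at +1 and +3 relative to the
       pointer in EAX, so the fields must be packed with no padding. */
    #pragma pack(push, 1)
    struct rec {
        char  c; /* from AL: mov byte ptr [esp+04], al */
        short a; /* from CX: mov word ptr [esp+09], cx (after the push) */
        short b; /* from DX: mov word ptr [esp+0B], dx (after the push) */
    };
    #pragma pack(pop)

    /* Stand-in for the function at 43079D; its real signature is unknown.
       The second parameter mirrors the 5 loaded into EBX. */
    static int callee(const struct rec *r, int n) {
        return r->c + r->a + r->b + n;
    }

    static int caller(char c, short a, short b) {
        struct rec local; /* lives in the caller's stack frame */
        local.c = c;      /* the three field stores in the listing */
        local.a = a;
        local.b = b;
        return callee(&local, 5); /* &local is what "lea eax, [esp+04]" computes */
    }

    int main(void) {
        printf("offsetof c=%zu a=%zu b=%zu, sizeof=%zu\n",
               offsetof(struct rec, c), offsetof(struct rec, a),
               offsetof(struct rec, b), sizeof(struct rec));
        assert(sizeof(struct rec) == 5); /* packed: 1 + 2 + 2 bytes */
        printf("caller(1, 2, 3) = %d\n", caller(1, 2, 3));
        return 0;
    }
    ```

    The sketch does not reproduce the register usage: a standard cdecl call would push the 5 on the stack rather than load it into EBX, so the real callee presumably uses some other calling convention. It only illustrates the data layout and why the compiler needs the LEA to pass a pointer to the whole block.
    
    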

    From the initialisations, it looks like the first local variable is only a char, while the second and third are shorts. This compiler has an algorithm to "interleave" instructions to improve performance; if you read the Intel optimisation manual, you'll understand that
    Code:
    mov [1234], ax
    mov [5678], ax
    is slower than
    Code:
    mov [1234], ax
    mov [5678], bx
    when AX and BX hold the same value. This is mainly because the processor executes several instructions simultaneously whenever it can. In this instance, the mov ebx and the two initialisations will probably execute at the same time.

    No compiler is perfect. No brain is perfect.

    Compilers do not have any intelligence. They simply transform source code into binary using predefined patterns, which may include those that optimise the output. They are essentially (very complicated) deterministic finite-state automata. This is why decompilation into equivalent source code is possible.

    Human brains, however, have a creative aspect and are able to use heuristics and reasoning to optimise code, and while the results may not be deterministic (nor always correct), it is this extra creativity that enables us to see what the compiler cannot. We can change to a different algorithm or data structure in the hope of increasing efficiency, while a compiler cannot actually comprehend the purpose of the code, only its function.

    While on the topic of efficiency, one only has to take a look at the many size-optimisation competitions to see the huge advantage that carefully written Asm has over HLLs like C/C++.
    Last edited by LLXX; June 5th, 2006 at 22:26.
