Thread: How to pass the obfuscated program's trace protocol through compiler-optimizer?

  1. #1

    Using this GDB script:

    file ./program
    b *0x12345
    run
    while 1
    x/i $pc
    ni
    end
    quit



    I got a trace log of the obfuscated program:


    ...

    0x484e0: bx lr
    ?? ()
    0x43d88: b 0x43db8
    ?? ()
    0x43db8: ldr r3, [r11, #-16]
    ?? ()
    0x43dbc: mov r0, r3
    ?? ()
    0x43dc0: sub sp, r11, #12
    ?? ()
    0x43dc4: pop {r4, r5, r11, pc}
    ?? ()
    0x3fb94: ldr r3, [r11, #-8]
    ?? ()
    0x3fb98: mov r0, r3
    ?? ()
    0x3fb9c: sub sp, r11, #4
    ?? ()
    0x3fba0: pop {r11, pc}
    ?? ()
    0x3da68: ldr r3, [r11, #-8]

    ...


    Kris Kaspersky writes that it is a good idea to pass the tracer's log through an optimizing compiler in order to understand the program better. In that case I would get an equivalent executable file with more readable disassembled code. But I have no idea which compilers I should use, or how.

    P.S. How do I get rid of the unnecessary "?? ()" lines? And how do I redirect GDB's output to a file?

  2. #2
    Posted by OHPen
    Hi Cristianu,

    I don't want to offend you, but you are obviously not aware of what you are talking about. There is no tool that lets you take a gdb trace log, paste it into a file and have that file processed by something that performs compiler optimization.
    Kris Kaspersky is talking theoretically about which technologies could be used to build a proper deobfuscator by "misusing" compiler optimization algorithms, although I doubt that he has more than a PoC.

    If you really want to do something like that, keep in mind that you will have to write your own tools; there is no way around this. Nowadays most people use existing frameworks for tasks like this, and the most used one is the LLVM project. But be prepared to study that stuff for the next year at least ;D

    Another possible, but in my opinion less professional, approach would be to write your own deobfuscator that works on the text output of gdb. Have a look at the blogs here and go back one or two years. I think you will find a project that did something like what you want, in order to deobfuscate obfuscated virtual machine handlers. The only difference is that the guy who wrote that deobfuscator dealt with x86 code instead of ARM.

    If we are talking about only a few hundred lines of code, you could also use a piece of paper and a pencil!

    Nevertheless, all of this will end up being a long project!
    - Reverse Engineering can be everything, but sometimes it's more than nothing. Really rare moments but then they appear to last ages... -

  3. #3
    i dont find google showing me where kris kaspersky is talking about inputting a raw disassembly to some compiler optimizer and getting back super disassembly

    so i refrained from replying earlier

    since ohpen has burst the bubble i too would chime in and say there doesnt exist a method that would get you an almost reassemblable disassembly from obfuscated disassembly

    yes many individual efforts exist and afaik they are x86 primarily and they all still have a long long way to go to be declared near perfect

    anyway ill answer the minor questions and leave the compiler optimization part alone
    the example below is for x86, convert it to arm yourself

    first as to redirecting output to a file

    if your gdb is a newer version

    you can do set logging file <name> and then set logging on

    if you have a linux like my DAMN SMALL LINUX running with low ram in a vm on a windows host

    where the gdb package that is available is old and does not have the set logging command

    you can use the following method


    Code:
    dsl@box:~$ cat helloworld.c 
    #include<stdio.h>
    
    int main (void){
    
    printf("hi Damn Small Linux This is My First Proggie\n");
    return 0;
    }
    
    
    dsl@box:~$ gcc helloworld.c  -o cristianu
    
    
    dsl@box:~$ ./cristianu 
    hi Damn Small Linux This is My First Proggie
    dsl@box:~$ 
    
    
    
    dsl@box:~$ cat foo
    file ./cristianu
    set disassembly-flavor intel
    set annotate 0
    set max-symbolic-offset 0
    set print address off
    set complaints 0
    b main
    run
    while 1
    x/i $pc
    ni
    end
    quit
    dsl@box:~$ 
    
    dsl@box:~$ gdb -q < foo > cristlog >&1
    gdb: Symbol `emacs_ctlx_keymap' has different size in shared object, consider re-linking
    No symbol table is loaded.  Use the "file" command.
    No registers.
    dsl@box:~$ 
    
    
    
    
    dsl@box:~$ cat cristlog 
    (gdb) Reading symbols from ./cristianu...(no debugging symbols found)...done.
    (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) Breakpoint 1 at 0x804838a
    (gdb) Starting program: /home/dsl/cristianu 
    (no debugging symbols found)...(no debugging symbols found)...
    Breakpoint 1, main ()
    (gdb)  > > >0x804838a <main+6>: and    esp,0xfffffff0
    main ()
    0x804838d <main+9>:     mov    eax,0x0
    main ()
    0x8048392 <main+14>:    sub    esp,eax
    main ()
    0x8048394 <main+16>:    mov    DWORD PTR [esp],0x80484e0
    main ()
    0x804839b <main+23>:    call   0x80482b0 <_init+56>
    main ()
    0x80483a0 <main+28>:    mov    eax,0x0
    main ()
    0x80483a5 <main+33>:    leave  
    main ()
    0x80483a6 <main+34>:    ret    
    __libc_start_main () from /lib/libc.so.6
    0x4002ee3e <__libc_start_main+206>:     mov    DWORD PTR [esp],eax
    __libc_start_main () from /lib/libc.so.6
    0x4002ee41 <__libc_start_main+209>:     call   0x40044a30 <exit>
    hi Damn Small Linux This is My First Proggie
    
    Program exited normally.
    (gdb) dsl@box:~$ 
    
    dsl@box:~$ grep -i main+ cristlog > cristasm
    dsl@box:~$ cat crist  
    cristasm   cristianu  cristlog
    dsl@box:~$ cat cristasm
    (gdb)  > > >0x804838a <main+6>: and    esp,0xfffffff0
    0x804838d <main+9>:     mov    eax,0x0
    0x8048392 <main+14>:    sub    esp,eax
    0x8048394 <main+16>:    mov    DWORD PTR [esp],0x80484e0
    0x804839b <main+23>:    call   0x80482b0 <_init+56>
    0x80483a0 <main+28>:    mov    eax,0x0
    0x80483a5 <main+33>:    leave  
    0x80483a6 <main+34>:    ret    
    0x4002ee3e <__libc_start_main+206>:     mov    DWORD PTR [esp],eax
    0x4002ee41 <__libc_start_main+209>:     call   0x40044a30 <exit>
    dsl@box:~$ 
    
    
    dsl@box:~$ sed s/.*main+.*:.//g cristasm > prettycristasm
    dsl@box:~$ cat prettycristasm 
    and    esp,0xfffffff0
    mov    eax,0x0
    sub    esp,eax
    mov    DWORD PTR [esp],0x80484e0
    call   0x80482b0 <_init+56>
    mov    eax,0x0
    leave  
    ret    
    mov    DWORD PTR [esp],eax
    call   0x40044a30 <exit>
    dsl@box:~$
    Last edited by blabberer; April 13th, 2012 at 15:49. Reason: pasted the content again :) using linux midddle click from shell ctrl+c + ctrl+v is windoze :)
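The grep/sed cleanup above keys on "main+", so it would not match the ARM trace from post #1, where the lines have no symbol name and are interleaved with "?? ()". A rough Python sketch of a more general filter follows. The regular expression and the name extract_instructions are my own assumptions about what x/i $pc prints, not anything taken from the GDB documentation, so check them against your own log.

```python
import re

# Assumed shape of one "x/i $pc" line: optional prompt debris or "=> " marker,
# a hex address, an optional <symbol+offset>, a colon, then the instruction.
INSN_RE = re.compile(r"(0x[0-9a-fA-F]+)(?:\s+<[^>]*>)?:\s*(.+?)\s*$")


def extract_instructions(lines):
    """Return (address, instruction) pairs; '?? ()', prompts and
    status messages do not match the pattern and are dropped."""
    result = []
    for line in lines:
        match = INSN_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            # Collapse the column padding gdb puts between mnemonic and operands.
            instruction = " ".join(match.group(2).split())
            result.append((address, instruction))
    return result


if __name__ == "__main__":
    sample = [
        "0x484e0: bx lr",
        "?? ()",
        "0x43d88: b 0x43db8",
        "(gdb)  > > >0x804838a <main+6>: and    esp,0xfffffff0",
    ]
    for address, instruction in extract_instructions(sample):
        print(hex(address), instruction)
```

To run it on a real log, read the file with open(...) and pass the file object (or its lines) to extract_instructions.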

  4. #4
    OK, thank you for the responses!
    Now it is clear that deobfuscation tools like this are rather a thing of the near future.
    Nevertheless, a trace listing does give us some benefit: we have the real sequence of executed instructions.
    It is possible to write a gdb script that prints each instruction together with the current register state.
    Then the value you need can be found with Ctrl+F.
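Instead of Ctrl+F, the search could be scripted. The sketch below assumes a log in which the gdb loop runs x/i $pc followed by info registers before each ni, so every instruction line is followed by lines of the form "name 0xHEX ...". That layout, and the names find_value, INSN_RE and REG_RE, are my assumptions for illustration only. Note that with this ordering the registers shown are the values before the instruction executes.

```python
import re

# Assumed "x/i $pc" line: address, optional <symbol+offset>, colon, instruction.
INSN_RE = re.compile(r"(0x[0-9a-fA-F]+)(?:\s+<[^>]*>)?:\s*(.+?)\s*$")
# Assumed "info registers" line: register name, then its value in hex.
REG_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s+(0x[0-9a-fA-F]+)\b")


def find_value(lines, value):
    """Return (address, instruction, register) for every step at which
    some register holds `value` just before the instruction runs."""
    hits = []
    current = None  # the most recent instruction line seen
    for line in lines:
        insn = INSN_RE.search(line)
        if insn:
            current = (int(insn.group(1), 16), " ".join(insn.group(2).split()))
            continue
        reg = REG_RE.match(line)
        if reg and current is not None and int(reg.group(2), 16) == value:
            hits.append((current[0], current[1], reg.group(1)))
    return hits


if __name__ == "__main__":
    log = [
        "0x43db8: ldr r3, [r11, #-16]",
        "r3             0x0      0",
        "?? ()",
        "0x43dbc: mov r0, r3",
        "r3             0x2a     42",
    ]
    for address, instruction, register in find_value(log, 0x2a):
        print(hex(address), instruction, register)
```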

    blabberer

    i dont find google showing me where kris kaspersky is talking about inputting a raw disassembly to some compiler optimizer and getting back super disassembly
    That is not surprising.
    He wrote about it in his Russian book "Art of disassembling" (a literal translation of the title).

    Could you recommend some books to improve my skills in reverse engineering (ARM-oriented books are preferable)?
    I would like to become a superhacker.
    What should I research? Compilers, cryptography, deobfuscation theory, what else?
    Any help would be appreciated.

  5. #5
    i dont know about arm never had the necessity to hack arm but i believe i would be able to hack it in a few sessions if i put my head down to it

    as i see it the basics are what must be solid and not the implementation details, x86 is an implementation just like arm is

    anyway adopting the kiss principle (keep it simple and <......>) (sir, stupid sir, straightforward sir, shitty sir, s......sir)

    i would go about it like this

    grab a simple crackme

    find ways to run it as in installing an os , framework , etc etc

    when it runs find ways to open it raw and visually look at its guts ie using any text readers , binary readers

    then put it in a comatose state and look at its guts sequentially ie using debuggers , disassemblers , decrypters , dewhatevers

    when i am comfortable with its inner workings (as in i can say in my dreams what ldr r3 #somereg, r18 would mean in any context)

    i would start poking into its interaction with the os / framework / vm ( ie a few round trips into ring 0 (R0) as they would say in x86)

    and henceforth simply try trapping everything in ring 0 where simple ring 3 obfuscations wont matter

    hope i live up to my nicks real meaning

  6. #6
    Quote Originally Posted by Cristianu View Post
    Could you recommend some books to improve my skills in reverse engineering (ARM-oriented books are preferable)?
    Well, there is Steve Furber's book widely known as "the ARM bible":
    http://www.amazon.co.uk/exec/obidos/ASIN/0201675196/202-6448309-1571011

    and you can start here:
    http://www.ee.ic.ac.uk/pcheung/teaching/ee2_computing/

    Have fun.

    Regards
    darkelf
    I flout Chuck Norris, Spongebob barbecues underwater!

  7. #7
    Thank you for the responses, guys!
    It was a very useful discussion.
    Good luck to everyone!

  8. #8
    Posted by disavowed
    In response to your original question, one idea would be as follows:
    1. Take your original disassembly and create a C program with it (using __asm__).
    2. Compile the C program into a binary.
    3. Use Hex-Rays on the binary to decompile that program.
    4. Now take the decompilation, and create a new C program with that decompiled C code.
    5. Disassemble that new program, and you should have your "optimized" disassembly.
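Step 1 of the list above could be scripted roughly as follows. This is only a sketch: the names to_inline_asm and c_string_literal and the wrapper function name "traced" are hypothetical, and the generated file is not guaranteed to assemble. A trace is a linear record of what executed, so branches to absolute addresses and instructions that load pc (such as b 0x43db8 or pop {r11, pc}) would have to be removed or rewritten by hand first.

```python
def c_string_literal(text):
    """Escape backslashes and double quotes so text can sit inside a C string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_inline_asm(instructions, func_name="traced"):
    """Return C source for one function holding a single basic __asm__
    statement with one instruction per line."""
    if not instructions:
        raise ValueError("need at least one instruction")
    # Each instruction becomes a C string ending in the two-character
    # escapes \n\t, which the assembler sees as a line break.
    asm_lines = ['        "' + c_string_literal(i) + '\\n\\t"' for i in instructions]
    parts = ["void %s(void)" % func_name, "{", "    __asm__ volatile ("]
    parts += asm_lines
    parts += ["    );", "}"]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(to_inline_asm(["ldr r3, [r11, #-8]", "mov r0, r3"]))
```

The output can be saved to a .c file and built with a cross compiler for the target; which toolchain and flags to use depends on your setup and is not covered here.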

  9. #9
    disavowed
    Great reply!
    It is just what I need!
    Thank you.

  10. #10

    I've just tried to optimize this example in the following way:

    Code:
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* Basic asm (no operand list): the compiler emits this text as-is. */
        __asm__ (
            "movl $10, %eax;"
            "movl $10, %eax;"
            "movl $10, %eax;"
            "movl $10, %eax;"
            "movl $20, %ebx;"
            "addl %ebx, %eax;"
        );

        return 0;
    }
    I tried -O3, -O2 and -O1; the result is the same:

    Code:
    <main>	
    	"movl $10, %eax;"
    	"movl $10, %eax;"
    	"movl $10, %eax;"
    	"movl $10, %eax;"
            "movl $20, %ebx;"
            "addl %ebx, %eax;"
    	        ...
    What is wrong?
    I expected the compiler's optimizer to delete the first three of these lines:
    Code:
    "movl $10, %eax;"

  11. #11
    Whoohoo, that was a quick jump away from ARM, wasn't it?

    Now, x86 coding is on the menu, right?
    OK, to make it short and sweet here is a little quote:

    The presence of an __asm block affects optimization in several ways. First, the compiler doesn't try to optimize the __asm block itself. What you write in assembly language is exactly what you get.
    How should the compiler know what you are trying to do? When you use inline asm you are on your own. In general, the use of inline asm is discouraged, because it gets in the compiler's way and prevents overall optimization. So if you use it, you are expected to know what you are doing.

    Best regards
    darkelf
    I flout Chuck Norris, Spongebob barbecues underwater!

  12. #12
    darkelf

    Whoohoo, that was a quick jump away from ARM, wasn't it?
    Now, x86 coding is on the menu, right?
    It was just a test.
    If it doesn't work with x86, it doesn't work with ARM.

    Am I right?
    ARM coding is still on the menu.

    Best regards
    Cristianu

  13. #13
    Posted by disavowed
    The optimizing C compiler optimizes C code. It doesn't optimize inline assembly.
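Since the compiler leaves the asm block alone, any cleanup of the instruction list has to happen in your own tool, working on the trace text. As a toy illustration only, here is a Python pass that removes the kind of redundancy in the test from post #10: a "mov $imm, %reg" repeated back to back. The function name drop_repeated_loads and the pattern are my own; it handles only this one AT&T-syntax case and is nowhere near a real deobfuscator.

```python
import re

# AT&T-syntax load of an immediate into a register, e.g. "movl $10, %eax".
# Repeating it back to back changes nothing: mov does not read its
# destination and does not modify the flags.
IMM_LOAD_RE = re.compile(r"^mov[bwlq]?\s+\$[-\w]+\s*,\s*%\w+$")


def drop_repeated_loads(instructions):
    """Drop an immediate load that is identical to the instruction
    immediately before it; keep everything else unchanged."""
    kept = []
    for insn in instructions:
        normalized = " ".join(insn.split())
        if kept and normalized == kept[-1] and IMM_LOAD_RE.match(normalized):
            continue
        kept.append(normalized)
    return kept


if __name__ == "__main__":
    trace = ["movl $10, %eax"] * 4 + ["movl $20, %ebx", "addl %ebx, %eax"]
    for insn in drop_repeated_loads(trace):
        print(insn)
```

Anything the pattern does not recognise is kept, so repeated instructions with side effects (such as the addl) are never removed.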
