How to pass the obfuscated program's trace protocol through compiler-optimizer?


Cristianu
April 9th, 2012, 11:22
With the help of this GDB script:

file ./program
b *0x12345
run
while 1
x/i $pc
ni
end
quit



I got a trace log of the obfuscated program:


...

0x484e0: bx lr
?? ()
0x43d88: b 0x43db8
?? ()
0x43db8: ldr r3, [r11, #-16]
?? ()
0x43dbc: mov r0, r3
?? ()
0x43dc0: sub sp, r11, #12
?? ()
0x43dc4: pop {r4, r5, r11, pc}
?? ()
0x3fb94: ldr r3, [r11, #-8]
?? ()
0x3fb98: mov r0, r3
?? ()
0x3fb9c: sub sp, r11, #4
?? ()
0x3fba0: pop {r11, pc}
?? ()
0x3da68: ldr r3, [r11, #-8]

...


Kris Kaspersky writes that it is a good idea to pass the tracer's log through an optimizing compiler to understand the program better. That way I would get an equivalent executable file with more readable disassembled code. But I have no idea which compilers I should use, or how.

P.S. How do I get rid of the unnecessary "?? ()" lines? And how do I redirect GDB's output to a file?

OHPen
April 13th, 2012, 03:01
Hi Cristianu,

I don't want to offend you, but you are obviously not aware of what you are talking about. There is no tool that lets you take a gdb trace log, paste it into a file and have that file processed by something that does compiler optimization.
Kris Kaspersky is talking theoretically about which technologies could be used to build a proper deobfuscator by "misusing" compiler optimization algorithms, although I doubt that he has more than a POC.

If you really want to do something like that, keep in mind that you will have to write your own tools; there is no way around this. Nowadays most people use existing frameworks for tasks like this, and the most used one is the LLVM project. But be prepared to study that stuff for the next year at least ;D

Another possible, but in my opinion not so professional, approach would be to write your own deobfuscator that works on the text output of gdb. Have a look at the blogs here and go back one or two years. I think you will find a project that did something like what you want, in order to deobfuscate obfuscated virtual machine handlers. The difference is simply that the guy who wrote that deobfuscator dealt with x86 code instead of ARM.

If we are talking about only a few hundred lines of code, you could also use a piece of paper and a pencil!

Nevertheless, all of this will end up as a long project!

blabberer
April 13th, 2012, 15:00
i dont find google showing me where kris kaspersky talks about feeding a raw disassembly to some compiler optimizer and getting back a super disassembly

so i refrained from replying earlier

since ohpen has burst the bubble i too will chime in and say there doesnt exist a method that would get you an almost reassemblable disassembly from an obfuscated disassembly

yes many individual efforts exist, afaik they are primarily x86, and they all still have a long long way to go before they can be declared near perfect

anyway ill answer the minor questions and leave the compiler optimization part aside. the example below is x86, convert it to arm yourself

first, as to redirecting output to a file

if your gdb is a newer version

you can use set logging file <name> to choose the log file and then set logging on

if you have a linux like my DAMN SMALL LINUX running with low ram in a vm on a windows host

where the gdb package that is available is old and does not have the set logging command

you can use the following method


Code:


dsl@box:~$ cat helloworld.c
#include<stdio.h>

int main (void){

printf("hi Damn Small Linux This is My First Proggie\n");
return 0;
}


dsl@box:~$ gcc helloworld.c -o cristianu


dsl@box:~$ ./cristianu
hi Damn Small Linux This is My First Proggie
dsl@box:~$



dsl@box:~$ cat foo
file ./cristianu
set disassembly-flavor intel
set annotate 0
set max-symbolic-offset 0
set print address off
set complaints 0
b main
run
while 1
x/i $pc
ni
end
quit
dsl@box:~$

dsl@box:~$ gdb -q < foo > cristlog >&1
gdb: Symbol `emacs_ctlx_keymap' has different size in shared object, consider re-linking
No symbol table is loaded. Use the "file" command.
No registers.
dsl@box:~$




dsl@box:~$ cat cristlog
(gdb) Reading symbols from ./cristianu...(no debugging symbols found)...done.
(gdb) (gdb) (gdb) (gdb) (gdb) (gdb) Breakpoint 1 at 0x804838a
(gdb) Starting program: /home/dsl/cristianu
(no debugging symbols found)...(no debugging symbols found)...
Breakpoint 1, main ()
(gdb) > > >0x804838a <main+6>: and esp,0xfffffff0
main ()
0x804838d <main+9>: mov eax,0x0
main ()
0x8048392 <main+14>: sub esp,eax
main ()
0x8048394 <main+16>: mov DWORD PTR [esp],0x80484e0
main ()
0x804839b <main+23>: call 0x80482b0 <_init+56>
main ()
0x80483a0 <main+28>: mov eax,0x0
main ()
0x80483a5 <main+33>: leave
main ()
0x80483a6 <main+34>: ret
__libc_start_main () from /lib/libc.so.6
0x4002ee3e <__libc_start_main+206>: mov DWORD PTR [esp],eax
__libc_start_main () from /lib/libc.so.6
0x4002ee41 <__libc_start_main+209>: call 0x40044a30 <exit>
hi Damn Small Linux This is My First Proggie

Program exited normally.
(gdb) dsl@box:~$

dsl@box:~$ grep -i main+ cristlog > cristasm
dsl@box:~$ cat crist
cristasm cristianu cristlog
dsl@box:~$ cat cristasm
(gdb) > > >0x804838a <main+6>: and esp,0xfffffff0
0x804838d <main+9>: mov eax,0x0
0x8048392 <main+14>: sub esp,eax
0x8048394 <main+16>: mov DWORD PTR [esp],0x80484e0
0x804839b <main+23>: call 0x80482b0 <_init+56>
0x80483a0 <main+28>: mov eax,0x0
0x80483a5 <main+33>: leave
0x80483a6 <main+34>: ret
0x4002ee3e <__libc_start_main+206>: mov DWORD PTR [esp],eax
0x4002ee41 <__libc_start_main+209>: call 0x40044a30 <exit>
dsl@box:~$


dsl@box:~$ sed s/.*main+.*:.//g cristasm > prettycristasm
dsl@box:~$ cat prettycristasm
and esp,0xfffffff0
mov eax,0x0
sub esp,eax
mov DWORD PTR [esp],0x80484e0
call 0x80482b0 <_init+56>
mov eax,0x0
leave
ret
mov DWORD PTR [esp],eax
call 0x40044a30 <exit>
dsl@box:~$
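
For the ARM trace in the first post there are no symbols, so a grep on main+ has nothing to match. Here is a minimal sketch of the same cleanup that keys on the address column instead. It assumes the log looks like the ones posted in this thread (an optional "=> " marker, then "0xADDR:" or "0xADDR <sym+off>:", then the instruction); clean_trace is a made-up name.

```shell
# Hypothetical helper: reduce a GDB "x/i $pc; ni" log to bare instructions.
# Keeps only lines that start with an address, which also drops the
# "?? ()" / "main ()" frame lines and any program output.
clean_trace() {
    grep -E '^[[:space:]]*(=> )?0x[0-9a-fA-F]+( <[^>]*>)?:' |
        sed -E 's/^[[:space:]]*(=> )?0x[0-9a-fA-F]+( <[^>]*>)?:[[:space:]]*//'
}
```

With a GDB that supports it, running the script as gdb -q -batch -x foo > trace.log 2>&1 should keep the (gdb) prompts out of the log, and clean_trace < trace.log > trace.asm then leaves one instruction per line.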

Cristianu
April 14th, 2012, 13:18
Ok, thank you for the responses!
Now it is clear that deobfuscation tools of this kind are something for the near future rather than something that exists today.
Nevertheless, a trace listing gives us some benefits: we have the real sequence of executed instructions.
It is possible to write a gdb script that outputs each instruction together with the current register state.
Then the value you need can be found with Ctrl+F.
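
Such a register-dumping script could look roughly like this. It is only a sketch: make_trace_script and the default log name trace.log are invented for illustration, the program path and breakpoint address are whatever you pass in, and set logging needs a GDB new enough to have it.

```shell
# Hypothetical helper: print a GDB command file that logs every executed
# instruction followed by the full register state.
# Usage: make_trace_script PROGRAM BREAK_ADDR [LOGFILE]
make_trace_script() {
    prog=$1 addr=$2 log=${3:-trace.log}
    cat <<EOF
file $prog
set pagination off
set logging file $log
set logging on
b *$addr
run
while 1
x/i \$pc
info registers
ni
end
quit
EOF
}
```

For example, make_trace_script ./program 0x12345 > trace.gdb followed by gdb -q -batch -x trace.gdb writes the log, and grep -n 0x3fb94 trace.log then does the job of Ctrl+F.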

Quote:

blabberer

i dont find google showing me where kris kaspersky is talking about inputing a raw disassembly to some compiler optimizer and getting back super disassembly

That is not surprising.
He wrote about it in his Russian book "Art of disassembling" (a literal translation of the title).

Could you recommend some books to improve my skills in reverse engineering? (ARM-oriented books are preferable.)
I would like to become a superhacker.
What should I study? Compilers, cryptography, deobfuscation theory, what else?
Any help would be appreciated.

blabberer
April 15th, 2012, 01:36
i dont know about arm never had the necessity to hack arm but i believe i would be able to hack it if i put my head down to it in a few sessions

as basics are what must be solid and not implementation details x86 is an implementation like arm is what i think

anyway adopting kiss principle (keep it simple and <......> (sir,stupid sir,straightforward sir,shitty sir,s......sir)

i would go about like this

grab a simple crackme

find ways to run it as in installing os , framework , etc etc

when it runs find ways to open it raw and visually look at its guts ie using any text readers . binary readers

then put it in a comatose state and look at its guts sequentially ie using debuggers . disassemblers , descripters, dewhateverss

when i am comfortable with its inner workings (as in i can say in my dreams what ldr r3 #somereg, r18 would mean in any context)

i would start poking into its interaction with the os / framework / vm ( ie a few round trips into ring 0, R0 as they would say in x86)

and henceforth simply try trapping everything in ring 0 where simple ring 3 obfuscations wont matter

hope i live up to my nicks real meaning

Darkelf
April 15th, 2012, 10:24
Quote:
[Originally Posted by Cristianu;92301]
Could you recommend some books to improve my skills in reverse engineering? (ARM-oriented books are preferable.)


Well, there is Steve Furber's book widely known as "the ARM bible":
http://www.amazon.co.uk/exec/obidos/ASIN/0201675196/202-6448309-1571011

and you can start here:
http://www.ee.ic.ac.uk/pcheung/teaching/ee2_computing/

Have fun.

Regards
darkelf

Cristianu
April 18th, 2012, 10:50
Thank you for the responses, guys!
It was a very useful discussion.
Good luck to everyone!

disavowed
April 22nd, 2012, 08:44
In response to your original question, one idea would be as follows:

Take your original disassembly and create a C program with it (using __asm__).
Compile the C program into a binary.
Use Hex-Rays on the binary to decompile that program.
Now take the decompilation, and create a new C program with that decompiled C code.
Disassemble that new program, and you should have your "optimized" disassembly.
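
The first step above can be sketched as a small shell filter. It assumes the trace has already been reduced to one instruction per line, in a syntax the compiler's assembler accepts; trace_to_c is a hypothetical name. A raw trace still contains returns and branches to absolute addresses, which would have to be removed or turned into labels by hand before the result assembles into something meaningful.

```shell
# Hypothetical helper: wrap instructions read from stdin in a C source
# file whose main() consists of a single basic __asm__ statement.
trace_to_c() {
    printf '%s\n' 'int main(void) {' '    __asm__ volatile ('
    # Drop blank lines, trim leading whitespace, escape backslashes and
    # double quotes, then turn each line into a C string ending in \n\t.
    sed -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/\\/\\\\/g' \
        -e 's/"/\\"/g' \
        -e 's/.*/        "&\\n\\t"/'
    printf '%s\n' '    );' '    return 0;' '}'
}
```

Something like trace_to_c < trace.asm > trace.c gives a file that can be handed to the compiler for the second step.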

Cristianu
April 22nd, 2012, 11:50
disavowed
Great reply!
It is just what I need!
Thank you.

Cristianu
April 24th, 2012, 03:59
I've just tried to optimize this example that way:

Code:

#include <stdio.h>

int main(int argc,char** argv) {

__asm__ ( "movl $10, %eax;"
"movl $10, %eax;"
"movl $10, %eax;"
"movl $10, %eax;"
"movl $20, %ebx;"
"addl %ebx, %eax;"
);

}


I tried -O3, -O2 and -O1; the result is the same:

Code:
<main>
"movl $10, %eax;"
"movl $10, %eax;"
"movl $10, %eax;"
"movl $10, %eax;"
"movl $20, %ebx;"
"addl %ebx, %eax;"
...


What is wrong?
I would expect the compiler's optimizer to delete the first three lines:
Code:
"movl $10, %eax;"

Darkelf
April 24th, 2012, 11:42
Whoohoo, that was a quick jump away from ARM, wasn't it?

Now, x86 coding is on the menu, right?
OK, to make it short and sweet here is a little quote:

Quote:

The presence of an __asm block affects optimization in several ways. First, the compiler doesn't try to optimize the __asm block itself. What you write in assembly language is exactly what you get.


How should the compiler know what you are trying to do? When you use inline asm you are on your own. In general, the use of inline asm is discouraged, because it gets in the compiler's way and prevents overall optimization. So if you use it, you are expected to know what you are doing.

Best regards
darkelf

Cristianu
April 25th, 2012, 02:34
Quote:
Whoohoo, that was a quick jump away from ARM, wasn't it?
Now, x86 coding is on the menu, right?

It was just a test.
If it doesn't work with x86, it won't work with ARM.

Am I right?
ARM coding is still on the menu.

Best regards
Cristianu

disavowed
April 29th, 2012, 09:45
The optimizing C compiler optimizes C code. It doesn't optimize inline assembly.
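
For the specific test earlier in the thread, the redundant loads can be removed at the text level instead, in the spirit of OHPen's suggestion to process the trace yourself. This is a toy sketch, not a real optimizer: it handles exactly one pattern (a movl of an immediate into a register that is immediately overwritten by another movl of an immediate into the same register), and drop_dead_imm_moves is an invented name.

```shell
# Hypothetical toy pass over AT&T-syntax instructions, one per line:
# drop "movl $imm, %reg" when the very next instruction is another
# "movl $imm, %reg" into the same register.
drop_dead_imm_moves() {
    awk '
    # Return the destination register of "movl $imm, %reg", or "" otherwise.
    function imm_dest(line) {
        if (line !~ /^[ \t]*movl[ \t]+\$[0-9a-fA-Fx]+,[ \t]*%[a-z0-9]+[ \t]*$/)
            return ""
        sub(/^.*%/, "", line)
        sub(/[ \t]*$/, "", line)
        return line
    }
    {
        dest = imm_dest($0)
        # The held line is dead only if this line overwrites its register.
        if (held && !(prev_dest != "" && prev_dest == dest))
            print prev
        prev = $0; prev_dest = dest; held = 1
    }
    END { if (held) print prev }
    '
}
```

Fed the six instructions from the test, it keeps a single movl $10, %eax followed by the movl $20, %ebx and the addl. Anything beyond this one pattern needs real data-flow analysis, which is where the frameworks mentioned earlier come in.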