vm for the masses - a vm compiler incl source

NeOXOeN
Member
Posts: 95
Joined: Sun Feb 05, 2006 9:33 pm

Post by NeOXOeN »

i think its one of the rare sources that became public, and its really great...

thx again ..
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

do i get this right that coco only generates the parser and scanner for you, but you have to write the compiler yourself?

from what i understand so far, coco is run on a language to produce some sort of output. is the coco output already the code that gets executed by the virtual machine, or is it processed further to create virtual machine byte code?

im a bit lost here (even after having a look at the sources), so maybe someone can point me in the right direction.
-------
nothing
-------
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

coco generates the sourcecode of the compiler used; it is configured by the grammar file xm.atg

so basically i dont write the compiler source myself, i just make a config file for coco. based on this config, coco generates the sources for the compiler, which are then used
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

so the compiler generated by coco transforms your instructions into this for example:
00000000 mov temp_0000, 0
00000001 mov i, temp_0000
00000002 mov temp_0000, i
00000003 mov_data temp_0000, src
00000004 mov temp_0001, 0
00000005 not_equal temp_0000, temp_0001
(taken from your strcpy snippet in the bigpicture.txt)

and this is then executed by the vm? or is it processed further into some sort of binary code? which of the methods in the packages actually executes the instructions?
Silver
Posts: 570
Joined: Thu May 06, 2004 11:48 am

Post by Silver »

b3n, I don't think following 0rp's code is going to help you with what you want. It might actually make it harder to understand.

0rp, no reflection on your code, just that b3n and I had quite a detailed discussion about VMs via privmsg.
Still here...
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

hi silver,
its not really concerned with what we talked about, i just want to get an understanding on how 0rp's code works and i couldnt figure that out yet.
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

lets assume you have this expression:

1 + 2 * 3



the coco-generated compiler (aka frontend), transforms this expression into:

00000000    mov temp_0000, 1
00000001    mov temp_0001, 2
00000002    mov temp_0002, 3
00000003    mul temp_0001, temp_0002
00000004    add temp_0000, temp_0001
(if you prefer stack machines, this code is equivalent to

push 1
push 2
push 3
mul
add
actually the first xm generation was a stackmachine)
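
A quick way to check that the two listings above really compute the same thing is to run both through tiny evaluators. This is only a sketch: `run_three_address` and `run_stack` are hypothetical helpers written for this example, not anything from the xm sources.

```python
# Hypothetical mini-evaluators, only meant to show that the register-style
# listing and the stack-style listing produce the same value.

def run_three_address(code):
    """Run (op, dst, src) tuples; src is a temp name or an int constant."""
    temps = {}
    for op, dst, src in code:
        value = temps[src] if isinstance(src, str) else src
        if op == "mov":
            temps[dst] = value
        elif op == "mul":
            temps[dst] *= value
        elif op == "add":
            temps[dst] += value
        else:
            raise ValueError(f"unknown op {op}")
    return temps

def run_stack(code):
    """Run a list of stack instructions and return the top of the stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif instr[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown op {instr[0]}")
    return stack.pop()

three_address = [
    ("mov", "temp_0000", 1),
    ("mov", "temp_0001", 2),
    ("mov", "temp_0002", 3),
    ("mul", "temp_0001", "temp_0002"),
    ("add", "temp_0000", "temp_0001"),
]
stack_code = [("push", 1), ("push", 2), ("push", 3), ("mul",), ("add",)]

result_tac = run_three_address(three_address)["temp_0000"]
result_stack = run_stack(stack_code)
print(result_tac, result_stack)  # 7 7
```

Both forms leave 1 + 2 * 3 = 7 as the result; the register form just names every intermediate value instead of keeping it on a stack.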




this frontend code is given to the backend, which transforms it into real down to the metal vm-instructions:

  00000000    mov temp_0000, 1
  ---------------------------------------------------------
  10    00000126    MOV_TEMP_CONST      00000064,  00000001
  11    00000ca0    MOV_TEMP_CONST      00000078,  00000000
  12    0000020a    ADD                 00000078,  00000008
  13    00000ce5    MOV_MEM_TEMP        00000078,  00000064




  00000001    mov temp_0001, 2
  ---------------------------------------------------------
  14    000003f1    MOV_TEMP_CONST      00000064,  00000002
  15    00000944    MOV_TEMP_CONST      00000078,  00000004
  16    0000074c    ADD                 00000078,  00000008
  17    0000031a    MOV_MEM_TEMP        00000078,  00000064



  00000002    mov temp_0002, 3
  ---------------------------------------------------------
  18    00000f62    MOV_TEMP_CONST      00000064,  00000003
  19    00000d2e    MOV_TEMP_CONST      00000078,  00000008
  1a    000008fd    ADD                 00000078,  00000008
  1b    00000ff0    MOV_MEM_TEMP        00000078,  00000064






  00000003    mul temp_0001, temp_0002
  ---------------------------------------------------------
  1c    00001187    MOV_TEMP_CONST      00000078,  00000004
  1d    000011cc    ADD                 00000078,  00000008
  1e    00000d73    MOV_TEMP_MEM        00000064,  00000078
  1f    0000125a    MOV_TEMP_CONST      00000078,  00000008
  20    00000c59    ADD                 00000078,  00000008
  21    00000e46    MOV_TEMP_MEM        00000068,  00000078
  22    0000081f    MUL                 00000064,  00000068
  23    0000004f    MOV_TEMP_CONST      00000078,  00000004
  24    00000a62    ADD                 00000078,  00000008
  25    00000a19    MOV_MEM_TEMP        00000078,  00000064




  00000004    add temp_0000, temp_0001
  ---------------------------------------------------------
  26    000012b3    MOV_TEMP_CONST      00000078,  00000000
  27    00000989    ADD                 00000078,  00000008
  28    000003a8    MOV_TEMP_MEM        00000064,  00000078
  29    00000507    MOV_TEMP_CONST      00000078,  00000004
  2a    00000f1b    ADD                 00000078,  00000008
  2b    00000094    MOV_TEMP_MEM        00000068,  00000078
  2c    000012f8    ADD                 00000064,  00000068
  2d    00000ed6    MOV_TEMP_CONST      00000078,  00000000
  2e    0000066c    ADD                 00000078,  00000008
  2f    000009d0    MOV_MEM_TEMP        00000078,  00000064
(first the frontend instruction, followed by the required vm instructions)

as you can see, a lot of vm instructions are required for one frontend instruction (e.g. add temp, temp requires 10 vm instructions)
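
The expansion pattern in the listing above can be written down as a small lowering function. This is a sketch based only on reading the listing, not on the xm backend source. The slot numbers (0x64, 0x68, 0x78, 0x08) are the ones that appear there; the 4 bytes per temp, the idea that slot 0x08 holds the base of the temp area, and all function names are assumptions.

```python
# Sketch only: rules read off the listing, names are hypothetical.
SCRATCH_A, SCRATCH_B, ADDR, BASE = 0x64, 0x68, 0x78, 0x08
TEMP_SIZE = 4  # assumed: temp_0001 sits at offset 4, temp_0002 at offset 8

def temp_offset(name):
    # "temp_0002" -> 8. Whether the index is hex or decimal is a guess;
    # it makes no difference for the small indexes used here.
    return int(name.split("_")[1], 16) * TEMP_SIZE

def address_of(name):
    """Two vm instructions that leave the temp's address in slot ADDR."""
    return [("MOV_TEMP_CONST", ADDR, temp_offset(name)),
            ("ADD", ADDR, BASE)]

def lower(op, dst, src):
    """Expand one frontend instruction into a list of vm instructions."""
    if op == "mov" and isinstance(src, int):      # mov temp, constant
        return ([("MOV_TEMP_CONST", SCRATCH_A, src)]
                + address_of(dst)
                + [("MOV_MEM_TEMP", ADDR, SCRATCH_A)])
    if op in ("add", "mul"):                      # binary op on two temps
        return (address_of(dst) + [("MOV_TEMP_MEM", SCRATCH_A, ADDR)]
                + address_of(src) + [("MOV_TEMP_MEM", SCRATCH_B, ADDR)]
                + [(op.upper(), SCRATCH_A, SCRATCH_B)]
                + address_of(dst) + [("MOV_MEM_TEMP", ADDR, SCRATCH_A)])
    raise NotImplementedError(op)  # other frontend ops are not covered here

vm_code = lower("add", "temp_0000", "temp_0001")
for name, a, b in vm_code:
    print(f"{name:<20}{a:08x},  {b:08x}")
```

With these rules a `mov temp, const` becomes 4 vm instructions and an `add` or `mul` becomes 10, which matches the counts in the listing.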
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

thanks for that explanation 0rp, that made it a lot clearer. im currently still digging through the code commenting as much as i can. but i havent found the method that is doing the execution of the vm instructions yet. where is the generated backend code executed? or is the backend code generated and executed on the fly when the frontend instructions are read?

edit:
am i right if i assume the following snippet of vm code would translate to the instructions shown below?


10 00000126 MOV_TEMP_CONST 00000064, 00000001
11 00000ca0 MOV_TEMP_CONST 00000078, 00000000
12 0000020a ADD 00000078, 00000008
13 00000ce5 MOV_MEM_TEMP 00000078, 00000064


mov dword [ebx+0xededed00], 0xededed01
mov dword [ebx+0xededed00], 0xededed01
mov eax, [ebx+0xededed01]
add [ebx+0xededed00], eax
mov eax, [ebx+0xededed01]
mov ecx, [ebx+0xededed00]
mov [ecx], eax

i dont know what 0xededed00 and 0xededed01 are used for, could you please explain that to me?

[--MOV_TEMP_CONST--]
//initialize temp reg with 1 (ebx+0xededed00 points to the first temp reg?)
//is 00000064 in ebx?
mov dword [ebx+0xededed00], 0xededed01
[--END MOV_TEMP_CONST--]

[--MOV_TEMP_CONST--]
//same as above, initialize second temp reg with 0
mov dword [ebx+0xededed00], 0xededed01
[--END MOV_TEMP_CONST--]

[--ADD--]
//move value of temp reg 2 into eax
mov eax, [ebx+0xededed01]

//probably add the value in eax to the first temp reg, but im not sure what
//the 00000008 in the vm code stands for
add [ebx+0xededed00], eax
[--END ADD--]

[--MOV_MEM_TEMP--]
//move value of second temp reg into eax
mov eax, [ebx+0xededed01]

//move address of first temp reg in ecx
mov ecx, [ebx+0xededed00]

//save eax at address of first temp reg
mov [ecx], eax
[--END MOV_MEM_TEMP--]
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

the instructions themselves are executable. when the vm is entered, it goes straight to the first opcode; this opcode knows which one is next and jumps to it, and so on

the edededXX values are markers. i compile the opcode source into .bin and overwrite the edededXX markers with their real values (done in void Backend::writeParam)

example:

ADD TEMP_0064, TEMP_0078

add opcode source:
mov eax, [ebx+0xededed01]
add [ebx+0xededed00], eax

which becomes:
mov eax, [ebx+0x78]
add [ebx+0x64], eax


so 0xededed01 (the source operand) is replaced with 0x78 during generation, and 0xededed00 (the dest) is replaced with 0x64



and you are right with your example of those 4 instructions and their real asm
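
The marker replacement can be sketched on raw bytes. This is not the code from Backend::writeParam, just an illustration of the idea: the template below uses the standard x86 encodings for the two add-opcode instructions (8B 83 disp32 and 01 83 disp32), and the `patch` helper is a hypothetical name.

```python
import struct

# Template bytes for:
#   mov eax, [ebx+0xededed01]   -> 8B 83 01 ED ED ED
#   add [ebx+0xededed00], eax   -> 01 83 00 ED ED ED
# (displacements are stored little endian)
ADD_TEMPLATE = bytes.fromhex("8b8301ededed" "018300ededed")

DEST_MARKER = 0xEDEDED00
SRC_MARKER = 0xEDEDED01

def patch(template, dest, src):
    """Return a copy of the template with both markers replaced."""
    out = template
    for marker, value in ((DEST_MARKER, dest), (SRC_MARKER, src)):
        needle = struct.pack("<I", marker)
        if needle not in out:
            raise ValueError(f"marker {marker:#x} not found in template")
        out = out.replace(needle, struct.pack("<I", value))
    return out

# ADD TEMP_0064, TEMP_0078: dest slot 0x64, source slot 0x78
patched = patch(ADD_TEMPLATE, dest=0x64, src=0x78)
print(patched.hex())  # 8b8378000000018364000000
```

The patched bytes decode to `mov eax, [ebx+0x78]` followed by `add [ebx+0x64], eax`. Because the markers are just unusual 32-bit constants, the generator can find and overwrite them without knowing anything else about the opcode template.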
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

thanks 0rp!

so do i get this right:
1. you let the compiler generate the vm instructions from the input script
2. the vm runs over this script and executes the matching instructions

so:
ADD TEMP_0064, TEMP_0078
will be executed by the vm like:
1. find out instruction (in this case add)
2. look up the compiled opcode
3. patch the 0xededed00 and 0xededed01 markers
4. execute the opcode instructions
5. get next instruction

did i get this right?
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

this replacement of edededXX is done during generation, not during execution

so, when generation is done, you have a big block of executable x86 code that makes up the single steps, so somewhere it will contain
mov eax, [ebx+0x78]
add [ebx+0x64], eax
which was required for something



here is what the final generated result looks like without encryption:

mov temp_64, 1:
0049E845 mov dword ptr [ebx+64h],1
0049E84F mov ecx,4FCh
0049E854 mov edx,19h
0049E859 add ecx,dword ptr [ebx+2Ch]
0049E85C jmp ecx



mov temp_78, 0:
0049ECDC mov dword ptr [ebx+78h],0
0049ECE6 mov ecx,0C8h
0049ECEB mov edx,1Bh
0049ECF0 add ecx,dword ptr [ebx+2Ch]
0049ECF3 jmp ecx



add temp_78, temp_8:
0049E8A8 mov eax,dword ptr [ebx+8]
0049E8AE add dword ptr [ebx+78h],eax
0049E8B4 mov ecx,516h
0049E8B9 mov edx,1Dh
0049E8BE add ecx,dword ptr [ebx+2Ch]
0049E8C1 jmp ecx



so the vm instructions end up as a chain of small, customized (the edededXX markers are replaced) executable x86 blocks, each jumping to the next
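
The chaining in the dump above can be modelled in a few lines. This is a hypothetical model, not the xm runtime: `ctx` stands in for the memory at [ebx+...], and each block returns the offset of its successor instead of doing `add ecx,[ebx+2Ch]` / `jmp ecx`. The block offsets are derived from the dump by assuming [ebx+2Ch] holds 0x0049E7E0, which makes 4FCh and 0C8h land exactly on the next two blocks shown. The role of edx is not explained in the thread, so it is left out, and the initial value in slot 0x08 is made up.

```python
CODE_BASE = 0x0049E7E0  # assumed value of [ebx+2Ch]; 0x0049ECDC - 0x4FC

def make_mov_const(dest, value, next_offset):
    """Block for 'mov dword ptr [ebx+dest], value' plus its jump tail."""
    def block(ctx):
        ctx[dest] = value
        return next_offset
    return block

def make_add(dest, src, next_offset):
    """Block for 'mov eax,[ebx+src]; add [ebx+dest],eax' plus its jump tail."""
    def block(ctx):
        ctx[dest] = (ctx[dest] + ctx[src]) & 0xFFFFFFFF
        return next_offset
    return block

# offset -> block, in the scattered order the dump suggests
blocks = {
    0x065: make_mov_const(0x64, 1, 0x4FC),  # at 0049E845
    0x4FC: make_mov_const(0x78, 0, 0x0C8),  # at 0049ECDC
    0x0C8: make_add(0x78, 0x08, 0x516),     # at 0049E8A8
}

def run(blocks, ctx, entry):
    """Follow the chain until it leaves the blocks we modelled."""
    trace = []
    offset = entry
    while offset in blocks:
        trace.append(offset)
        offset = blocks[offset](ctx)
    return trace

ctx = {0x08: 0x1000}  # made-up content for slot 0x08
trace = run(blocks, ctx, entry=0x065)
print([hex(CODE_BASE + off) for off in trace])
print(hex(ctx[0x64]), hex(ctx[0x78]))
```

The real blocks jump to each other directly, so there is no central loop like `run` in the generated code; the loop here only replaces the `jmp ecx` at the end of each block.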
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

i see, so the compiled opcode snippets are just small templates of code that get customized by the vm environment and put together to form the final program? the way i see it, the backend is kind of a compiler too, which produces the final binary as output. the final program is then run by executing the first instruction in the instruction chain?
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

yes, exactly :)
b3n
Posts: 27
Joined: Wed Mar 21, 2007 5:17 am
Location: Australia
Contact:

Post by b3n »

why did you decide to create a final binary version of the input program instead of letting the vm execute the vm instructions during runtime as kind of an interpreter? if you have a binary version of the input program, what do you need the vm for? (maybe i missed something on the way but thats what i ask myself)
0rp
Posts: 111
Joined: Wed Mar 03, 2004 12:47 pm

Post by 0rp »

there was an xm version that worked like you suggested

it had a static number of generic opcodes (add, mov, mul, ...) that were parameterized. therefore the vm also contained a big parameter stream

i didnt like this idea too much, because you can easily replace the static set of opcodes with your own hacked opcodes and do whatever you want
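
For comparison, the interpreter-style design described here (a fixed set of generic opcodes fed from a parameter stream) could look roughly like the sketch below. Everything in it is hypothetical: the opcode numbers, the handler names and the stream layout are made up for illustration and are not taken from the older xm version.

```python
# Hypothetical interpreter: fixed generic handlers plus a parameter stream.

def op_mov_const(regs, params):
    dest, value = next(params), next(params)
    regs[dest] = value

def op_add(regs, params):
    dest, src = next(params), next(params)
    regs[dest] += regs[src]

def op_mul(regs, params):
    dest, src = next(params), next(params)
    regs[dest] *= regs[src]

HANDLERS = {0: op_mov_const, 1: op_add, 2: op_mul}

def interpret(opcodes, param_stream, handlers=HANDLERS):
    """Run each opcode; handlers pull their operands from the stream."""
    regs = {}
    params = iter(param_stream)
    for code in opcodes:
        handlers[code](regs, params)
    return regs

# 1 + 2 * 3 using slots 0x64, 0x68 and 0x6C
opcodes = [0, 0, 0, 2, 1]
param_stream = [0x64, 1, 0x68, 2, 0x6C, 3, 0x68, 0x6C, 0x64, 0x68]
regs = interpret(opcodes, param_stream)
print(regs[0x64])  # 7

# The weakness: swapping one generic handler exposes every use of it.
seen = []
def spy_add(regs, params):
    dest, src = next(params), next(params)
    seen.append((regs[dest], regs[src]))  # log the operands
    regs[dest] += regs[src]

interpret(opcodes, param_stream, {**HANDLERS, 1: spy_add})
print(seen)  # [(1, 6)]
```

Hooking a single handler is enough to observe every add in the program, which is the kind of attack the generated per-instruction blocks are meant to make harder.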