The term P-code is neither new nor a Microsoft invention; P-code is simply code that is interpreted at execution time. To put it plainly, without complex vocabulary, P-code can be thought of as generic machine-level code that our microprocessor cannot interpret by itself and that must first be translated into its native machine code. It is somewhat similar to compiled Java: in order to execute applications written in Java we need a so-called virtual machine. Such a fancy term only means a translator placed between the Java code and the code that our processor understands.
The advantages of using P-code are obvious. If we define a proprietary set of instructions and do not publish its specification, people are going to have a hard time understanding our code. Another advantage is the reduction in executable size: by defining an opcode a single byte long, we can make that one instruction execute a series of operations that would take a larger number of instructions in native code. Microsoft's Visual Basic P-code works exactly like that, with a virtual machine translating p-code into our processor's native machine code. This virtual machine resides in a DLL that the executable loads before it is interpreted. As most of you will have deduced, the names of these DLLs are MSVBVM50.DLL and MSVBVM60.DLL.
The file name is quite explicit: Microsoft Visual Basic Virtual Machine, followed by the version. The differences between these two versions are few: Version 6 introduces new instructions and uses more intuitive names for the instructions already present in Version 5. In other words, Version 6 changes the names but not the intrinsic mechanics of the functions.
The Virtual Machine not only interprets Visual Basic p-code files but it is also used by executables compiled in native machine code. This is because the DLLs also contain the APIs used by all VB applications. An example would be rtcMsgBox, which many of you know is used as an equivalent to the standard Windows API MessageBox. These functions are used in the same way by p-code instructions, but in an indirect fashion, through code interpreted by the virtual machine.
This situation results in a serious problem when we have to trace through p-code:
SoftICE cannot trace through p-code; what we end up doing is tracing the virtual machine's own code. To elaborate, SoftICE only understands the processor's native machine code and knows nothing of P-code. In fact, all we will see if we attempt to trace P-code is the translation of the p-code instructions into our processor's native code.
Like almost everything, this story begins as a challenge fed by curiosity.
I remember that, while hanging around EFNet, I ran into Mr. Green, who was working with an app compiled in VB5 p-code. He told me how difficult it was to deal with P-code apps, and we came up with the idea of making a P-code specific debugger. In fact it was Mr. Black who said it might be useful. Thinking about it, I said this project would not be easy, given the scarce to nonexistent documentation available on the topic. We searched around and came back almost empty-handed. Later, curiosity made me dig even deeper, and little by little I realized the project was doable, though not easy... I had a chat with Mr. Snow, who provided me with a modification of MSVBVM50 made by Lazarus, in which he described all the possible string comparisons that a VB program could perform. This made me think of a solution for the debugger's implementation.
I thought it was possible to inject code into MSVBVM50 at run time; the injected code would call the debugger, which would be implemented in a DLL. I settled on this approach and talked to Mr. Snow, who joined the project. He started working on the code injector (known as the Loader) and I on the Debugger DLL, coding the basic skeleton that loads it. When we both had something done we tested our invention and, to our surprise, it worked ;-). The Debugger had cleared its first step.
We had intercepted the virtual machine and placed our debugger between it and the application. The most serious problem was solved. As you will see later, this was not the loading method we finally adopted: we progressively improved upon it until we completely avoided the use of a modified virtual machine, but the philosophy remained the same.
* First Step
Debugger gains access to the virtual machine and takes control.
This was one of the key issues we had to solve to implement our Debugger: find out how and when p-code translation was taking place. Once we knew this, the injected code would take over control of the program flow and send the data to the debugger. The Debugger in turn would process the opcode and return control to the virtual machine. My previous experience in Debugger coding was nil, but not so in Disassembler coding. A short time before I had almost finished an x86 disassembler, so I applied this knowledge to the Debugger implementation. I conceived the following:
To Disassemble/Interpret a piece of code certain elements are necessary:
- A pointer to a buffer containing the data to be translated.
- A routine that reads the opcodes from that buffer and redirects the program flow to the correct opcode interpretation routine.
This task can be performed in one of two ways: with a series of conditional statements (one for each opcode) or with a jump table. I discarded the first option, because the high number of different opcodes in p-code would require a huge conditional control structure (which would be the slowest thing in the world). I guessed that the translation was performed using a jump table containing the addresses of the routines that interpret each possible opcode, just as I had done in my disassembler. Now I had to accomplish the following:
- Locate the base address to the buffer that contained the opcodes to be interpreted and the base address of the jump table.
I got to work and compiled a small app in VB like this:
Private Sub Form_Load()
    MsgBox "Hello this is P-code!!!", vbInformation, "Example"
End Sub
I loaded the virtual machine (MSVBVM60.DLL) in the SoftICE symbol loader and placed a BPX on _rtcMsgBox. When SoftICE stopped, I pressed F12 to return to the code that called _rtcMsgBox:
call eax // call to rtcMsgBox
cmp edi,esp // we are here
jnz 66105595 // check the stack pointer
xor eax,eax // prepares eax to load the next opcode from the buffer :-)
mov al,[esi] // load the opcode to be executed, in this case 36h
inc esi // increments the pointer in esi
jmp [eax*4+660FDA58] // jump to the routine that interprets the 36h opcode
We see that, just as we had deduced, the interpreter reads the opcode from a buffer (pointed to by ESI) into AL and jumps to the corresponding interpretation routine, using the opcode value as the index, 36h in this example. (This is an intelligent and agile way to branch the code without going through thousands of checks; it does not even look like M$.) If we keep on tracing we see that the access to the buffer pointed to by ESI is repeated continuously. Moreover, while we are inside the virtual machine, the ESI register will always point to the buffer containing the opcodes. So we can always find out which opcode is going to be executed by using the SoftICE command:
> d *esi
This certainly looked like what we had been looking for: ESI contains a weird pointer to a buffer, and AL holds the next byte of that buffer. The most interesting line is the unconditional indirect jump JMP [4*EAX+ADDRESS]. You can see that it uses the byte read from the buffer as an index into a table whose base is ADDRESS, and jumps to the address stored in that entry. The maximum size of the table can be easily deduced: the index in AL can take 256 different values and each entry is 4 bytes long (hence the multiplication by 4), which gives us a length of:
256 * 4 = 1024 bytes
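To make the mechanism concrete, here is a minimal sketch in C of a jump-table interpreter loop. It is not the virtual machine's code: the three opcodes, the Vm structure and every name in it are invented for illustration, and it uses a central loop where the real handlers each fetch the next opcode and jump by themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy interpreter state. 'ip' plays the role ESI plays in the listing above. */
typedef struct Vm {
    const uint8_t *ip;   /* pointer into the opcode buffer */
    int acc;             /* toy accumulator, so the handlers have something to do */
    int running;
} Vm;

typedef void (*Handler)(Vm *vm);

/* Hypothetical opcodes invented for this sketch (not real VB p-code). */
enum { OP_HALT = 0x00, OP_ADD_IMM8 = 0x01, OP_DOUBLE = 0x02 };

static void op_halt(Vm *vm)     { vm->running = 0; }
static void op_add_imm8(Vm *vm) { vm->acc += *vm->ip++; }  /* consumes one operand byte */
static void op_double(Vm *vm)   { vm->acc *= 2; }
static void op_invalid(Vm *vm)  { vm->acc = -1; vm->running = 0; }

/* One entry per possible byte value: 256 entries. */
static Handler jump_table[256];

static void init_jump_table(void)
{
    for (size_t i = 0; i < 256; i++)
        jump_table[i] = op_invalid;
    jump_table[OP_HALT]     = op_halt;
    jump_table[OP_ADD_IMM8] = op_add_imm8;
    jump_table[OP_DOUBLE]   = op_double;
}

/* Runs the buffer until OP_HALT or an unknown opcode; returns the accumulator. */
int run(const uint8_t *code)
{
    Vm vm = { code, 0, 1 };
    assert(code != NULL);
    init_jump_table();
    while (vm.running) {
        uint8_t opcode = *vm.ip++;   /* mov al,[esi] / inc esi */
        jump_table[opcode](&vm);     /* jmp [eax*4+table]      */
    }
    return vm.acc;
}

/* Example program: ((0 + 5) * 2) + 3 */
int run_example(void)
{
    static const uint8_t program[] = {
        OP_ADD_IMM8, 5, OP_DOUBLE, OP_ADD_IMM8, 3, OP_HALT
    };
    return run(program);
}
```

Calling run_example() walks the six-byte buffer and returns 13. Note that no opcode is ever compared against another: the byte itself selects the routine, which is exactly why the table needs one entry for each of the 256 possible values.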
This left me no doubt that this was the table I was looking for: according to the Microsoft document, the standard set of P-code contains 256 opcodes. The document also mentioned further opcodes, named extended opcodes. This reminded me of my dear old PC, which also supposedly has 256 different opcodes but in reality has many more. But then, how do they do it? If there are only 256 unique values, how can there be more opcodes than values? (Again my experience with disassemblers gave me the answer.) Easy: some of the 256 values are reserved as prefixes.
When the interpreter finds one of these prefixes, it indicates an extended instruction and opens up a new set of 256 possible codes, so the use of prefixes allows a practically unlimited set of instructions. Remember also that each prefix has its own jump table. Later we will see the number of prefixes present in VB p-code and how to locate their jump tables. To confirm that the address we had found was correct, I disassembled the virtual machine and searched for all the jumps into the table. As one would expect there were many, as many as there were opcodes. Then I analyzed the contents of the table, verifying that its entries were addresses within the virtual machine, all of them contained in the .ENGINE section of the DLL. I went through some of the routines pointed to by the table, and most of them had the same structure:
They would read the data contained in the buffer pointed to by ESI, execute certain instructions, then read the next opcode and jump to the corresponding decoding routine. I had found what I was looking for: the address of the jump table for the opcodes.
The next step was to substitute the jump table with our own, which contained the same address for all the opcodes. This address would be a routine inside the Debugger's DLL, through which each and every opcode would pass before being executed. In this routine we replaced the function prologue and epilogue generated by the C compiler with our own, which saved all the registers and flags at the beginning and restored them once the debugger gave the green light to the execution of the next p-code instruction.
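The idea of pointing every table entry at one routine can be sketched in portable C as follows. This is only an illustration under simplifying assumptions: the tables, the handler signature and all names are hypothetical, the opcode is passed as a parameter where the real hook finds it in AL, and the hook calls the original routine where the real one jumps to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*Handler)(uint8_t opcode);

static Handler live_table[256];    /* the table the interpreter jumps through */
static Handler saved_table[256];   /* the debugger's copy of the original entries */
static unsigned seen[256];         /* how many times the hook saw each opcode */
static unsigned total;             /* toy side effect of the original handlers */

/* Stand-in for the virtual machine's real opcode routines. */
static void original_handler(uint8_t opcode) { total += opcode; }

/* Every opcode lands here first; control then goes on to the original routine. */
static void debugger_hook(uint8_t opcode)
{
    seen[opcode]++;                /* stand-in for the debugger's control code */
    saved_table[opcode](opcode);
}

/* Saves the original entries and makes all 256 of them point at the hook. */
void install_hook(void)
{
    for (size_t i = 0; i < 256; i++) {
        saved_table[i] = live_table[i];
        live_table[i]  = debugger_hook;
    }
}

void dispatch(uint8_t opcode)
{
    assert(live_table[opcode] != NULL);
    live_table[opcode](opcode);
}

unsigned times_seen(uint8_t opcode) { return seen[opcode]; }

/* Builds a fresh toy table, hooks it and executes opcodes 3, 3 and 7. */
unsigned hook_example(void)
{
    total = 0;
    for (size_t i = 0; i < 256; i++) {
        seen[i] = 0;
        live_table[i] = original_handler;
    }
    install_hook();
    dispatch(3);
    dispatch(3);
    dispatch(7);
    return total;
}
```

After hook_example() the original handlers have still done their work (the total is 3 + 3 + 7 = 13), while the hook has observed opcode 3 twice and opcode 7 once. Keeping a private copy of the original entries is what lets the hook hand each opcode back to the routine that really implements it.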
Here is the beginning and the end of the Debugger's routine:
__declspec( naked ) void DebuggerProc()
mov VBDebugger.OldStack_ESP,esp // We save the VB stack
mov VBDebugger.OldStack_EBP,ebp // base pointer and stack pointer
pushad // Save the state of all the registers
pushfd // and the flags
push ebp // Now we set up a standard call frame
mov ebp, esp
sub esp, __LOCAL_SIZE // If we had local variables this would give
// us the amount to subtract from the stack
// Here goes the rest of the Debugger's control code
// Here we modify the jump address, because it cannot
// be set at coding time,
// so we make self-modifying code that sets the jump address during run-time :P
mov eax,offset JmpOffset
mov esp, ebp // We restore the initial stack frame
pop ebp // (this balances the push ebp above)
popfd // and the flags
popad // and the registers
// if the opcode was modified with the memory editor, we change it in AL
// so the change is reflected when it jumps to the opcode control routine
// returns control to the virtual machine. Note that this code
// is self-modified at run-time
Looking at this routine we can ask: who said that C was not powerful? As you can see, using the naked directive we can build the routine to our taste. In fact this directive is often used for building drivers (VxD) for Windows. This routine acts as a hook between the original code and the virtual machine. As it stands, the routine does nothing more than give control back to the original jump table, but it has everything we need to control each and every one of the opcodes executed by the virtual machine.
When we began our research on p-code we realized that the lack of information on the subject was due to the fact that Microsoft keeps the technical specifications of its p-code secret and only hands them over after you sign a so-called NDA (non-disclosure agreement), in which you promise not to release the information under the threat of legal repercussions. So all we had was a very poor and superficial document in which Microsoft cursorily explained p-code, which you may read here:
Besides that, we had Exdec, a P-code disassembler made by Josephco. Exdec produces a disassembly of any p-code file, but the opcodes are shown in incomplete form (only the first byte). We will see the reasons for this later.
The address of the jump table provided the base. We prepared a code patch which was added as a new section to the virtual machine DLL. The patch would obtain the initial data and load the Debugger. Once the Debugger had finished hooking the virtual machine, the patch would continue with the loading of the VM by redirecting control to the OEP of the VM DLL. This method had a major drawback: the need for a modified VM DLL.
We were able to eliminate the need for a modified DLL by creating a loader: a small application which starts the Visual Basic executable in suspended mode, obtains the entry point with GetThreadContext, and copies there a code patch which loads the Debugger's DLL. Once the patch was executed, it signalled the start of the main process using the synchronization APIs SetEvent and WaitForSingleObject. Once the patch had finished, the original program code was restored and execution returned to the original entry point (OEP) by means of SetThreadContext, carrying on as if nothing had happened.
At that point the Debugger already had control, and a problem arose while the patch was still executing. As the original application code and part of the VB data had been temporarily replaced by the graft, and the Debugger needed that data, we had to start the debugger in an independent thread, which kept checking whether a given position in memory contained the signature VB5!, which indicated that the loader had finished executing the graft.
The graft would also verify that the executable was a VB application, by looking in the import table for MSVBVMXX.DLL, where XX could be 50 or 60.
Another problem was that the address of the opcode table varied between versions of the virtual machine, so we had to get hold of all the different versions and find the correct address for each one. This method was cumbersome, because we had to modify the loader every time a new VM version came out. It was not until much later that we devised the ideal solution to this problem. As I said before, the opcode address table is found in the .ENGINE section of the VM. One of the properties of this table is that all the addresses contained in it have to point within that same section, so we devised an algorithm that locates the first set of 256 contiguous DWORDs whose values all fall within the .ENGINE section, and this determines the address of the table :-).
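The search just described can be sketched in a few lines of C. This is an illustration of the idea rather than the debugger's actual code: the function name, its parameters and the section bounds in the example are assumptions, and the scan moves in whole DWORD steps, which presumes the table is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_ENTRIES 256u

/*
 * Scans 'count' DWORDs and returns the index of the first run of 256
 * consecutive values that all lie in [section_start, section_end),
 * or -1 if there is no such run.
 */
long find_jump_table(const uint32_t *dwords, size_t count,
                     uint32_t section_start, uint32_t section_end)
{
    size_t run = 0;   /* length of the current run of in-section values */

    assert(dwords != NULL && section_start < section_end);
    for (size_t i = 0; i < count; i++) {
        if (dwords[i] >= section_start && dwords[i] < section_end) {
            if (++run == TABLE_ENTRIES)
                return (long)(i + 1 - TABLE_ENTRIES);
        } else {
            run = 0;  /* a value outside the section breaks the run */
        }
    }
    return -1;
}

/* Synthetic image: a 255-entry decoy followed later by a full 256-entry table. */
long find_jump_table_example(void)
{
    enum { COUNT = 600, DECOY_AT = 10, TABLE_AT = 300 };
    static uint32_t image[COUNT];                 /* zero-filled by default */
    const uint32_t start = 0x66001000u;           /* made-up section bounds */
    const uint32_t end   = 0x66100000u;

    for (size_t i = DECOY_AT; i < DECOY_AT + 255; i++)
        image[i] = start + (uint32_t)i;           /* one entry short of a table */
    for (size_t i = TABLE_AT; i < TABLE_AT + 256; i++)
        image[i] = start + 4u * (uint32_t)i;      /* the table we want to find */
    return find_jump_table(image, COUNT, start, end);
}
```

In the synthetic example the decoy run is rejected because it is one entry short, and the function returns 300, the index where the full table starts. A search like this needs only the bounds of the section, not a per-version address, which is why the loader no longer has to be updated for each new virtual machine.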
* Second Step
Analysis of the opcodes and retrieval of their mnemonics.
This step would have been simpler if from the beginning we had paid more attention to the symbol files of the virtual machine (DBG). The way we initially obtained the mnemonics was through the JosephCo Exdec disassembler. No, it was not all that hard. I did not look for the opcodes one by one, but assumed that the list should be present within Exdec, stored in its DLL. It was, and using a hex editor I located and determined, after much research, the position of each mnemonic. It was at this point that I discovered the opcodes used as prefixes, which are the last five in the set: FF FE FD FC FB (Lead0, Lead1, Lead2, Lead3 and Lead4). Each prefix generated a new opcode table, giving a total of:
( 5 prefixes + 1 standard set) * 256 opcodes = 1536 opcodes
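As an illustration of how a decoder can pick the right table, here is a small C sketch. The only facts taken from the text are that the five bytes FB to FF act as prefixes and that each prefix has its own 256-entry table; which table number corresponds to which byte, and every name below, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FIRST_LEAD_BYTE   = 0xFB,  /* FB..FF are the five prefix bytes */
    TABLE_COUNT       = 6,     /* 1 standard table + 5 prefix tables */
    OPCODES_PER_TABLE = 256
};

typedef struct Decoded {
    unsigned table;   /* 0 = standard set, 1..5 = prefix tables (numbering assumed) */
    uint8_t  opcode;  /* index inside that table */
    size_t   length;  /* opcode bytes consumed: 1 or 2, operands not included */
} Decoded;

/* Decodes the opcode (and prefix, if any) found at p. */
Decoded decode_opcode(const uint8_t *p)
{
    Decoded d;

    assert(p != NULL);
    if (p[0] >= FIRST_LEAD_BYTE) {            /* prefix: real opcode is the next byte */
        d.table  = (unsigned)(p[0] - FIRST_LEAD_BYTE) + 1u;
        d.opcode = p[1];
        d.length = 2;
    } else {                                  /* plain opcode from the standard set */
        d.table  = 0;
        d.opcode = p[0];
        d.length = 1;
    }
    return d;
}

/* Position of a decoded opcode in one flat array covering all six tables. */
unsigned flat_index(Decoded d)
{
    return d.table * OPCODES_PER_TABLE + d.opcode;
}

unsigned total_slots(void)
{
    return TABLE_COUNT * OPCODES_PER_TABLE;
}

/* Convenience wrapper for trying out a one- or two-byte opcode. */
unsigned example_flat_index(uint8_t first, uint8_t second)
{
    uint8_t bytes[2];

    bytes[0] = first;
    bytes[1] = second;
    return flat_index(decode_opcode(bytes));
}
```

With this layout total_slots() gives the same 1536 as the formula above, and a debugger can keep per-opcode data (mnemonic, size, breakpoint state) in a single array indexed by flat_index().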
As you can see there are more than a few; many of them are not used and some are redundant, i.e. they execute the same operation. The Lead4 prefix does not use its whole opcode table: in version 6 of the virtual machine it only goes up to opcode 46h (this can be verified by disassembling the virtual machine). As I told you above, once we had located some of the opcodes we realized that the virtual machine's debug files contained all its symbolic information: names of the routines, addresses and names for each one of the mnemonics. So we obtained everything by dumping the DBG information into a text file using SoftICE.
For those interested, here is a portion :
RVA         Size  Symbol name
0F103D8Bh   34    CCyR4
0F103DADh   19    CCyVar
0F103DC0h   9     CBoolCy
0F103DC9h   0     CBoolR8
0F103DC9h   38    CBoolR4
0F103DEFh   32    CStrVar
0F103E0Fh   18    CStrBool
0F103E21h   34    CStrR8
Under Symbol name you can see the names of some of the p-code mnemonics; under RVA you have their addresses within the virtual machine. Unfortunately, this address varies with the VM version, but the debugger recovers it through a heuristic search for the jump table. Some mnemonics vary in name from one version to the next, but the operation they perform remains the same. With this information we implemented a primitive disassembly of the code, which initially only showed the instruction executed, because we still needed to find the exact size of each instruction. This was the hardest task of all: it entailed analyzing each one of the routines and checking the position of the opcode buffer pointer at the beginning and at the end of each routine.
Analyzing more than 1000 routines, although most of them short, took a long time. Initially we assumed that the sizes were fixed, but unfortunately that was not the case. Some instructions played with the number of parameters pushed, which made their size variable. Those were a minority, but enough to send us off track if we were not careful. Maybe this is why JosephCo opted to show incomplete disassemblies. It was a reverse engineering undertaking in the deepest sense of the word.
We divided the job; each one of us would derive the sizes of a different set of instructions. After this was done and we had all the sizes, we were able to implement a decent disassembly of the code. Needless to say, we later had to correct the size of some of the opcodes because of unavoidable human errors. Even today we think there might be some erroneous sizes, because some instructions are not used in all applications and it is impossible to test them all. No bad opcodes have been reported in the last version of the debugger (1.3), but previous versions did have them.
* Third Step
Adding Debugger basic functionality
This step was a more complex matter of coding and research. Adding the export table was not very hard: the loader obtained it by analyzing the PE header, and then the Debugger obtained the jump table address. With these in hand, we constructed a list in which a flag establishes the breakpoint state (ACTIVE/INACTIVE/NONE). This way we were able to place breakpoints on any of the virtual machine's APIs. A similar method was used to implement breakpoints on the p-code opcodes. This is a difference from conventional debuggers: we can break whenever the debugger finds a given instruction, because the debugger has control of each opcode before it is handed to the virtual machine.
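A per-opcode breakpoint flag of this kind could look like the following C sketch. It is hypothetical: the names are invented, and for brevity it covers only the 256 opcodes of the standard set, so the prefix tables would need entries of their own.

```c
#include <assert.h>
#include <stdint.h>

/* The three breakpoint states mentioned in the text. */
typedef enum BpState { BP_NONE = 0, BP_INACTIVE, BP_ACTIVE } BpState;

/* One flag per opcode of the standard set; static storage starts as BP_NONE. */
static BpState opcode_bp[256];

void set_opcode_bp(uint8_t opcode, BpState state)
{
    assert(state == BP_NONE || state == BP_INACTIVE || state == BP_ACTIVE);
    opcode_bp[opcode] = state;
}

/* Asked by the hook before an opcode is handed to the VM: nonzero means stop. */
int should_break(uint8_t opcode)
{
    return opcode_bp[opcode] == BP_ACTIVE;
}

/* Only the ACTIVE breakpoint stops execution; returns 1 if that holds. */
int opcode_bp_example(void)
{
    set_opcode_bp(0x36, BP_ACTIVE);
    set_opcode_bp(0x37, BP_INACTIVE);
    return should_break(0x36) && !should_break(0x37) && !should_break(0x38);
}
```

Because the hook sees every opcode before the virtual machine does, the check costs a single array lookup per instruction, and an INACTIVE entry lets a breakpoint be disabled without being forgotten.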
Subsequently we added breakpoints inside the actual code, i.e. given the address of an opcode, set up a breakpoint there. The breakpoints are stored in a dynamically allocated linked list, which has several advantages: there is no limit on the number of breakpoints, and the memory usage adjusts to the number of breakpoints established.
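A minimal sketch of such a list in C, together with the address comparison a debugger would perform each time it gains control, might look like this. The structure and function names are assumptions made for the example, not the debugger's real ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per breakpoint: the list grows and shrinks with the breakpoints set. */
typedef struct Breakpoint {
    uintptr_t address;            /* address of the opcode to stop at */
    struct Breakpoint *next;
} Breakpoint;

/* Adds a breakpoint at the head of the list. Returns 0, or -1 if out of memory. */
int bp_add(Breakpoint **head, uintptr_t address)
{
    Breakpoint *bp;

    assert(head != NULL);
    bp = malloc(sizeof *bp);
    if (bp == NULL)
        return -1;
    bp->address = address;
    bp->next = *head;
    *head = bp;
    return 0;
}

/* Removes the first breakpoint at 'address'. Returns 1 if one was removed. */
int bp_remove(Breakpoint **head, uintptr_t address)
{
    assert(head != NULL);
    for (Breakpoint **link = head; *link != NULL; link = &(*link)->next) {
        if ((*link)->address == address) {
            Breakpoint *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the opcode about to execute sits on a breakpoint. */
int bp_hit(const Breakpoint *head, uintptr_t opcode_address)
{
    for (; head != NULL; head = head->next)
        if (head->address == opcode_address)
            return 1;
    return 0;
}

/* Sets two breakpoints, checks hits and misses, removes one, then cleans up. */
int bp_example(void)
{
    Breakpoint *list = NULL;
    int ok = bp_add(&list, 0x401000) == 0
          && bp_add(&list, 0x401020) == 0
          && bp_hit(list, 0x401020)
          && !bp_hit(list, 0x401010)
          && bp_remove(&list, 0x401000)
          && !bp_hit(list, 0x401000);

    while (list != NULL)               /* free whatever is left */
        bp_remove(&list, list->address);
    return ok;
}
```

The lookup is linear in the number of breakpoints, which is a reasonable trade-off as long as only a handful are set at any one time.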
The basic operation of the breakpoints is not very complex. When the Debugger gets control, it simply stores the address of the opcode buffer for its own use. This address is then displayed and compared with the established breakpoints. If the value matches any of the breakpoints in the list, the debugger stops the execution.
The memory editor/viewer allows the examination, editing and dumping of any part of the memory belonging to the program being debugged.
We have tried to make the validation of pointers as thorough and reliable as possible, but even so we cannot rule out access violations, although these should be close to impossible while using the memory editor. It is also possible that modifying memory will suddenly terminate the process being traced for no obvious reason. This can happen if we replace an instruction with another that does not fit the context in which the original instruction was being executed: the debugger will change the opcode, but in the end it is the virtual machine that executes it.
This sort of situation should not arise if one knows what to touch. In future versions the debugger will handle this case: it will restore the state of the process before the mistake happens, allowing execution to continue normally. In any case, just as happens with SoftICE, if we assemble incorrect code, the logical outcome is that we mess everything up ;-).
Now, all these processes had to be represented graphically in the Debugger's window, so we used a colour-coded system to identify the lines of code where breakpoints had been established. The graphic implementation is a list box control, drawn by the Debugger application. As you may have seen, this colour scheme is reminiscent of a well-known debugger. We have tried to keep the look and feel of the dialog boxes consistent with that of the debugger's main window, so they can be quickly identified among the windows of any other application. One can be sure that anything black over green belongs to our debugger.