vmware's sidt relocation, how?


0rp
April 27th, 2005, 04:18
hi,

OK, this is slightly off-topic, but I'm just wondering: how does VMware relocate the SIDT value for guest OSes (i.e., give guests a fake ISR table address)?

SIDT is not a privileged instruction, so it doesn't raise an exception that VMware could catch and handle.
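For reference, here is a hedged sketch of what SIDT actually hands back. In 32-bit protected mode it stores a 6-byte pseudo-descriptor: a 16-bit limit followed by the 32-bit linear base of the IDT. The helpers below (hypothetical names) only decode such a buffer. They do not execute SIDT, and they assume x86 little-endian byte order.

```c
#include <stdint.h>

/* Hypothetical helpers: decode the 6-byte memory operand written by SIDT
 * (or SGDT) in 32-bit protected mode. Layout, little-endian:
 *   bytes 0-1: table limit (size in bytes minus one)
 *   bytes 2-5: 32-bit linear base address of the table            */
static uint16_t pseudo_desc_limit(const uint8_t buf[6])
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

static uint32_t pseudo_desc_base(const uint8_t buf[6])
{
    return (uint32_t)buf[2]
         | ((uint32_t)buf[3] << 8)
         | ((uint32_t)buf[4] << 16)
         | ((uint32_t)buf[5] << 24);
}
```

On a native XP system the base would decode to 0x8003F400, one of the values quoted later in this thread. A full table of 256 eight-byte gates gives a limit of 0x7FF.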

nikolatesla20
April 27th, 2005, 05:52
It can choose to emulate the instruction when it comes through. That's the benefit of being a VM.

-nt20

0rp
April 27th, 2005, 07:38
Quote:
[Originally Posted by nikolatesla20]It can choose to emulate the instruction when it comes thru.


And how?
Nonprivileged instructions are executed directly on the CPU, and SIDT does not raise a privileged-instruction exception. That's why VMware never knows that it was executed.

Or am I wrong?


BTW, VMware does not work like Bochs. Bochs reads every byte and interprets the instructions. VMware executes the guest's instructions directly on the CPU and handles only exceptions to ensure correct behaviour.

doug
April 27th, 2005, 10:03
I don't know how VMware works, but some thoughts:

- It could inspect instructions before executing them, and "pass them down" directly to the CPU.

- Let the sidt execute natively, and guard/protect the pages containing the real IDT. Any read/write to the real idt (inside the guest) is translated into the guest (virtual) idt.
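doug's first option amounts to scanning guest code before it runs. The following is a minimal sketch of recognizing the three sensitive store instructions by their encodings. It rests on simplifying assumptions: prefixes are not handled, instruction length is not decoded, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classifier for a pre-scanning monitor: given the bytes at an
 * instruction boundary, report whether they encode one of the unprivileged
 * "store descriptor-table register" instructions.
 * Simplification: prefixes are ignored and no length decoding is done. */
enum sensitive_op { OP_NONE, OP_SGDT, OP_SIDT, OP_SLDT };

static enum sensitive_op classify_sensitive(const uint8_t *p, size_t n)
{
    if (n < 3 || p[0] != 0x0F)
        return OP_NONE;

    uint8_t modrm = p[2];
    uint8_t mod = modrm >> 6;          /* ModRM bits 7:6 */
    uint8_t reg = (modrm >> 3) & 7;    /* ModRM bits 5:3, the /digit extension */

    /* 0F 01 with mod == 3 encodes other instructions (e.g. VMCALL, MONITOR). */
    if (p[1] == 0x01 && mod != 3) {
        if (reg == 0) return OP_SGDT;  /* 0F 01 /0 */
        if (reg == 1) return OP_SIDT;  /* 0F 01 /1 */
    }
    if (p[1] == 0x00 && reg == 0)
        return OP_SLDT;                /* 0F 00 /0, register or memory operand */
    return OP_NONE;
}
```

For example, the `sidt [disp32]` form used by small test programs starts with the bytes 0F 01 0D, which this reports as OP_SIDT. The hard part in practice is knowing where the instruction boundaries are, and this sketch leaves that out.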

From my experience, protections that use the sidt instruction & modify the interrupt vectors do not work very well under VMWare. So I don't think that there's an elegant way of handling it without inspecting every instruction.
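To make "modify the interrupt vectors" concrete: each 32-bit IDT entry is an 8-byte gate, and the handler offset is split into two 16-bit halves. Below is a sketch, using hypothetical helper names, of reading and patching that offset.

```c
#include <stdint.h>

/* Hypothetical helpers for 8-byte 32-bit IDT gate descriptors (as found at
 * the base returned by SIDT). Layout, little-endian:
 *   bytes 0-1 handler offset 15:0, bytes 2-3 code segment selector,
 *   byte 4 reserved, byte 5 P/DPL/type, bytes 6-7 handler offset 31:16 */
static uint32_t gate_offset(const uint8_t g[8])
{
    return (uint32_t)g[0] | ((uint32_t)g[1] << 8)
         | ((uint32_t)g[6] << 16) | ((uint32_t)g[7] << 24);
}

static uint16_t gate_selector(const uint8_t g[8])
{
    return (uint16_t)(g[2] | (g[3] << 8));
}

/* "Hooking" a vector means rewriting the split offset to point elsewhere;
 * selector and type/attribute bytes are left untouched. */
static void gate_set_offset(uint8_t g[8], uint32_t handler)
{
    g[0] = (uint8_t)handler;
    g[1] = (uint8_t)(handler >> 8);
    g[6] = (uint8_t)(handler >> 16);
    g[7] = (uint8_t)(handler >> 24);
}
```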

Kayaker
April 27th, 2005, 16:53
Hi,

That's an interesting question. I've had some weird results with IDT hooking under VMware as well, but never considered it could be because of communication with the "real" CPU to handle interrupts. Or as doug says they could also be handled strictly internally. If you trace a handler under VM, there doesn't *appear* to be communication outside of the virtual machine itself, but who knows what's hidden.

VMware uses a backdoor via the IN instruction and a special I/O port, 5658h, to interface between the host and the virtual machine. While it is reported to be used for things like file sharing, drag 'n drop, and updating the real system time, it could possibly be involved in interrupt handling as well. BPIO 5658 in SoftICE under VM will get you in. I think plenty of it is still undocumented.
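Here is a minimal sketch of the backdoor call as described in Ken Kato's notes (listed further down in this post). EAX holds the magic 564D5868h ('VMXh'), ECX holds the command (0Ah = get version), and DX holds the port 5658h ('VX'). Under VMware, the magic comes back in EBX. The function names are hypothetical, and the sketch assumes GCC-style inline assembly. Outside a VM the IN instruction normally faults from user mode, so only the pure check is safe to run anywhere.

```c
#include <stdint.h>

/* Constants as documented in Ken Kato's "VMware Backdoor I/O Port" notes. */
#define VMWARE_BACKDOOR_MAGIC 0x564D5868u /* 'VMXh' */
#define VMWARE_BACKDOOR_PORT  0x5658u     /* 'VX'   */
#define VMWARE_CMD_GETVERSION 0x0Au

/* Under VMware the backdoor echoes the magic value back in EBX. */
static int backdoor_reply_is_vmware(uint32_t ebx_after)
{
    return ebx_after == VMWARE_BACKDOOR_MAGIC;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Issues the backdoor call. On bare metal (no I/O permission for the port)
 * the IN instruction faults, so only call this with a handler in place. */
static uint32_t vmware_backdoor_getversion(uint32_t *ebx_out)
{
    uint32_t eax = VMWARE_BACKDOOR_MAGIC, ebx = ~VMWARE_BACKDOOR_MAGIC;
    uint32_t ecx = VMWARE_CMD_GETVERSION, edx = VMWARE_BACKDOOR_PORT;
    __asm__ volatile("inl %%dx, %%eax"
                     : "+a"(eax), "+b"(ebx), "+c"(ecx), "+d"(edx));
    *ebx_out = ebx;
    return eax; /* version number when running under VMware */
}
#endif
```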

One thing VMware doesn't support is Hyper-Threading. You can prove it with CPUID instructions, or by attempting to switch logical processors with SetProcessAffinityMask. The host's HT is not exposed to the virtual machine, so there is only ever one IDT present.
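The CPUID route can be sketched as follows, assuming GCC/Clang's `<cpuid.h>`. Leaf 1 reports the HTT flag in EDX bit 28 and a logical-processor count in EBX bits 23:16. These fields also reflect multi-core packages, so treat them as a hint rather than proof either way. The helper names are hypothetical.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 28 (HTT): the logical-processor count in EBX is valid. */
static int cpuid1_htt(uint32_t edx)
{
    return (int)((edx >> 28) & 1u);
}

/* CPUID leaf 1, EBX bits 23:16: addressable logical processors per package. */
static unsigned cpuid1_logical_count(uint32_t ebx)
{
    return (ebx >> 16) & 0xFFu;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Nonzero if this (possibly virtual) CPU reports more than one logical
 * processor per package; a guest without HT/multi-core would report 0. */
static int cpuid_reports_multiple_logical(void)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    return cpuid1_htt(d) && cpuid1_logical_count(b) > 1;
}
#endif
```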
I can't really see an ISR in the host being able to handle very well the exception of a *virtual* interrupt address. It seems there would be 2 entirely different PDE/TLB/PTE mappings of the VM OS (internally), vs where/how the VM as a whole is mapped into the real system. Any changes made to memory addresses must be reflected back to the VM properly. Which I suppose is what you said about ensuring correct behaviour.


Here are search terms for the titles of a couple of articles of interest regarding the backdoor:

VMware's Back > VMware Backdoor I/O Port
Author: Ken Kato

Detect if your program is running inside a Virtual Machine
@ Codeproject.com


Let us know of any further info.

Cheers,
Kayaker

nikolatesla20
April 27th, 2005, 17:27
I found this with a basic google search

Quote:


----[ 4.1 - Sensitive Register Instructions: SGDT, SLDT and SIDT


In protected mode, all memory accesses pass through either the 'global
descriptor table' (GDT) or 'local descriptor table' (LDT). The GDT and LDT
contain segment descriptors that provide the base address, access rights,
type, length, and usage information for each segment.
The 'interrupt descriptor table' (IDT) is similar to the GDT and LDT, but it
holds gate descriptors that provide access to interrupt and exception handlers.
The GDTR, LDTR and IDTR all contain the linear addresses and sizes of their
respective tables. All three of these instructions (SGDT, SIDT, SLDT) store a
special register value into some location. The SGDT instruction stores the
contents of the GDTR in a 6-byte memory location. The SLDT instruction stores
the segment selector from the LDTR in a 16 or 32-bit general-purpose register
or memory location. The SIDT instruction stores the contents of the IDTR in a
6-byte memory location. These instructions are normally only used by operating
systems but are not privileged in the Intel architecture. Since the Intel
processor only has one LDTR, IDTR, and GDTR, a problem arises when multiple
operating systems try to use the same registers. Although these instructions
do not protect the sensitive registers from reading by unprivileged software,
the processor allows partial protection for these registers by only allowing
tasks at current privilege level (CPL) 0 to load the registers. This means that
if a VM tries to write to one of these registers, a trap will be generated.
The trap allows a VMM to produce the expected result for the VM. However, if
an OS in a VM uses SGDT, SLDT, or SIDT to reference the contents of the IDTR,
LDTR, or GDTR, the register contents that are applicable to the host OS will
be given. This could cause a problem if an operating system of a virtual
machine (VMOS) tries to use these values for its own operations: it might see
the state of a different VMOS executing within a VM running on the same VMM.
Therefore, a VMM like Vmware must provide each VM with its own virtual set of
IDTR, LDTR, and GDTR registers (see [1]).


The scoopy tool takes advantage of this hardware virtualization issue to
identify systems running under VMware. This is accomplished by comparing the
actual values of the GDTR, LDTR and IDTR registers with predefined values.

After doing some tests, the following predefined values showed up for
Windows systems:

* IDT of systems running inside of VMware version 3: 0xFFC6A370
* IDT of systems running inside of VMware version 4: 0xFFC17800
* IDT of a native Windows XP and 2003 Server system: 0x8003F400
* IDT of a native Windows 2000 Server system : 0x80036400

* LDT of systems running inside of VMware version 3: 0x3fa8
* LDT of systems running inside of VMware version 4: 0x4058
* LDT of a native Windows XP and 2003 Server system: 0x0000
* LDT of a native Windows 2000 Server system: 0x0000

* GDT of systems running inside of VMware version 3: 0xFFC05000
* GDT of systems running inside of VMware version 4: 0xFFC07000
* GDT of a native Windows XP and 2003 Server system: 0x8003F000
* GDT of a native Windows 2000 Server system: 0x80036000
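A scoopy-style check boils down to an exact-match lookup. Below is a hedged sketch that uses only the IDT bases quoted above. The table and function names are illustrative and are not scoopy's code.

```c
#include <stddef.h>
#include <stdint.h>

/* IDT bases taken from the list above; anything else is "unknown". */
struct known_idt { uint32_t base; const char *label; };

static const struct known_idt KNOWN_IDTS[] = {
    { 0xFFC6A370u, "VMware 3 guest" },
    { 0xFFC17800u, "VMware 4 guest" },
    { 0x8003F400u, "native Windows XP / 2003 Server" },
    { 0x80036400u, "native Windows 2000 Server" },
};

/* Returns the matching label, or NULL if the base is not in the table. */
static const char *classify_idt_base(uint32_t base)
{
    for (size_t i = 0; i < sizeof KNOWN_IDTS / sizeof KNOWN_IDTS[0]; i++)
        if (KNOWN_IDTS[i].base == base)
            return KNOWN_IDTS[i].label;
    return NULL;
}
```

Exact matches like this break as soon as a new VMware build or OS version moves the table. That is one reason the redpill page linked below uses a threshold instead.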



Also, another good link with info about the SIDT instruction in VMware:

hxxp://invisiblethings.org/papers/redpill.html
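As published on the redpill page, the test reduces to one comparison on the SIDT output. If the most significant byte of the IDT base (byte 5 of the 6-byte buffer) is above 0xD0, assume a VM. Here is a sketch with hypothetical names:

```c
#include <stdint.h>

/* Redpill-style heuristic on the raw 6-byte SIDT output (limit, then base). */
static int redpill_says_vm(const uint8_t sidt_out[6])
{
    return sidt_out[5] > 0xD0;   /* byte 5 = bits 31:24 of the IDT base */
}

/* The same test expressed on an already-decoded 32-bit base. */
static int redpill_base_says_vm(uint32_t idt_base)
{
    return (idt_base >> 24) > 0xD0u;
}
```

Against the table above, both VMware bases (0xFF...) trip the check, while the native Windows bases (0x80...) do not.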

-nt20

nikolatesla20
April 27th, 2005, 17:34
Oh, and check this out

hxxp://www.cs.nps.navy.mil/people/faculty/irvine/publications/2000/VMM-usenix00-0611.pdf

and check out page 12

-nt20

NeO
April 27th, 2005, 22:35
Thanks for the post, nikolatesla20.

Very, very good, thanks for sharing.


bye NeO

0rp
April 28th, 2005, 07:01
I know this paper, and the redpill logic.

But it does not explain how the problematic x86 instructions (SGDT, SIDT, SLDT) are handled in tools like VMware or Virtual PC.

BTW, why are they not privileged? The values they return are 100% useless for user-mode apps.

And I don't think they are inspecting each op before execution. That would need single-stepping and would be way too slow, so VMware does not know at all what it is currently executing.

Or am I completely wrong?

weird

nikolatesla20
April 28th, 2005, 09:02
Well, let's think logically for a moment. Since the value SIDT returns inside VMware does not match the real IDT address on the host machine, VMware must be catching the SIDT instruction somewhere and emulating it. Period. It's just a matter of where and how.

Perhaps I'll fire up NTIce and try to find out what VMWare is doing.....

By the way, very interesting thread.

-nt20

nikolatesla20
April 28th, 2005, 09:14
ok, another google:

hxxp://64.233.179.104/search?q=cache:X8rMAwWnI00J:appft1.uspto.gov/netacgi/nph-Parser%3FSect1%3DPTO2%26Sect2%3DHITOFF%26u%3D%252Fnetahtml%252Fsearch-adv.html%26r%3D5%26p%3D1%26f%3DG%26l%3D50%26d%3DPG01%26S1%3Dmicrosoft%2524.AS.%26OS%3Dan/microsoft%24%26RS%3DAN/microsoft%24+sidt+instruction&hl=en&client=firefox-a

scroll down to number 76


-nt20

nikolatesla20
April 28th, 2005, 09:19
And one more article I found.

Basically just google like mad

Quote:

MAPPING THE MONITOR'S GDT AND IDT INTO THE GUEST LINEAR SPACE
=============================================================

The monitor is never directly invoked (called) by the guest
code, in fact the guest shouldn't even know about the monitor.
The monitor is only invoked via interrupts and exceptions
which gate via the monitor's IDT. And since entries in the IDT
point into the GDT, we need to look at both, with respect
to where to map them into the current guest task's linear memory.

As our IDT and GDT must occupy the same linear address
domain as the guest code which is normally executing,
we need to make sure there are mechanisms to allow
these structures to cohabitate with the currently running
guest task's address space. And keep in mind, there can be
N different address spaces, depending on which guest
task is currently running.

If we virtualize these structures, we need to maintain both
the guest's copies of them, and modified working copies of
such structures, which are actually used by the processor.
When the guest OS accesses these structures, the
monitor will receive a page fault, since we need to protect
the pages which contain the guest's copy of them. Upon
receiving the interrupt, the monitor can update the working
copy used by the processor, accordingly.
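Here is a hypothetical sketch of the write-fault path described in this paragraph, for the GDT case. The guest's copy is stored verbatim, and a working copy is derived from it. The derivation used here demotes DPL 0 descriptors to DPL 1, on the assumption that the guest kernel runs at ring 1 ("ring compression"). That policy is an illustrative assumption and is not stated in the quoted text.

```c
#include <stdint.h>

#define GDT_ENTRIES 8192u  /* architectural maximum: 64 KB limit / 8 bytes */

struct shadow_gdt {
    uint64_t guest[GDT_ENTRIES];   /* exactly what the guest wrote (and reads back) */
    uint64_t working[GDT_ENTRIES]; /* derived copy the processor actually uses */
};

/* Assumed policy: present descriptors with DPL 0 become DPL 1 so that a
 * guest kernel running at ring 1 can still use them. */
static uint64_t derive_working_desc(uint64_t d)
{
    int present = (int)((d >> 47) & 1u);          /* access byte bit 7 */
    unsigned dpl = (unsigned)((d >> 45) & 3u);    /* access byte bits 6:5 */
    if (present && dpl == 0)
        d = (d & ~(3ull << 45)) | (1ull << 45);
    return d;
}

/* Called by the page-fault handler once it has decoded which entry the
 * guest tried to write and with what value (decoding not shown). */
static void on_guest_gdt_write(struct shadow_gdt *s, unsigned index, uint64_t value)
{
    if (index >= GDT_ENTRIES)
        return;
    s->guest[index] = value;
    s->working[index] = derive_working_desc(value);
}
```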

Another point worth noting is that the SGDT and SIDT
instructions are not protected and thus ring3 (user)
code may execute them. They each return a base address
and limit, the base address being a _pure_ linear address
independent of the code and data segment base addresses.
To offer really precise virtualization, in the sense that
the user program will not detect us influencing the
base linear address at which we store these structures,
we could use the 2 following approaches.

Approach #1:

If we are performing the pre-scanning technique, we could
simply virtualize the SGDT and SIDT instructions, and emulate
them to return the values which the guest code expects. In this
case we can place the GDT and IDT structure anywhere in linear
memory such that they are in an area which is not currently used
by either guest-OS or guest-user code. We have access to the
guest page tables, so it is fairly easy to find a free area.
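Approach #1's emulation step then reduces to a store. Once the scanner stops on a SIDT, the monitor writes the guest's virtual IDTR value to the instruction's memory operand and skips the instruction. This sketch assumes 32-bit operand size and a memory operand that has already been translated to a host pointer. All names are hypothetical.

```c
#include <stdint.h>

/* The register value the guest believes it loaded with LIDT (or LGDT). */
struct virt_dtr {
    uint16_t limit;
    uint32_t base;
};

/* Emulated SIDT/SGDT: write the 6-byte pseudo-descriptor the guest expects
 * (limit, then base, little-endian) instead of the real register value. */
static void emulate_store_dtr(const struct virt_dtr *v, uint8_t dst[6])
{
    dst[0] = (uint8_t)v->limit;
    dst[1] = (uint8_t)(v->limit >> 8);
    dst[2] = (uint8_t)v->base;
    dst[3] = (uint8_t)(v->base >> 8);
    dst[4] = (uint8_t)(v->base >> 16);
    dst[5] = (uint8_t)(v->base >> 24);
}
```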

Approach #2:

Under certain circumstances, we may be able to locate the
working copies of the GDT and IDT structures, at the linear
addresses requested by the guest via loads of the GDTR and IDTR
registers. If we can do this, then we may let execution of these
instructions pass through without intervention.

This is a condition that is likely to occur while running
guest application code. It is common for an OS not to allow
access from application code to these structures. Given
this is the case, we can map our working copies into the
current linear address space, right where they are expected
to be.


MAPPING THE ACTUAL MONITOR INTERRUPT HANDLER CODE INTO
THE GUEST LINEAR SPACE
======================================================

Now that we've discussed placing the GDT and IDT in
linear memory, we need to map the actual interrupt handler
code as well. Since we will be virtualizing the IDT and
GDT, the guest OS will not see our segment descriptors
and selectors, so we have some freedom here. We can
place this code (by page mapping it) into an unused
linear address range, again given we have access to the
guest-OS page tables.

The interrupt handler code is actually just code
linked with our host OS kernel module. The consideration
here is that code generated by the compiler is based on offsets
from the code and data segments. This code will not be calling
functions in the host-OS kernel and should be contained to access
within its own code and data when used in the monitor/guest
context.

So we must set the monitor's code and data segment base addresses
such that the offsets make sense, based on the linear address
where we map in the code. For example, let's say our host-OS
uses a CS segment base normally of 0xc0000000
(like previous Linux kernels) and our kernel module lives
in the range 0xc2000000 .. 0xc200ffff.

Then let's say that based on empty areas in the guest-OS's
page tables, we find a free range living at
0x62000000 .. 0x6200ffff. We would make the descriptor for
our interrupt handler contain a base of 0x60000000, so that
the offsets remain consistent with the kernel module code.
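Here is the base arithmetic in this example as a small worked sketch (the helper name is hypothetical). The offset of the code within its segment, 0xC2000000 - 0xC0000000 = 0x02000000, must stay the same. The new base is therefore 0x62000000 - 0x02000000 = 0x60000000.

```c
#include <stdint.h>

/* Keep every compiler-generated offset valid after mapping the module at a
 * new linear address:
 *   offset_in_segment = old_linear - old_base
 *   new_base          = new_linear - offset_in_segment                    */
static uint32_t rebased_segment_base(uint32_t old_base, uint32_t old_linear,
                                     uint32_t new_linear)
{
    uint32_t offset = old_linear - old_base;  /* unsigned wraparound is well defined */
    return new_linear - offset;
}
```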

And of course, we mark these pages as supervisor, so that
in the case they are accessed by the guest OS, a fault will
occur. We will also be virtualizing the guest-OS page
tables, protecting that area of memory, so we can update
our strategies. Thus, we will know when the guest-OS makes
updates to its page tables. This gives us a
perfect opportunity to detect when an area of memory
is no longer free. If the guest-OS marks a linear
address range as not free anymore, and that conflicts
with the range we are using for our monitor code, we can
simply change the segment descriptor base addresses for
code and data, and remap the handler code to another linear
address range which is currently free. No memory transfers
occur, only remapping of addresses.
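Changing "the segment descriptor base addresses" means rewriting a 32-bit base that is split across three fields of the 8-byte descriptor. Here is a sketch with hypothetical helper names, treating the descriptor as a little-endian 64-bit value:

```c
#include <stdint.h>

/* In an 8-byte segment descriptor the 32-bit base is split:
 *   base 15:0 -> bits 31:16, base 23:16 -> bits 39:32, base 31:24 -> bits 63:56
 * Limit, access byte and flags are left unchanged. */
static uint32_t desc_get_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | (uint32_t)((d >> 56) << 24);
}

static uint64_t desc_set_base(uint64_t d, uint32_t base)
{
    d &= ~(0xFFFFFFull << 16);                 /* clear base 23:0  */
    d &= ~(0xFFull << 56);                     /* clear base 31:24 */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}
```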

This kind of overhead will only occur each time that
we find we are no longer living in free memory. To reduce
this even further, we could start out at, and use alternate
addresses, which are known not to be used by particular guest OSes.




-nt20

0rp
April 28th, 2005, 10:48
Quote:
[Originally Posted by nikolatesla20]Perhaps I'll fire up NTIce and try to find out what VMWare is doing.....

-nt20



yes, pls

nikolatesla20
April 28th, 2005, 13:56
I loaded a small console program that executes a SIDT instruction into Virtual PC 2004 (by M$) and noticed something interesting.

If I run the program on the command line I get an IDT of 0xBDCB6408.

However, if I load OllyDbg into the VM, run the command-line program under Olly, break on the SIDT instruction and press F8 to step through it, THEN look at the value I got, it's 0x80036400, which is the IDT of the host system, not the "fake" IDT.

So somehow, debugging stopped Virtual PC from substituting the fake value. So I'm thinking there still must be some type of exception handler in place to emulate this instruction.

This is supported by the fact that if I put a breakpoint just AFTER the SIDT instruction, it still gets the fake address.

EDIT: Just tried the same thing in VMWare, and the fake address works all the time correctly. So must be a different method in VMWare compared to Virtual PC.

-nt20

Kayaker
April 29th, 2005, 00:03
Here's a bit of info. I've been able to map out some of the main interface in VMware, which works through numerous DeviceIoControl calls between vmware-vmx.exe and vmx86.sys.

What I did first was to use the debug version of vmware-vmx provided. Replace the file
"C:\Program Files\VMware\VMware Workstation\bin\vmware-vmx.exe"
with the version in the \bin-debug\ directory.
Then delete the vmware.log file in your My Virtual Machines directory for the OS you plan on starting.


In the log will be lots of debug output (there is also a small bit of info through regular DbgPrint messages when VMware starts up).
Code:

: vcpu-0| VTHREAD thread 4 "vcpu-0" host id 484 started
: vmx| VTHREAD create thread 4 "vcpu-0"
: vcpu-0| MX: init lock: rank(offline)=89
: vcpu-0| MX: init lock: rank(monitorPoll)=65534
: vmx| DnD rpc already set to 1
: vcpu-0| APIC: Setting up interrupts
: vcpu-0| APIC: version = 0x14, max LVT = 5
: vcpu-0| APIC: LDR = 0x1000000, DFR = 0xffffffff
: vcpu-0| APIC: spurious interrupt reg = 0x11f
: vcpu-0| APIC: thermal interrupt reg = 0x10000
: vcpu-0| APIC: performance counter reg = 0xfe
: vcpu-0| APIC: timer interrupt reg = 0x300fd
: vcpu-0| APIC: local interrupt 0 reg = 0x1001f
: vcpu-0| APIC: local interrupt 1 reg = 0x184ff
: vcpu-0| APIC: local error reg = 0xe3
: vcpu-0| IOAPIC: Setting up interrupts
: vcpu-0| IOAPIC: IRQ 0x2 -> 0xff, timerReg=0x106ff
: vcpu-0| IOAPIC: IRQ 0x8 -> 0xd1, rtcReg=0x8d1
: vcpu-0| IOAPIC: NT/Win2000 using the CMOS clock for timer interrupts



I decided to explore the vcpu-0 thread closer. The Id 484 given is the ThreadID (1E4h). The THREAD command in Softice displays the stack backtrace going through DeviceIoControlFile. What's interesting is that the stack ends on KiUnexpectedInterrupt.
Code:

:thread -x 1e4

Extended Thread Info for thread 1E4
KTEB: 819EDDA0 TID: 1E4 Process: vmware-vmx(5EC)

Start EIP: KERNEL32!GetModuleFileNameA+0179 (77E88785)

Registers: ESI=819EDDA0 EDI=819EDE0C EBX=813587EC EBP=B79C7B54
Restart : EIP=8046A499 a.k.a. ntoskrnl!KiUnexpectedInterrupt+029F

FrameEBP RetEIP Syms Symbol
B79C7B54 EB5C24ED N ntoskrnl!KiUnexpectedInterrupt+029F
B79C7B78 EB5C4D47 N vmx86!.text+222D
B79C7B98 EB5C07E7 N vmx86!.text+4A87
B79C7BFC EB5C0649 N vmx86!.text+0527
B79C7C2C 804B14CC N vmx86!.text+0389
B79C7D00 804A96A0 N ntoskrnl!NtWriteFile+3F74
B79C7D34 80465D49 N ntoskrnl!NtDeviceIoControlFile+0028
0326FF3C 00562433 Y ntdll!ZwDeviceIoControlFile+000B
0326FF64 00445DB4 N vmware-vmx!.text+00161433
0326FF8C 0043A6A5 N vmware-vmx!.text+00044DB4
0326FF9C 0055DD9A N vmware-vmx!.text+000396A5
0326FFB4 77E887DD N vmware-vmx!.text+0015CD9A
0326FFEC 00000000 Y KERNEL32!GetModuleFileNameA+01D1


From the offsets given for vmware-vmx, I was able to determine the exact
DeviceIoControl call in question and the IOCTL code:

push 81013F67h ; dwIoControlCode
push eax ; hDevice
call DeviceIoControl
test eax, eax
...
push offset aIoctl_vmx86_ru ; "IOCTL_VMX86_RUN_VM failed: %s\n"
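The IOCTL code itself can be decoded with the standard CTL_CODE layout from the DDK headers. Here is a minimal sketch (the names are hypothetical):

```c
#include <stdint.h>

/* Fields of a Windows I/O control code as packed by the DDK's CTL_CODE:
 *   bits 31:16 DeviceType, 15:14 RequiredAccess, 13:2 Function, 1:0 TransferType */
struct ioctl_fields {
    uint16_t device_type;
    uint8_t  access;
    uint16_t function;
    uint8_t  method;
};

static struct ioctl_fields decode_ioctl(uint32_t code)
{
    struct ioctl_fields f;
    f.device_type = (uint16_t)(code >> 16);
    f.access      = (uint8_t)((code >> 14) & 3u);
    f.function    = (uint16_t)((code >> 2) & 0xFFFu);
    f.method      = (uint8_t)(code & 3u);
    return f;
}
```

Applied to 81013F67h, this gives DeviceType 8101h (in the vendor range, 8000h and above), Function FD9h, RequiredAccess 0 (FILE_ANY_ACCESS) and TransferType 3 (METHOD_NEITHER).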



Now we need to find out where the corresponding dwIoControlCode is handled in the driver vmx86.sys. All roads lead to IRP_MJ_DEVICE_CONTROL of course, and in this case a 2-step jump table calculation:
Code:

:0001064E 55 push ebp
:0001064F 8B EC mov ebp, esp
:00010651 83 EC 48 sub esp, 48h

:00010654 8B 4D 24 mov ecx, [ebp+IOCTL_Code] ; 81013f67h
// dwIoControlCode 81013f67h

:00010657 53 push ebx
:00010658 8B 5D 28 mov ebx, [ebp+returnBuffer]
:0001065B 56 push esi
:0001065C 57 push edi
:0001065D 8B 7D 0C mov edi, [ebp+arg_4]
:00010660 33 D2 xor edx, edx


:00010662 8D 81 B8 C0 FE 7E lea eax, [ecx+7EFEC0B8h] ; 1F
// This instruction yields a calculation in eax:
// (81013F67h + 7EFEC0B8h) mod 2^32 = 1Fh, i.e. the code minus 81013F48h


:00010668 89 13 mov [ebx], edx
:0001066A 89 53 04 mov [ebx+4], edx
:0001066D 8B 77 0C mov esi, [edi+0Ch]
:00010670 3D AB 00 00 00 cmp eax, 0ABh
:00010675 89 75 28 mov [ebp+returnBuffer], esi
:00010678 0F 87 4D 03 00 00 ja IOCTL_not_supported


:0001067E 0F B6 80 8B 0D 01 00 movzx eax, ds:JumpIndex[eax] ; 07
// A byte lookup table (JumpIndex) returns the jump index
// JumpIndex[1Fh] returns 07
// in this example corresponding to IOCTL_VMX86_RUN_VM

:00010685 FF 24 85 F3 0C 01 00 jmp ds:IOCTL_JumpTable[eax*4]
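Here is the same dispatch modelled in C as a sketch. 7EFEC0B8h is -81013F48h modulo 2^32, so the lea just subtracts the lowest supported code. The unsigned `ja` then rejects anything below that base as well as anything above 81013F48h + 0ABh. The byte table is passed in by the caller. Apart from JumpIndex[1Fh] = 07 from the listing, its contents are not known here, and the names are hypothetical.

```c
#include <stdint.h>

#define VMX86_IOCTL_BASE 0x81013F48u  /* derived: 2^32 - 7EFEC0B8h */
#define VMX86_IOCTL_SPAN 0xABu        /* from "cmp eax, 0ABh" */

/* Returns the jump-table slot for `code`, or -1 where the driver would take
 * "ja IOCTL_not_supported". `jump_index` stands in for JumpIndex and must
 * hold VMX86_IOCTL_SPAN + 1 entries. */
static int vmx86_jump_slot(uint32_t code, const uint8_t *jump_index)
{
    uint32_t idx = code + 0x7EFEC0B8u;  /* same as code - VMX86_IOCTL_BASE */
    if (idx > VMX86_IOCTL_SPAN)          /* unsigned compare, like "ja" */
        return -1;
    return jump_index[idx];             /* driver then does jmp IOCTL_JumpTable[slot*4] */
}
```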



So from this, all the IOCTL calls of vmx86.sys can be identified. One of them is IOCTL_VMX86_INIT, and it is there that the lone SIDT instruction in the driver is used. This may be where the host's IDT is first read in.

If you set a BPX on the IOCTL handling code you can monitor the various IOCTL calls coming in.
While tracing IOCTL_VMX86_RUN_VM I was able to reach a page fault handler of some sort, then SGDT, SLDT -> indirect call -> SGDT, SIDT, CR3 -> major crash, probably from manually tracing too far...

I think fully naming the IOCTL calls through the JumpTable will be useful. Time for an IDC script, I guess...

Kayaker

nikolatesla20
April 29th, 2005, 07:16
Speaking of that, Kayaker, I've always had a hard time mapping IOCTL calls in a driver. I just get lost in the jump table and stuff. Do you have any good hints or methods you use to do this?

Guess I need to practice a bit more.

-nt20

0rp
April 29th, 2005, 08:11
http://pdos.csail.mit.edu/papers/exo:coffing-meng.ps

"All entry and exit to and from
the guest is through xok, which is able to set and reset IDTR."

this could work

Kayaker
April 29th, 2005, 16:47
Yeah nt20, I guess it's just a matter of following the IRP_MJ_DEVICE_CONTROL routine. You can usually find its location in DriverEntry (INIT section) by a common pattern, as the DRIVER_OBJECT is being loaded with the addresses of the IRP_MJ_Xxx routines. It can also be found with the DRIVER <drivername> command in SoftICE.


Commonly you'll get something like this in DriverEntry, where ESI is the DRIVER_OBJECT. The offset [esi+38h] corresponds to the first of the IRP_MJ_xx requests, up to a maximum of IRP_MJ_MAXIMUM_FUNCTION. The addresses may be defined individually or, as in this case, all point to a single DispatchControl proc. The DispatchControl proc will likely contain a switch statement for the IRP_MJ requests. There you try to pick out the IRP_MJ_DEVICE_CONTROL jump or call, if it isn't explicitly defined right away. The switch statement follows the enum order of the IRP_MJ_xx requests: IRP_MJ_CREATE is 0 and IRP_MJ_DEVICE_CONTROL is 0Eh. That matches the [esi+70h] store below (38h + 0Eh * 4 = 70h).

Code:

INIT:000106EA mov eax, offset DispatchControl

INIT:000106F1 mov [esi+38h], eax // IRP_MJ_CREATE
INIT:000106F4 mov [esi+40h], eax // IRP_MJ_CLOSE
INIT:000106F7 mov [esi+70h], eax // IRP_MJ_DEVICE_CONTROL

INIT:0001070B mov dword ptr [esi+34h], offset DriverUnload


Which, in source form, looks like:

//------------------------------------------------
// Set up dispatch routine entry points for IRP_MJ_Xxx requests

// IRP's sent by app when opening and closing a handle to the driver.
pDriverObject->MajorFunction[IRP_MJ_CREATE] = DispatchControl;
pDriverObject->MajorFunction[IRP_MJ_CLOSE] = DispatchControl;

// Dispatch routine for DeviceIoControl calls from the app
pDriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DispatchControl;

// Define the DriverUnload procedure
pDriverObject->DriverUnload = DriverUnload;
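When reading DriverEntry stores like the ones above, the offset can be mapped back to its IRP_MJ slot mechanically. This sketch assumes the 32-bit DRIVER_OBJECT layout (MajorFunction[] at 38h, 4-byte pointers) and the standard IRP_MJ ordering from wdm.h. The helper names are hypothetical.

```c
#include <stddef.h>

static const char *const IRP_MJ_NAMES[] = {
    "IRP_MJ_CREATE",                   /* 00h */
    "IRP_MJ_CREATE_NAMED_PIPE",        /* 01h */
    "IRP_MJ_CLOSE",                    /* 02h */
    "IRP_MJ_READ",                     /* 03h */
    "IRP_MJ_WRITE",                    /* 04h */
    "IRP_MJ_QUERY_INFORMATION",        /* 05h */
    "IRP_MJ_SET_INFORMATION",          /* 06h */
    "IRP_MJ_QUERY_EA",                 /* 07h */
    "IRP_MJ_SET_EA",                   /* 08h */
    "IRP_MJ_FLUSH_BUFFERS",            /* 09h */
    "IRP_MJ_QUERY_VOLUME_INFORMATION", /* 0Ah */
    "IRP_MJ_SET_VOLUME_INFORMATION",   /* 0Bh */
    "IRP_MJ_DIRECTORY_CONTROL",        /* 0Ch */
    "IRP_MJ_FILE_SYSTEM_CONTROL",      /* 0Dh */
    "IRP_MJ_DEVICE_CONTROL",           /* 0Eh */
    "IRP_MJ_INTERNAL_DEVICE_CONTROL",  /* 0Fh */
    "IRP_MJ_SHUTDOWN",                 /* 10h */
    "IRP_MJ_LOCK_CONTROL",             /* 11h */
    "IRP_MJ_CLEANUP",                  /* 12h */
    "IRP_MJ_CREATE_MAILSLOT",          /* 13h */
    "IRP_MJ_QUERY_SECURITY",           /* 14h */
    "IRP_MJ_SET_SECURITY",             /* 15h */
    "IRP_MJ_POWER",                    /* 16h */
    "IRP_MJ_SYSTEM_CONTROL",           /* 17h */
    "IRP_MJ_DEVICE_CHANGE",            /* 18h */
    "IRP_MJ_QUERY_QUOTA",              /* 19h */
    "IRP_MJ_SET_QUOTA",                /* 1Ah */
    "IRP_MJ_PNP",                      /* 1Bh = IRP_MJ_MAXIMUM_FUNCTION */
};

/* Map a [esi+XX] offset into a 32-bit DRIVER_OBJECT to a MajorFunction
 * index, or -1 if it is not a MajorFunction slot (e.g. 34h, DriverUnload). */
static int irp_mj_index_from_offset(unsigned offset)
{
    const unsigned base = 0x38u, ptr_size = 4u;
    const unsigned count = sizeof IRP_MJ_NAMES / sizeof IRP_MJ_NAMES[0];
    if (offset < base || (offset - base) % ptr_size != 0)
        return -1;
    unsigned index = (offset - base) / ptr_size;
    return index < count ? (int)index : -1;
}

static const char *irp_mj_from_offset(unsigned offset)
{
    int i = irp_mj_index_from_offset(offset);
    return i < 0 ? NULL : IRP_MJ_NAMES[i];
}
```

For example, [esi+70h] gives (70h - 38h) / 4 = 0Eh, which is IRP_MJ_DEVICE_CONTROL, and [esi+40h] gives 02h, which is IRP_MJ_CLOSE.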


The IRP_MJ_DEVICE_CONTROL routine may have a further switch statement to sort out the IOCTL codes. The values will be the same as the dwIoControlCode pushed for the corresponding DeviceIoControl call. Live tracing IRP_MJ_DEVICE_CONTROL would be the best way to follow what happens (assuming a call is triggered). Even from a disassembly, though, the IOCTL code variable has to be read early, probably off the stack, so it should be recognizable with a bit of guesswork.

The driver vmx86.sys way of doing it pretty much follows the pattern, except for the interesting twist at the end of using a magic number with an indexed jump table. That lea statement is kind of a nice trick

Kayaker