I had started live tracing this piece of malware when I realized it was a prime candidate for some IDA IDC scripting. There are a few things about the program that make static analysis difficult:
- It's encrypted, though with a rather simple scheme that is easily scripted out.
- The code executes primarily in the .data section, and there are many inline character strings and non-standard instructions that prevent IDA from producing an accurate disassembly.
- Variable pointers and import calls are referenced as EBP offsets, so IDA can't resolve the absolute addresses it needs to create proper Xrefs, automatic names and all the other wonderful analysis it normally performs.
- Imports are resolved dynamically through GetProcAddress, so IDA can't recognize them until we define them.
Let's address each of these problems with a bit of IDC scripting and static analysis to supplement our live tracing. In Part 2 I'll mention a few points about the viral code itself.
I've included several files in the attachment: the IDC scripts and a header file prototyping some functions and structures not included in IDA's internal definitions. There is also an IDB file in IDA 4.9 freeware format, fully commented with all functions and variables defined (or at least named; this is meant to be a "working" disassembly for further analysis, not necessarily a definitive treatise). The IDB file can be opened in IDA 4.9 or any later version.
Also included is, of course, the virus, or else what fun would this be? The win32_virut.exe file has been renamed with a .VXE extension and placed in a zip archive protected with the password malware. It is quite an infectious file, but it readily detects a normal virtual machine sandbox and won't infect under those conditions; you actually have to force the code to decrypt its payload under a VM. Also, the remote site it tries to connect to has been closed for Terms of Agreement violations (ya think?), so no live connection is ever made.
You may find the IDC scripts easier to read by downloading the originals or by reading this post in the Blogs Forum (follow the link under 'Post or View Comments' at the bottom of this blog).
A brief description of the virus family from BitDefender (http://www.bitdefender.com/VIRUS-1000163-en--Win32.Virtob):
This virus is a polymorphic, memory-resident file-infector, with backdoor behaviour. Once executed, it injects itself into WINLOGON, creates a new thread in that process, and passes the execution control to the host file.
It also hooks the following functions in each running process (in NTDLL module):
NtCreateFile, NtOpenFile, NtCreateProcess, NtCreateProcessEx
so that every time an infected process calls one of these functions, the execution is passed to the virus, which infects the accessed file, and then returns the control to the original function.
It infects EXE and SCR files, using different infection techniques:
Appending to the last section of the victim, and setting the Entry Point directly to the viral code.
(our variant)
The virus is able to avoid emulators and virtual machines. To ensure there's only one instance of it running in the system, it creates an event with one of the following names:
VT_3, VT_4, VevT, Vx_4
It tries to connect to some IRC server, and join a certain channel. Once it joins the channel, it waits for commands that instruct it to download several files from Internet, and then execute them. The IRC server can be:
proxim.ntkrnlpa.info
(our variant, site no longer active)
Much of what is written above can be figured out by live tracing the malware and eventually letting it infect our sandbox. But let's see what damage we can do to it before it does damage to us.
Step 1: Decrypt
Here is the Entry Point of the virus, which is in the .data section:
Code:
:00403200 cld
:00403201 call loc_40322E
:00403206 push ebx
Take note that the return address pushed onto the stack by the Call will be 0x403206. Trace into the Call and, after a bit of preliminary code, we reach here:
Code:
:00403257 mov ebp, [esp+4]
// call return of 0x403206 placed in ebp
:0040325B sub dword ptr [esp+4], 21E9h
// new return address of (0x403206 - 0x21E9) = 0x40101D placed on stack
...
:0040326A sub ebp, 301006h
// ebp offset becomes (0x403206 - 0x301006) = 0x102200
:00403270 lea eax, [ebp+301082h]
// eax = (0x102200 + 0x301082) = 0x403282
// this is the starting address of the encrypted code
:00403276 mov dx, [eax-65h]
// word pointer at 0x40321D is a decryption seed value: db 8Eh, 0C8h
:0040327D call sub_403206 // Decryption routine
// the code from here on down is all encrypted
:00403282 db 65h
:00403283 enter 0BDDh, 0C1h
:00403287 push ss
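To double-check the delta-offset arithmetic in the comments above, and to translate other [ebp+X] operands later on, the computation can be written out in C. This is only a sketch: ebp_delta, ebp_to_va and adjusted_return are hypothetical helper names, and it assumes EBP keeps the delta computed at 0x40326A for the rest of the code.

```c
#include <stdint.h>

/* Constants copied from the entry-point listing above. */
#define CALL_RETURN 0x403206u /* return address pushed by "call loc_40322E" */
#define RET_ADJUST  0x21E9u   /* sub dword ptr [esp+4], 21E9h */
#define DELTA_SUB   0x301006u /* sub ebp, 301006h */

/* EBP after "sub ebp, 301006h": the base every [ebp+X] operand uses. */
static uint32_t ebp_delta(void)
{
    return CALL_RETURN - DELTA_SUB; /* 0x102200 */
}

/* Hypothetical helper: absolute VA referenced by an [ebp+disp] operand,
   assuming EBP still holds the delta. Unsigned arithmetic wraps at 2^32,
   just as the CPU's 32-bit addition does. */
static uint32_t ebp_to_va(uint32_t disp)
{
    return ebp_delta() + disp;
}

/* Return address left on the stack by "sub dword ptr [esp+4], 21E9h". */
static uint32_t adjusted_return(void)
{
    return CALL_RETURN - RET_ADJUST; /* 0x40101D */
}
```

For example, ebp_to_va(0x301082) gives 0x403282, the start of the encrypted code, and 0x403282 - 0x65 gives 0x40321D, the address of the seed word. The same helper can be reused when labeling other EBP-relative variables and import pointers by hand.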
The fact that the code from address 0x403282 onwards doesn't make much sense indicates that Call sub_403206 is a decryption routine. Let's take a look at that call:
Code:
:00403206 Decrypt proc near ; CODE XREF: :0040327D
:00403206 push ebx
:00403207 mov ecx, 0DA5h
:0040320C mov ebx, edx
:0040320E
:0040320E loc_40320E: ; CODE XREF: Decrypt+13
:0040320E xor [eax], dx
:00403211 lea eax, [eax+2]
:00403214 xchg dl, dh
:00403216 lea edx, [ebx+edx]
:00403219 loop loc_40320E
:0040321B pop ebx
:0040321C retn
:0040321C Decrypt endp
:0040321C
:0040321C ; ---------------------------------------------------------------
:0040321D Initial_Decrypt_Seed db 8Eh, 0C8h
:0040321F ; ----------------------------------------------------------------
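A simple XOR loop decryption: each word is XORed with DX, and the key in DX is then modified on each iteration by swapping its bytes (XCHG dl, dh) and adding back the original seed saved in EBX (LEA edx, [ebx+edx]). ECX is a counter of 0DA5h words, decremented by the LOOP opcode. The initial decryption seed is the db 8Eh, 0C8h we discovered above, which DX reads as the little-endian word 0xC88E.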
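The same loop can be modeled in C, which makes it easy to check the key schedule before patching anything in IDA. This is a hedged sketch rather than the author's code: virut_decrypt and next_key are hypothetical names, and it assumes the buffer holds the raw bytes starting at 0x403282 and the seed is the word 0xC88E read from 0x40321D.

```c
#include <stddef.h>
#include <stdint.h>

/* Next key, as the loop body computes it:
   xchg dl, dh         -> swap the two bytes of DX
   lea  edx, [ebx+edx] -> add the original seed (EBX) back in.
   Only DX feeds the XOR, so the sum is truncated to 16 bits. */
static uint16_t next_key(uint16_t key, uint16_t seed)
{
    uint16_t swapped = (uint16_t)((key >> 8) | (key << 8));
    return (uint16_t)(swapped + seed);
}

/* Decrypt 'words' little-endian words in place, mirroring the Decrypt proc
   (ECX = word count, DX = seed). XOR is its own inverse, so calling this
   twice on the same buffer restores the original encrypted bytes. */
static void virut_decrypt(uint8_t *buf, size_t words, uint16_t seed)
{
    uint16_t key = seed;
    for (size_t i = 0; i < words; i++) {
        buf[2 * i]     ^= (uint8_t)(key & 0xFF); /* low byte (DL) */
        buf[2 * i + 1] ^= (uint8_t)(key >> 8);   /* high byte (DH) */
        key = next_key(key, seed);
    }
}
```

Fed the six bytes shown at 0x403282 in the listing above (65 C8 DD 0B C1 16) with seed 0xC88E, it produces EB 00 8B 5C 24 08, which disassembles as jmp short $+2 followed by mov ebx, [esp+8]. That is plausible code, a good sign the key schedule has been read correctly. Because XOR is its own inverse, running it (or the IDC script below) a second time over the same bytes re-encrypts them.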
Armed with this small bit of analysis we can create the following idc script for decrypting.
Code:
#include <idc.idc>
// Step 1: idc to decrypt section between .data:0x403282 and .data:0x404DCC
// performs the equivalent asm function (xchg dl, dh)
#define bswap16(x) \
((((x) & 0xff00) >> 8) | \
(((x) & 0x00ff) << 8))
static main()
{
auto startdecrypt, size, enddecrypt, seed, ea, decryptword, x;
// starting values determined from decrypt function
startdecrypt = 0x403282;
size = 0x0DA5 * 2; // ECX = 0DA5h words, i.e. 0x1B4A bytes
enddecrypt = (startdecrypt + size); // = 0x404DCC
seed = 0xC88E;
ea = startdecrypt;
decryptword = seed;
Message("\nDecrypting... \n");
while (ea < enddecrypt)
{
// (xor [eax], dx)
x = Word(ea); // fetch the word
x = (x ^ decryptword); // decrypt it
PatchWord(ea, x); // put it back
decryptword = bswap16(decryptword); // xchg dl, dh
decryptword = (decryptword + seed) & 0xFFFF; // lea edx, [ebx+edx] (only DX is used)
ea = ea + 2;
}
// Let's try to get IDA to reanalyze the code
MakeUnknown (startdecrypt, size, 1);
AnalyzeArea (startdecrypt, enddecrypt);
Message("...Done \n");
}
After running this script you MUST go through the decrypted section and manually resolve the embedded string pointers with the IDA A(scii) command and any unrecognized or incorrect disassembly with the C(ode) command. This is a necessary step for the subsequent IDC scripts to work properly!
You will find a lot of constructs like the following, which you need to make sure are correctly resolved; left to itself, IDA won't disassemble them properly.
Code:
:004032D8 E8 0D call loc_4032EA
:004032D8 ; --------------------------------------------
:004032DD 47 65+ aGetlasterror db 'GetLastError',0 // LPCSTR lpProcName
:004032EA ; --------------------------------------------
:004032EA
:004032EA loc_4032EA: ; CODE XREF: :004032D8
:004032EA 03 F3 add esi, ebx
:004032EC 53 push ebx // HMODULE hModule
:004032ED FF D6 call esi // GetProcAddress
Notice the neat little trick in the code above: the second parameter of GetProcAddress (lpProcName) is pushed onto the stack as the return address of Call loc_4032EA, which simply jumps over the string. This type of thing is repeated throughout the program.
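Because this call-over-string pattern recurs throughout the program, a byte-level check can help spot candidates before fixing them up with the A and C commands. The following C sketch is not one of the attached scripts; it is an illustrative heuristic, is_call_over_string and find_call_over_string are hypothetical names, and the 256-byte limit on string length is an arbitrary cutoff.

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic (assumption): buf[pos] starts "call $+5+N" (E8 N 00 00 00) and
   the N bytes that follow are printable ASCII plus a terminating NUL, i.e.
   the call exists only to push the string's address and jump past it. */
static int is_call_over_string(const uint8_t *buf, size_t len, size_t pos)
{
    if (pos + 5 > len || buf[pos] != 0xE8)
        return 0;
    uint32_t disp = (uint32_t)buf[pos + 1]
                  | (uint32_t)buf[pos + 2] << 8
                  | (uint32_t)buf[pos + 3] << 16
                  | (uint32_t)buf[pos + 4] << 24;
    /* Need at least one character plus NUL; 256 is an arbitrary upper cutoff. */
    if (disp < 2 || disp > 256 || pos + 5 + disp > len)
        return 0;
    const uint8_t *s = buf + pos + 5;
    for (uint32_t i = 0; i + 1 < disp; i++)
        if (s[i] < 0x20 || s[i] > 0x7E) /* printable ASCII only */
            return 0;
    return s[disp - 1] == 0; /* string must end exactly at the call target */
}

/* Offset of the next match at or after 'start', or len if there is none. */
static size_t find_call_over_string(const uint8_t *buf, size_t len, size_t start)
{
    for (size_t p = start; p < len; p++)
        if (is_call_over_string(buf, len, p))
            return p;
    return len;
}
```

On the bytes from the listing above (E8 0D 00 00 00, 'GetLastError', 0, then 03 F3 53 FF D6) the check matches at offset 0, and the string fills the 13 (0x0D) bytes between the call and its target. A match at offset p in a buffer loaded from address base marks code at base+p and an ASCII string at base+p+5; false positives are possible, so each hit still deserves a look.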
Chances are you won't get every bit of disassembly and ascii string identified correctly the first time through a manual fixup, but after applying
...