Kayaker

IDC scripting a Win32.Virut variant - Part 1

I had started live tracing this piece of malware when I realized it was a prime candidate for some IDA idc scripting. There are a few things about the program which make static analysis difficult.

  1. It's encrypted. The encryption is rather simple, however, and is easily scripted out.
  2. The code is primarily executed in the .data section and there are many inline character strings and non-standard code instructions which prevent IDA from getting an accurate disassembly.
  3. Variable pointers and import calls are referenced as EBP offsets, so IDA can't recognize absolute addresses to create the proper Xrefs, autogeneration and all the other wonderful analysis it normally performs.
  4. Imports are determined dynamically through GetProcAddress, so until we define them IDA can't recognize them.


Let's address each of these problems through a bit of idc scripting and static analysis to supplement our live tracing. In Part 2 I'll mention a few points about the viral code itself.

I've included several files in the attachment: the idc scripts and a header file prototyping some functions and structures not included in the internal IDA definitions. There is also an IDB file in IDA 4.9 freeware version format which is fully commented, with all functions and variables defined (or at least named; this is meant to be a "working" disassembly for further analysis, not necessarily a definitive treatise). The IDB file can be opened in any IDA version 4.9 and above.

Also included is of course the virus, or else what fun would this be? The win32_virut.exe file has been renamed with a .VXE extension and zip password protected with the password malware. It is quite an infectious file, but it readily detects a normal virtual machine sandbox and won't infect under those conditions. You actually have to force the code to decrypt its payload under a VM. Also, the remote site it tries to connect to has been closed for Terms Of Agreement violations (ya think?), so no live connection is ever made.

You may find it easier to read the idc scripts by downloading the originals or by reading this post in the Blogs Forum (follow the link under 'Post or View Comments' at the bottom of this blog).


A brief description of the virus family from
http://www.bitdefender.com/VIRUS-1000163-en--Win32.Virtob

This virus is a polymorphic, memory-resident file-infector, with backdoor behaviour. Once executed, it injects itself into WINLOGON, creates a new thread in that process, and passes the execution control to the host file.

It also hooks the following functions in each running process (in NTDLL module):
NtCreateFile, NtOpenFile, NtCreateProcess, NtCreateProcessEx
so that every time an infected process calls one of these functions, the execution is passed to the virus, which infects the accessed file, and then returns the control to the original function.

It infects EXE and SCR files, using different infection techniques:
Appending to the last section of the victim, and setting the Entry Point directly to the viral code. (our variant)

The virus is able to avoid emulators and virtual machines. To ensure there's only one instance of it running in the system, it creates an event with one of the following names:
VT_3, VT_4, VevT, Vx_4

It tries to connect to some IRC server, and join a certain channel. Once it joins the channel, it waits for commands that instruct it to download several files from Internet, and then execute them. The IRC server can be:
proxim.ntkrnlpa.info (our variant, site no longer active)
Much of what is written above can be figured out by live tracing the malware and eventually letting it infect our sandbox. But let's see what damage we can do to it before it does damage to us.


Step 1: Decrypt

Here is the Entry Point of the virus, which is in the .data section:
Code:
:00403200                 cld
:00403201                 call    loc_40322E
:00403206                 push    ebx
Take note that the return address of the Call pushed onto the stack will be 0x403206. Trace into the Call and after a bit of preliminary code we reach here:

Code:
:00403257  mov     ebp, [esp+4]
    // call return of 0x403206 placed in ebp
 
:0040325B  sub     dword ptr [esp+4], 21E9h
    // new return address of (0x403206 - 0x21E9) = 0x40101D placed on stack
...
:0040326A  sub     ebp, 301006h
// ebp offset becomes (0x403206 - 0x301006) = 0x102200
:00403270  lea     eax, [ebp+301082h]
// eax = (0x102200 + 0x301082) = 0x403282
// this is the starting address of the encrypted code
:00403276  mov     dx, [eax-65h]
// word pointer at 0x40321D is a decryption seed value:  db 8Eh, 0C8h
:0040327D  call    sub_403206      // Decryption routine
// the code from here on down is all encrypted
:00403282  db      65h
:00403282  enter   0BDDh, 0C1h
:00403287  push    ss
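
The address arithmetic in the listing above is easy to double-check outside IDA. What follows is a minimal standalone Python sketch, not one of the attached idc scripts; the names ebp_delta and resolve are made up for illustration, and the constants are simply copied from the disassembly comments.

```python
# Constants read from the disassembly listing (not from any IDA API).
RETURN_ADDR = 0x403206   # return address pushed by the call at 0x403201
STACK_FIXUP = 0x21E9     # sub dword ptr [esp+4], 21E9h
EBP_BIAS = 0x301006      # sub ebp, 301006h


def ebp_delta(return_addr=RETURN_ADDR, bias=EBP_BIAS):
    """Value left in EBP once the bias is subtracted from the return address."""
    return (return_addr - bias) & 0xFFFFFFFF


def resolve(disp, delta=None):
    """Turn an [ebp+disp] displacement into the address it really refers to."""
    if delta is None:
        delta = ebp_delta()
    return (delta + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(RETURN_ADDR - STACK_FIXUP))   # patched return address
    print(hex(ebp_delta()))                 # EBP delta used by the virus
    print(hex(resolve(0x301082)))           # start of the encrypted code
    print(hex(resolve(0x301082) - 0x65))    # address of the seed word
```

The same resolve() calculation is what Step 2 below applies to every [ebp+30xxxx] operand.
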
The fact that none of the code from address 0x403282 onwards makes much sense indicates that it is still encrypted and that Call sub_403206 is the decryption routine. Let's take a look at that call:

Code:
:00403206 Decrypt         proc near               ; CODE XREF: :0040327D
:00403206                 push    ebx
:00403207                 mov     ecx, 0DA5h
:0040320C                 mov     ebx, edx
:0040320E
:0040320E loc_40320E:                             ; CODE XREF: Decrypt+13
:0040320E                 xor     [eax], dx
:00403211                 lea     eax, [eax+2]
:00403214                 xchg    dl, dh
:00403216                 lea     edx, [ebx+edx]
:00403219                 loop    loc_40320E
:0040321B                 pop     ebx
:0040321C                 retn
:0040321C Decrypt         endp
:0040321C
:0040321C ; ---------------------------------------------------------------
:0040321D Initial_Decrypt_Seed db 8Eh, 0C8h
:0040321F; ----------------------------------------------------------------
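A simple XOR loop decryption where the 16-bit xor value in DX is modified on each iteration: the XCHG instruction swaps its two bytes and the LEA then adds the original seed (saved in EBX) back in. ECX is a counter of 0xDA5 words, decremented by the LOOP opcode. The initial decryption seed value is the db 8Eh, 0C8h we discovered above, i.e. the little-endian word 0xC88E.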
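
Before committing the loop to idc, it can help to model it on plain bytes. This is a rough Python sketch of the routine as I read the disassembly, under the assumption that only the low 16 bits of EDX matter (since only DX is used in the xor). The function names are invented for illustration and are not part of the attachment.

```python
import struct

SEED = 0xC88E    # the bytes 8Eh, 0C8h at 0x40321D read as a little-endian word
WORDS = 0x0DA5   # loop count loaded into ECX


def swap_bytes(w):
    """xchg dl, dh: swap the two bytes of a 16-bit value."""
    return ((w & 0xFF00) >> 8) | ((w & 0x00FF) << 8)


def keystream(seed=SEED):
    """Yield the successive values of DX used by xor [eax], dx."""
    key = seed
    while True:
        yield key
        # lea edx, [ebx+edx]; only the low word is kept here (assumption)
        key = (swap_bytes(key) + seed) & 0xFFFF


def xor_words(data, seed=SEED):
    """XOR each little-endian word of data with the keystream."""
    if len(data) % 2:
        raise ValueError("data length must be a multiple of 2")
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data)
    out = [w ^ k for w, k in zip(words, keystream(seed))]
    return struct.pack("<%dH" % count, *out)
```

Because the keystream does not depend on the data, running xor_words() twice over the same buffer gives back the original bytes, so the same routine both encrypts and decrypts.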

Armed with this small bit of analysis we can create the following idc script for decrypting.

PHP Code:
#include <idc.idc>
// Step 1: idc to decrypt section between .data:0x403282 and .data:0x404DCC

// performs the equivalent asm function (xchg dl, dh)
#define    bswap16(x)                    \
    ((((x) & 0xff00) >> 8) |             \
     (((x) & 0x00ff) << 8))

static main()
{
    auto startdecrypt, size, enddecrypt, seed, ea, decryptword, x;

    // starting values determined from decrypt function
    startdecrypt = 0x403282;
    size = 0x0DA5 * 2;                          // word size replacement
    enddecrypt = (startdecrypt + size);         // = 0x404DCC
    seed = 0xC88E;

    ea = startdecrypt;
    decryptword = seed;

    Message("\nDecrypting... \n");
    while (ea < enddecrypt)
    {
                                                // (xor [eax], dx)
        x = Word(ea);                           // fetch the word
        x = (x ^ decryptword);                  // decrypt it
        PatchWord(ea, x);                       // put it back
        decryptword = bswap16(decryptword);     // xchg dl, dh
        decryptword = decryptword + seed;       // lea edx, [ebx+edx]
        decryptword = decryptword & 0xFFFF;     // only dx is used by the xor

        ea = ea + 2;
    }

    // Let's try to get IDA to reanalyze the code
    MakeUnknown (startdecrypt, size, 1);
    AnalyzeArea (startdecrypt, enddecrypt);
    Message("...Done \n");
}

After running this script you MUST go through the decrypted section and manually resolve the embedded string pointers with the IDA A(scii) command and any unrecognized or incorrect disassembly with the C(ode) command. This is a necessary step for the subsequent IDC scripts to work properly!

You will find a lot of things like the following, which you need to make sure are correctly resolved. By itself IDA won't properly disassemble the code.

Code:
:004032D8 E8 0D     call    loc_4032EA
:004032D8                   ; --------------------------------------------
:004032DD 47 65+ aGetlasterror   db 'GetLastError',0 // LPCSTR lpProcName
:004032EA                   ; --------------------------------------------
:004032EA
:004032EA        loc_4032EA:       ; CODE XREF: :004032D8
:004032EA 03 F3     add     esi, ebx
:004032EC 53        push    ebx     // HMODULE hModule
:004032ED FF D6     call    esi     // GetProcAddress
Notice the neat little trick in the above code of how the second parameter of GetProcAddress is automatically pushed onto the stack by effectively being the return address of Call loc_4032EA, which jumps over the string. This type of thing is repeated throughout the program.
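
The call-over-string pattern is regular enough that candidate locations can be listed mechanically before fixing them up by hand in IDA. The following is only a heuristic Python sketch over a raw byte buffer, not an IDA script. The function name and the 256 byte length cap are arbitrary choices of mine, and it assumes the inline string is printable ASCII ending in a single NUL that sits exactly at the call target minus one.

```python
import struct


def find_inline_strings(buf, base):
    """Find 'call rel32' (opcode E8) instructions that jump over an inline string.

    Returns a list of (address_of_call, string) tuples. `base` is the address
    that buf[0] is loaded at.
    """
    hits = []
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:
            (rel,) = struct.unpack_from("<i", buf, i + 1)
            start = i + 5          # first byte after the call = pushed return address
            end = start + rel      # call target
            if 1 < rel <= 256 and end <= len(buf):
                body = buf[start:end]
                printable = all(0x20 <= b < 0x7F for b in body[:-1])
                if body[-1] == 0 and printable:
                    hits.append((base + i, body[:-1].decode("ascii")))
                    i = end        # resume scanning at the call target
                    continue
        i += 1
    return hits


if __name__ == "__main__":
    # Bytes of the GetLastError example above, loaded at 0x4032D8.
    sample = (bytes.fromhex("E80D000000") + b"GetLastError\x00"
              + bytes.fromhex("03F353FFD6"))
    for addr, text in find_inline_strings(sample, 0x4032D8):
        print(hex(addr), text)
```

A scan like this will produce false positives on arbitrary data, so treat its output as a list of places to inspect rather than as a fixup list.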

Chances are you won't get every bit of disassembly and ascii string identified correctly the first time through a manual fixup, but after applying the subsequent idc scripts those problem areas should be identified and you can go back and correct them before running the scripts again. You'll find odd things such as wsprintf format strings, non-null terminated string blocks with xrefs to parts of them, call instructions where the offset displacement is dynamically calculated, etc.


Step 2: Resolve EBP offsets to real addresses

You'll notice after decrypting the file that variable pointers and import calls are in the form of [ebp+30xxxx]. We've already determined above that EBP = 0x102200, so we simply need to calculate the real address used and replace the operand.

Rather than just replacing the operand text itself with the calculated real address, say by using the idc command
success OpAlt (long ea,long n,string opnd); // manually enter n-th operand

we will actually patch in the proper displacement in the hex bytes with
PatchDword (long ea,long value);

After handling each affected instruction we need to undefine it with
MakeUnkn (long ea, long expand);

and have IDA reanalyze with
AnalyzeArea (long sEA,long eEA);

The operands should be converted to a real offset and the proper xrefs resolved for each instruction.

We also use the idc commands

long GetOperandValue (long ea,long n); // get instruction operand value
string GetOpnd (long ea,long n); // get instruction operand

i.e. for the instruction
mov [ebp+302BD5h], eax

GetOpnd (ea,0); returns the string "[ebp+302BD5h]"
GetOperandValue (ea,0); returns 0x00302BD5
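
To make the byte-level patch concrete, here is a small Python sketch that does to a single instruction what the idc script does inside the database. It is an illustration only: patch_displacement is a made-up name, the instruction bytes 89 85 D5 2B 30 00 are the standard encoding of mov [ebp+302BD5h], eax, and unlike the idc script it patches the first occurrence of the displacement it finds.

```python
import struct

EBP_DELTA = 0x102200   # EBP value worked out in Step 1


def patch_displacement(insn, op_val, delta=EBP_DELTA):
    """Return a copy of insn with the 32-bit displacement op_val rebased by delta.

    Returns None when op_val does not occur in the instruction bytes.
    """
    needle = struct.pack("<I", op_val)
    pos = insn.find(needle)
    if pos < 0:
        return None
    fixed = struct.pack("<I", (op_val + delta) & 0xFFFFFFFF)
    return insn[:pos] + fixed + insn[pos + 4:]


if __name__ == "__main__":
    before = bytes.fromhex("8985D52B3000")      # mov [ebp+302BD5h], eax
    after = patch_displacement(before, 0x302BD5)
    print(before.hex(), "->", after.hex())
```

The opcode and ModRM bytes are left alone, which is why IDA still shows an EBP-relative operand after reanalysis, just with a displacement that now looks like a real address.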

After patching in the real address and reanalyzing the code the operand will be rewritten with an "ss:" prefix and/or "[ebp]" suffix.

i.e. the previous example will resolve to:
ss:dword_404DD5[ebp]

We don't really want that so we can remove those string components by parsing them out. That will be the job of the next idc script. That step could be added here, but for demonstration purposes I keep them separate.
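
The string handling involved is trivial, as this short Python sketch of the same idea shows (clean_operand is a hypothetical helper name, and it assumes each unwanted piece appears at most once in the operand text):

```python
def clean_operand(text):
    """Strip the "ss:" prefix and "[ebp]" suffix left behind after patching."""
    for unwanted in ("ss:", "[ebp]"):
        pos = text.find(unwanted)
        if pos != -1:
            text = text[:pos] + text[pos + len(unwanted):]
    return text


if __name__ == "__main__":
    print(clean_operand("ss:dword_404DD5[ebp]"))
```

Operands that contain neither piece are returned unchanged.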


AnalyzeArea might not resolve all the instructions properly the first time through, so a second pass is necessary to convert any instructions that are still in the form of [ebp+xxxxxxxx]. This usually occurred where I didn't make the proper manual corrections to the disassembly or inline ascii strings after running the decryption idc script. We can use the GetFlags(long ea) command to get the internal flags for the address the operand points at and deal with each type (data, unknown or tail) individually.

Any problem operands remaining will be pointed out by the idc script, and will also be highlighted in red by IDA. These should be handled manually. For example, the virus may create a dynamically determined call offset or otherwise change an instruction. IDA resolves these as Xrefs into the middle of an instruction, but doesn't quite get the syntax right when running AnalyzeArea through the idc script. However, if you right click on the errant operand you will probably find a more accurate selection.

Enough of the preamble, I just wanted to touch on a few points of using these idc commands.
Here's the second script:

PHP Code:
#include <idc.idc>
//  Step 2: idc to resolve EBP offsets to real addresses

static resolve_offsets(ea, n)
{
    auto OpVal, realaddress, patchoffset, i;

    OpVal = GetOperandValue(ea, n);

    if (OpVal >= 0x400000)
    {
        return;     // we've already converted this operand
    }

    // calculate the real address
    realaddress = OpVal + 0x102200;

    // calculate the offset where the operand begins in the instruction
    patchoffset = BADADDR;
    for (i = 0; i < ItemSize(ea) - 3; i++)
    {
        if (Dword(ea + i) == OpVal)
        {
            // Pattern found
            patchoffset = (ea + i);
        }
    }

    if (patchoffset == BADADDR)
    {
        return;     // displacement not found, leave the instruction alone
    }

    // patch in the real displacement
    PatchDword(patchoffset, realaddress);

    // undefine the instruction so it will be reanalyzed fresh later
    MakeUnkn (ea, 0);
}
 
static main()
{
    auto startea, endea, ea, n, nextea, OpVal, uFlags, count1, count2, count3;

    startea = 0x403270;         // first occurrence of [ebp+30xxxx] offset
    endea = 0x404DCC;           // determined from idc in Step 1

    // Use some counters to check that all operands were handled properly.
    // Remaining errors likely mean we didn't make the correct analysis
    // after running the decrypt script in Step 1.
    // Go back, correct those instructions and rerun this script.

    count1 = 0;
    count2 = 0;
    count3 = 0;

    /////////////////////////////////////////////////////////////////////////

    // Step 1:
    // Convert operands of the form "[ebp+30xxxxh]" to a real offset

    /////////////////////////////////////////////////////////////////////////

    ea = startea;

    Message("\nConverting EBP offset operands to real addresses \n");

    while (ea != BADADDR)
    {
        // calculate next instruction pointer before we modify anything
        nextea = NextHead(ea, endea);

        // check both the first(0) and second(1) operand of the instruction
        for (n = 0; n < 2; n++)
        {
            // for all instructions with an offset in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count1 = count1 + 1;
                resolve_offsets(ea, n);
            }
        }

        ea = nextea;           // next instruction
    }

    // Reanalyze
    AnalyzeArea (startea, endea);

    /////////////////////////////////////////////////////////////////////////

    // Step 2:
    // Make a second pass at autoanalysing operands
    // still in the form of "[ebp+"

    /////////////////////////////////////////////////////////////////////////

    ea = startea;

    Message("Running a second pass at autoanalysis \n");
    while (ea != BADADDR)
    {
        nextea = NextHead(ea, endea);

        for (n = 0; n < 2; n++)
        {
            // for all instructions with an offset in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count2 = count2 + 1;

                // Get operand value
                OpVal = GetOperandValue(ea, n);

                // Get value of internal flags to see how IDA
                // has defined the operand to this point
                uFlags = GetFlags(OpVal);

                if (isData(uFlags))
                {
                    // If operand offset is already defined as 'data'
                    // then we only need to reanalyze the instruction
                    // to get IDA to resolve the xref

                    // undefine the instruction so it will be reanalyzed fresh
                    MakeUnkn (ea, 0);

                } else if (isUnknown(uFlags))
                {
                    // If operand offset is defined as 'unknown', create
                    // a data xref at the operand address and reanalyze
                    add_dref(ea, OpVal, XREF_USER | dr_O);
                    MakeUnkn (ea, 0);

                } else {
                    // GetFlags(OpVal) indicates that what is left over is
                    // defined as 'isTail'. Undefine both the operand address
                    // and the calling instruction and let IDA reanalyze
                    MakeUnkn (OpVal, 0);
                    MakeUnkn (ea, 0);
                }
            }
        }

        ea = nextea;
    }

    // Reanalyze
    AnalyzeArea (startea, endea);

    /////////////////////////////////////////////////////////////////////////

    // Step 3:
    // Finally, let's inform ourselves of which instructions are still
    // in the form of "[ebp+" and should be checked manually.
    // The offsets will be highlighted in red by IDA as well.

    /////////////////////////////////////////////////////////////////////////

    ea = startea;
    Message("The following instructions (if any) are still in error and ");
    Message("should be fixed manually before rerunning this script \n");

    while (ea != BADADDR)
    {
        nextea = NextHead(ea, endea);

        for (n = 0; n < 2; n++)
        {
            // for all instructions with offset *still*
            // in the form of "[ebp+"
            if (strstr(GetOpnd(ea, n), "[ebp+") != -1)
            {
                count3 = count3 + 1;
                Message("%d  0x%08X  %s \n", count3, ea, GetOpnd(ea, n));
            }
        }

        ea = nextea;
    }

    Message("\n%d / %d operands analysed correctly on first pass \n",
            count1 - count2, count1);
    Message("%d / %d operands corrected on second pass \n",
            count2 - count3, count1);

    /////////////////////////////////////////////////////////////////////////

    Message("...Done \n");
}


Step 3: Parse out unwanted operand text

PHP Code:
#include <idc.idc>
//  Step 3: idc to parse out unwanted text
//  from an operand such as "ss:dword_404DD5[ebp]"

static clean_text(ea, n)
{
    auto OldOpStr, NewOpStr, pos, beforestr, afterstr;

    beforestr = 0;
    afterstr = 0;
    OldOpStr = GetOpnd (ea, n);
    NewOpStr = OldOpStr;

    // find position of "ss:" if present and remove it
    pos = strstr(NewOpStr, "ss:");

    if (pos != -1)   // contains substring
    {
        beforestr = substr(NewOpStr, 0, pos);
        afterstr = substr(NewOpStr, pos+3, -1);

        // combine string parts without "ss:"
        NewOpStr = beforestr + afterstr;
    }

    // find position of "[ebp]" if present and remove it
    pos = strstr(NewOpStr, "[ebp]");

    if (pos != -1)
    {
        beforestr = substr(NewOpStr, 0, pos);
        afterstr = substr(NewOpStr, pos+5, -1);

        // combine string parts without "[ebp]"
        NewOpStr = beforestr + afterstr;
    }

    // replace the operand if either part was removed
    // (this also covers an operand with "ss:" but no "[ebp]")
    if (NewOpStr != OldOpStr)
    {
        OpAlt(ea, n, NewOpStr);
    }
}
 
 
static main()
{
    auto startea, endea, ea, n;

    startea = 0x403270;         // first occurrence of [ebp+30xxxx] offset
    endea = 0x404DCC;           // determined from idc in Step 1

    ea = startea;
    Message("\nCleaning up operand syntax... \n");

    while (ea != BADADDR)
    {
        // check both the first(0) and second(1) operand of the instruction
        for (n = 0; n < 2; n++)
        {
            // for all instructions where we find "ss:" or "[ebp]"
            if (strstr(GetOpnd(ea, n), "ss:") != -1 ||
                strstr(GetOpnd(ea, n), "[ebp]") != -1)
            {
                clean_text(ea, n);
            }
        }

        ea = NextHead(ea, endea);           // next instruction
    }

    Message("...Done \n");
}


Step 4: Resolve API names


Immediately after the code is decrypted by the program it retrieves the offset of GetProcAddress by finding the base of kernel32.dll and parsing through its export table. All other import addresses, including those it hooks from ntdll.dll, are obtained by using GetProcAddress.

The import names it wants are in an ascii table and a simple routine is used for each dll it searches for import addresses.
For example,
Code:
:00403369                 lea     ESI, aLstrcat   ; "lstrcat"
:0040336F                 xor     ecx, ecx
:00403371                 lea     EDI, dword_404DE9
:00403377                 mov     CL, 24h
:00403379                 call    GetProcAddress_Routine
ESI is the start of the API name table, which begins here:

Code:
:004037B3 aLstrcat        db 'lstrcat',0          ; DATA XREF: :00403369t
:004037BB aLstrlen        db 'lstrlen',0
:004037C3 aCreatefilea    db 'CreateFileA',0
:004037CF aCreatefilemapp db 'CreateFileMappingA',0
...
EDI is the start of the table where it places the API addresses.
ECX (CL) contains the number of import names to find for this particular dll.
The Call is a simple LOOP which calls GetProcAddress for each import and stores their offsets.
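
The bookkeeping this loop implies (one NUL-terminated name per import, one 4-byte slot per address) can be sketched in a few lines of Python. This is an illustration on a hand-built buffer, not something read from the IDB; pair_names_with_slots is a hypothetical helper name, and the 4-byte slot size is an assumption based on 32-bit pointers.

```python
def pair_names_with_slots(table, name_base, slot_base, count):
    """Walk `count` NUL-terminated names and pair each with a 4-byte slot.

    Returns (name_address, slot_address, name) tuples.
    """
    pairs = []
    offset = 0
    for index in range(count):
        end = table.index(b"\x00", offset)   # raises ValueError if table is short
        name = table[offset:end].decode("ascii")
        pairs.append((name_base + offset, slot_base + 4 * index, name))
        offset = end + 1
    return pairs


if __name__ == "__main__":
    # First four names of the table shown above.
    table = b"lstrcat\x00lstrlen\x00CreateFileA\x00CreateFileMappingA\x00"
    for name_ea, slot_ea, name in pair_names_with_slots(table, 0x4037B3, 0x404DE9, 4):
        print(hex(name_ea), hex(slot_ea), name)
```

The name addresses it computes line up with the aLstrcat, aLstrlen, aCreatefilea and aCreatefilemapp labels in the listing, which is a handy check that the table really is just consecutive C strings.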


Having cleaned up the disassembly with the first 3 idc scripts we can easily find where this GetProcAddress routine is cross referenced in the file and get the necessary values for each of the 5 dlls in order to resolve API names with the following script:

PHP Code:
#include <idc.idc>
//  Step 4: idc to resolve import calls and enter their name

static patchapi(apinametable, apiaddresstable, numapis)
{
    while (numapis != 0)
    {
        if (!MakeNameEx(apiaddresstable,
                        GetString(apinametable, -1, ASCSTR_C), SN_AUTO))
        {
            // we will get an error because LoadLibraryA is already defined
            // rename as LoadLibraryA_0

            Message("API name already in use, renaming as %s \n",
                    GetString(apinametable, -1, ASCSTR_C) + "_0");

            MakeNameEx(apiaddresstable,
                       GetString(apinametable, -1, ASCSTR_C) + "_0", SN_AUTO);
        }

        apinametable = NextHead(apinametable, BADADDR);
        apiaddresstable = apiaddresstable + 4;
        numapis = numapis - 1;
    }
}

static main()
{
    Message("\nResolving API names... \n");

    // patchapi(name table, address table, number of imports)
    patchapi(0x4037B3, 0x404DE9, 0x24);
    patchapi(0x4039BE, 0x404E79, 0x0D);
    patchapi(0x403B5F, 0x404EDD, 0x04);
    patchapi(0x403AB6, 0x404EAD, 0x07);
    patchapi(0x403AF4, 0x404EC9, 0x05);

    Message("Game over \n");
}
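
Since the five sets of patchapi() arguments are collected by hand, a quick consistency check is worthwhile. The Python sketch below only does arithmetic on the numbers used in the script; find_gaps and total_imports are illustrative names, and the check assumes 4-byte address slots, matching the +4 stride in patchapi().

```python
# (name table, address table, count) triples copied from the patchapi() calls.
TABLES = [
    (0x4037B3, 0x404DE9, 0x24),
    (0x4039BE, 0x404E79, 0x0D),
    (0x403B5F, 0x404EDD, 0x04),
    (0x403AB6, 0x404EAD, 0x07),
    (0x403AF4, 0x404EC9, 0x05),
]


def find_gaps(tables, slot_size=4):
    """Return (expected, actual) pairs wherever two address tables do not abut."""
    ordered = sorted(tables, key=lambda t: t[1])
    gaps = []
    for (_, slots, count), (_, next_slots, _) in zip(ordered, ordered[1:]):
        expected = slots + count * slot_size
        if expected != next_slots:
            gaps.append((expected, next_slots))
    return gaps


def total_imports(tables):
    """Total number of imports resolved across all dlls."""
    return sum(count for _, _, count in tables)


if __name__ == "__main__":
    print(find_gaps(TABLES))
    print(total_imports(TABLES))
```

With these values the address tables sit back to back in memory, so a mistyped count or start address would show up as a gap or an overlap.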


Step 5: Apply C header file

The last step is to read in the header file, defines.h, with the IDA menu command File/Load file/Parse C header file (Ctrl+F9). This file contains some of the function prototypes and structures not defined by default by IDA, primarily the ntdll imports.

You'll probably notice that the parameter definitions for import calls are not always propagated correctly, some may have them, some may not. There are a few things that may help, that fall into the category of "dealing with IDA quirks".

  • Make sure the code containing the import(s) is within a defined function (Create function).
  • Make sure the function has a proper endpoint, i.e. some of the virus function blocks may end with a JMP (Set function end).
  • Select (Edit function). Don't make any changes, just close the dialog box. This seems to force IDA to reanalyze the function and often redefine and propagate any import parameters correctly.
  • Right click on the import function, undefine and then redefine as Code. Again, this seems to work for some cases.
Once all this "prettying up" of the disassembly is done you can finally get to the fun part of analyzing the program.

The included IDB file has most of the virus functionality in the .data section defined in a general way. The .text section has a small decryption function I didn't bother detailing; it is most easily dealt with under a debugger and is completely safe to let run under a closed sandbox environment.


Again, the idc scripts, IDB file and virus are in the attachment, the exe has been renamed .vxe and zip protected with the password malware

Part 2 of this post will follow.