
Watermarking by linking order

niaren
Member
Posts: 70
Joined: Thu Dec 10, 2009 3:16 pm


Post by niaren »

Inspired by this thread
http://www.woodmann.com/forum/showthrea ... #post88531

and in particular by the contents of this post:
...Others can correct me if I am wrong here, but I believe what IDA does, on top of what others have said, is change the linker order of its various object files during the linking stage.

For example if the compile process ended up with the following objects

file1.o, file2.o, file3.o

You could change the order they are linked together, giving an individualised watermark. Now imagine doing that with the hundreds of object files that IDA most likely has: you would have loads of combinations to use.

And personally I don't think it's an easy watermark to remove, since you would need to change the order of the linked-in objects to alter the watermark, which means relative addresses within the program would need to be updated.
Based on this, a mini-project is proposed to study how to reverse/defeat/handle this (clever) way of creating a watermark. As mentioned in the above post, it may not be easy to reorder the objects/functions in the executable (.exe/.dll), because addresses then point to the wrong locations. It turns out that IDA and its scripting language (IDC) may be used to do the reordering without turning it into a BIG project. This is a mini-project :)
With IDA and IDC the reordering can be automated, which is quite convenient: for applications with many object files it may not be safe to reorder just a subset of the object files, and it would be safer to create a whole new watermark (permutation) of all the object files.
This mini-project is just as much about getting hands-on experience with IDC and having fun :p

In order to get started I have created a toy application. All the application does is print two strings.

Code: Select all

main.c

extern void func1object1();
extern void func1object2();

void main()
{
	func1object1();
	func1object2();
}

file1.c

#include <stdio.h>

void func1object1()
{
	printf("Hello from object 1!\n");
}

file2.c


#include <stdio.h>

void func1object2()
{
	printf("Hello from object 2!\n");
}

From these 3 very simple files two applications are built, the only difference being that the linking order of the object files is different. This makefile

Code: Select all

SRCS = main.c file1.c file2.c

OBJS1 = main.obj file1.obj file2.obj
OBJS2 = file2.obj file1.obj main.obj 

CC        = CL
CCFLAGS   = /O2 /Oi /D "_MBCS" /FD /EHsc /MD /Gy /W3 /c /Zi /TC
            

LINK       = link
LINKFLAGS1 = "/OUT:watermark1.exe" "/MANIFESTUAC:level='asInvoker' uiAccess='false'" /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib 
LINKFLAGS2 = "/OUT:watermark2.exe" "/MANIFESTUAC:level='asInvoker' uiAccess='false'" /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib 

EC = echo
RM = del

default: all


clean:
	@$(RM) /F *.obj
	@$(RM) /F *.idb
	@$(RM) /F *.pdb
	@$(RM) /F *.exe
	@$(RM) /F *manifest*

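# Run make from a Visual Studio 2008 command prompt (or run
# "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin\vcvars32.bat" once
# before make): every recipe line runs in its own shell, so calling
# vcvars32.bat inside a recipe does not set up the environment for the
# lines that follow it.
%.obj : %.c
	@$(EC) ************************************************
	@$(EC) * Compiling $@
	$(CC)  $(CCFLAGS) $<

# Same objects, different link order
watermark1.exe: $(OBJS1)
	$(LINK) $(LINKFLAGS1) $(OBJS1)

watermark2.exe: $(OBJS2)
	$(LINK) $(LINKFLAGS2) $(OBJS2)

all: watermark1.exe watermark2.exe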
creates the two .exe files watermark1.exe and watermark2.exe. A zip file containing all the files is attached.
Perhaps not surprisingly, in this example the order of the objects in the binary corresponds to the order in which they are listed on the linker command line. The idea is to create watermark2.exe from watermark1.exe.
I hope this example is not too simple. Maybe it will be much harder with C++ code; I have no idea. I'm not sure whether it is possible to identify the objects themselves, but the functions can be identified (by IDA), and IDC (as far as I understand now) provides functionality for jumping to specified functions, or simply to the next function after a given virtual address.

Does this make any sense at all? :)
Attachments
Linkorder.zip
(15 KiB)
dELTA
Posts: 4209
Joined: Mon Oct 30, 2000 7:00 am
Location: Ring -1

Post by dELTA »

Nice introduction and starting documentation, I'm very much looking forward to seeing your progress in this project. :yay:

And yes, it makes sense indeed. :)
"Give a man a quote from the FAQ, and he'll ignore it. Print the FAQ, shove it up his ass, kick him in the balls, DDoS his ass and kick/ban him, and the point usually gets through eventually."
niaren
Member
Posts: 70
Joined: Thu Dec 10, 2009 3:16 pm

Post by niaren »

Thanks for the encouragement :)

Just came back to this mini-project after letting myself be interrupted by a crackme (my first .NET reversing), and that crackme was driving me nuts. I had virtually the complete source code (dotfuscated) and I couldn't solve it anyway!? It was quite a frustrating struggle, as you can imagine :)

Anyway, I have just written and run my first IDC script. The script is basically a copy of an example from The IDA Pro Book (http://www.idabook.com/), p. 268.

The script enumerates the functions identified by IDA. It looks like this:

Code: Select all

#include <idc.idc> // Mandatory include directive

static main()
{
    // Step one, enumerate/list functions
    GetFunctions();	
}

static GetFunctions()
{
    auto addr, name;
    addr = 0;
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        Message("Function: %s at %x\n", name, addr);  
    }
}  
When run on watermark1.exe, it produces the following output (before you read on, guess how many functions IDA finds :p):

Code: Select all

Compiling file 'C:\rce\LinkOrder\linkorder.idc'...
Executing function 'main'...
Function: _main at 401000
Function: sub_401010 at 401010
Function: sub_401020 at 401020
Function: _pre_cpp_init at 40102d
Function: ___tmainCRTStartup at 401078
Function: $LN31 at 4011ee
Function: start at 4012cf
Function: [email protected]@[email protected]@@Z at 4012d9
Function: $LN5 at 40131b
Function: _amsg_exit at 40132a
Function: __onexit at 401330
Function: $LN8 at 4013cc
Function: _atexit at 4013d5
Function: sub_4013EC at 4013ec
Function: sub_401412 at 401412
Function: _XcptFilter at 401438
Function: __ValidateImageBase at 401440
Function: __FindPESection at 401480
Function: __IsNonwritableInCurrentImage at 4014d0
Function: _initterm at 40158e
Function: _initterm_e at 401594
Function: __SEH_prolog4 at 40159c
Function: __SEH_epilog4 at 4015e1
Function: __except_handler4 at 4015f5
Function: __setdefaultprecision at 40161a
Function: sub_401645 at 401645
Function: ___security_init_cookie at 401648
Function: [email protected]@YAXXZ at 4016de
Function: _unlock at 4016e4
Function: __dllonexit at 4016ea
Function: _lock at 4016f0
Function: sub_4016F6 at 4016f6
Function: _except_handler4_common at 401706
Function: _invoke_watson at 40170c
Function: _controlfp_s at 401712
Function: ___report_gsfailure at 401718
Function: _crt_debugger_hook at 40181e
We wrote 3 simple functions but IDA identifies 37! :)
It is not clear, at least not to me at this point, whether these extra functions can be filtered out or neglected for the reordering. At this stage they are neglected. Another thing that is not considered yet is whether the data is part of the watermark. Right now only the functions are considered.

I'm going to read some more to find out which IDA functions can be used to reorder the functions, and which data structure supported by IDA can hold the functions in preparation for the actual reordering.
dELTA
Posts: 4209
Joined: Mon Oct 30, 2000 7:00 am
Location: Ring -1

Post by dELTA »

The other functions that were detected are most likely just standard library functions of the compiler/linker. You can see that IDA even identified a majority of them from its standard signatures.

If I were you I'd ignore those in the first stage of this project (some of them could have some quite annoying optimizations that will make trouble at the beginning of a project like this), and first only focus on your own functions (they will most likely be adjacent in the binary, and thus possible to rearrange independently of the library functions).

Btw, at a later stage you should probably take a look at import table reordering too, since this is a very simple and efficient way to watermark an exe file.
Kayaker
Posts: 4179
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

Boy, doesn't that illustrate the simple beauty of a program coded in ASM? :yay:

I created MAP files of both exe's and compared them with UltraEdit/Text Compare. The only differences recorded were the following:

Code: Select all

watermark1:

 0001:00000000       _main
 0001:00000020       sub_401020
 0002:000000E0       aHelloFromObject2


watermark2:

 0001:00000000       sub_401000
 0001:00000020       _main
 0002:000000E0       aHelloFromObject1

In this "simple" case, we only have to worry about 3 procs, 401000, 401010 and 401020. The middle proc doesn't change, but if we were to swap the 1st and 3rd it could affect its alignment. In this particular case the 1st and 3rd procs have the same number of bytes, so we can ignore the middle one, but even this shows how difficult fixing this up would be.

I'm just thinking out loud here.. Let's say one devises a script to swap procs 1 and 3 (having determined that that's the strategy needed) and also fixes up the jump/call relative addresses. But add a small layer of complexity, i.e. say the next time procs 1 and 3 are of *different* byte lengths.. that means we also have to deal with moving/fixing proc 2 as well.

Add a few more 'watermark' functions, different sizes, scattered all over a large amount of code, and now it just gets nasty to contemplate.

I'm curious now how an IDC script to fix the simplest scenarios might fare with a more complex one.

Simple:
swap 2 identified procs of the same size - no functions in between are affected
fix up relative jump/call addresses
done?

Not as simple:
swap 2 identified procs of *different* size - all functions in between are affected
fix up relative jump/call addresses of *all* affected code
done?

Crazy:
swap around many procs of varying sizes, fixing up all affected code
?improbable?

I suppose the other thing too is understanding how the watermarks are checked. A CRC check of only specific watermark functions? Maybe not all the code needs to be handled. Mightn't the watermark-check code be the weak link in all this if the goal is to "crack" such a protection?


Kayaker
dELTA
Posts: 4209
Joined: Mon Oct 30, 2000 7:00 am
Location: Ring -1

Post by dELTA »

Glad to have you in the discussion Kayaker. :)

Kayaker wrote:In this "simple" case, we only have to worry about 3 procs, 401000, 401010 and 401020. The middle proc doesn't change, but if we were to swap the 1st and 3rd it could affect its alignment. In this particular case the 1st and 3rd procs have the same number of bytes, so we can ignore the middle one, but even this shows how difficult fixing this up would be.
Yes, my viewpoint from the start has been that you must be prepared to move around all functions in the executable for a procedure like this, exactly because of such alignment problems combined with the fact that very few functions will be of the exact same size, and thus not "switchable in-place".

Kayaker wrote:Add a few more 'watermark' functions, different sizes, scattered all over a large amount of code, and now it just gets nasty to contemplate.
As long as you have generic code to relocate a function to any position, why would it really be so much worse to move them all around than to move just a few? I'm sure the computer won't complain too much about one for loop being iterated a few more times? :) The only possible problem I can think of that grows with the number of simultaneously relocated functions is that there might be functions that are "harder to relocate" (due to crazy compiler optimizations or dynamic address resolutions of different kinds, which IDA therefore won't catch when analyzing/decompiling them). Other than that, am I missing something?

Kayaker wrote:I'm curious now how an IDC script to fix the simplest scenarios might fare with a more complex one.

Simple:
swap 2 identified procs of the same size - no functions in between are affected
fix up relative jump/call addresses
done?

Not as simple:
swap 2 identified procs of *different* size - all functions in between are affected
fix up relative jump/call addresses of *all* affected code
done?

Crazy:
swap around many procs of varying sizes, fixing up all affected code
?improbable?
Again, as long as the "simple script" doesn't have hardcoded addresses for some particular program or something stupid like that, and with my reservations above, I can't really see the problem, neither in terms of coding complexity nor in terms of execution time. Please, tell me what I'm missing, oh great god of the kayak! :D
Kayaker wrote:I suppose the other thing too is understanding how the watermarks are checked. A CRC check of only specific watermark functions? Maybe not all the code needs to be handled. Mightn't the watermark-check code be the weak link in all this if the goal is to "crack" such a protection?
First of all, there is one VERY big and important difference between CRC checks and watermarks, which is also exactly what makes watermarks such a pain in the ass. CRC checks are performed by the application itself, and can therefore, just as you say, be easily found, reversed and/or neutralized. The problem with watermarks is that the checking code is contained in a completely separate program, locked into a safe (or ok, most likely in a crappy unpatched Windows server, but anyway ;) ) on the premises of the software author, only to be taken out and used locally at their office when the author finds a leaked/warezed version of their software on the net, in order to subsequently sue the crap out of the person that the watermark reveals to be the source of the leak.

Thus, no checking code is available for our analysis (unless you offer to burglarize the IDA Pro offices and steal it of course, which I'm sure would make you quite popular with lots of people here :D ), and so each and every bit of information inside the executable could potentially be part of a secret watermark, cleverly steganographed into functionally important parts of the application.

So, contrary to the common solution for removing a CRC check in a program (patching the check, or in rarer cases reversing the CRC algorithm and adapting the patch data to result in the same checksum), the only way to "remove" watermarks is to mess up the binary file in each and every way and dimension that you think information might be implicitly stored to form part of the watermark, while still keeping it fully functional, and that's why we're here today! :)

As mentioned in the thread referenced at the top of this thread, there are apparently rumours that e.g. IDA Pro uses the linking order of its object files to create one (out of many?) such watermark entropy pieces for IDA Pro copies. Thus the idea of this mini-project was born, with the primary scope of investigating how easy it would be to reshuffle all the functions in an arbitrary executable, in order to create a generic "crack" for exactly that specific type of watermarking technology.

Future steps (probably necessary in order to reach a practical result) towards the "creation of the ultimate generic watermark defeater tool" would probably be a similar (but comparatively simpler) import table shuffler, export table shuffler, relocation table shuffler, PE resource shuffler, and a code-location-independent function and data area diffing tool. The latter would check for any differences within functions that are not related to their location (thus ignoring differing call and jump addresses inside their code), e.g. to see if there are any differences in the instructions used in sub-areas of functions, differences in data ordering, or tracking data in PE headers or code caves.

This mini project is both a great first step and a very good mini project though! Well, until you answer my questions above and tell me it's impossible, but anyway. ;)
Kayaker
Posts: 4179
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

Thanks for clarifying watermarking dELTA. I understood that it was to match a particular compilation to a particular person (so they might get the crap sued out of them as you say), but I was also envisioning it as being used as part of a "normal" protection scheme as well, which I guess doesn't necessarily have to be the case and obviously not part of this project.

I.e. a particular key file would only work with a particular compilation because the linking order is taken into account. In other words, the linking-order fingerprint is embedded in the key file and some algorithm uses it to verify the integrity of the program (the CRC check comment was a simplistic example of that idea).

If that's not the case, then what's the benefit of removing such a watermark? If I've got IDA and I'm able to steal YOUR IDA, then I can swap watermarks and YOU get blamed for the release, is that it? :devil:

dELTA wrote:This mini project is both a great first step and a very good mini project though! Well, until you answer my questions above and tell me it's impossible, but anyway.
No, actually I do have hope, that's why I said "I'm curious now how an IDC script to fix the simplest scenarios might fare with a more complex one."
If you can reorder one function, in theory you should be able to reorder them all. In theory. That's the caveat that still needs to be addressed.


This reminded me of a paper I had posted before
http://www.woodmann.com/forum/showthrea ... sification

Software Security Through Targetted Diversification
http://www.cosic.esat.kuleuven.be/publi ... is-122.pdf

The paper is a thesis which discusses the idea of creating software which is distributed as polymorphised versions, in an effort to discourage automated or generic cracking of it. Specifically it suggests the use of Genetic Algorithm (GA) programming to create a diverse population of software for distribution to the masses.


This suggests GA could also be used to create individualised programs. Change a few parameters, fitness/crossover values, record some unique aspect of the offspring (compiled program), and give it to its adopted parent (registered owner). If you find it outside of its new home (leaked), do a DNA analysis.
niaren
Member
Posts: 70
Joined: Thu Dec 10, 2009 3:16 pm

Post by niaren »

dELTA wrote:The other functions that were detected are most likely just standard library functions of the compiler/linker. You can see that IDA even identified a majority of them from its standard signatures.
I was thinking the same thing: that they are appended and therefore appear last in the image, but this is just an assumption for now :)

Kayaker, did you create those MAP files in IDA? (File->Produce File->Create MAP file...) I didn't think of creating MAP files, maybe because the files are so simple. Thanks for the tip :p

Regarding the length of the functions, my assumption is that we are dealing with one contiguous block of functions and alignment data, and that in general all functions are moved. In this case the length of the individual functions does not matter for the reordering, I think, because the block as a whole keeps its size. If the watermark is scattered over several distinct areas with unrelated stuff in between, things get more complicated, since then the length of the functions matters. The idea when starting the mini-project was to make things as simple as possible to begin with and understand how to deal with that case, then make things more complicated along the way. For now the simple scenario is challenging enough for me ;)

Personally, I think this linking-order approach to watermarking is quite clever, mostly because I believe it is very practical (low-cost). There is no need for extra tools or for writing any assembly; all that is needed is a couple of additional lines in an already existing build system. So basically there is no extra work for the software author if the build system and version control system are already in place. I also think the watermark is not so easy to remove, but that is what we are hoping to find out :p

The IDC script has been expanded a little so that it actually takes care of reordering the functions.

Current IDC script

Code: Select all

#include <idc.idc> // Mandatory include directive

static EnumerateAndStoreFunctions(hfunctionnames)
{
    auto addr, tmpaddr, name, fidx, widx, bsuccess, tmphandle, inextfunction;
    addr = 0;
    fidx = 0; // function index
    widx = 0; // word idx
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        
        bsuccess = SetArrayString(hfunctionnames, 2*fidx, name);
        if(bsuccess == 0)
        {
            Message("Saving name of function %s failed.\n", name);
        }

        tmphandle = CreateArray(name);
        if(tmphandle == -1)
        {
            tmphandle = GetArrayId(name);
        }

        inextfunction = NextFunction(addr);
        if(inextfunction == BADADDR)
        {
            inextfunction = GetFunctionAttr(addr, FUNCATTR_END);
        }
        
        widx = 0;
        for(tmpaddr = addr; tmpaddr < inextfunction; tmpaddr = tmpaddr + 4)
        {
             SetArrayLong(tmphandle, widx, Dword(tmpaddr));
             widx = widx + 1;
        }
        bsuccess = SetArrayLong(hfunctionnames, 2*fidx+1, widx);
        fidx = fidx + 1;        
    }
    return fidx;
}

static PrintFunctions(hfunctionnames, inumberoffunctions)
{
    auto fidx;
    for(fidx = 0; fidx < inumberoffunctions; fidx = fidx + 1)
    {
        Message("Function: %s\n", GetArrayElement(AR_STR, hfunctionnames, 2*fidx));
    }
}

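static WriteBackFunctions(hfunctionnames, inumberoffunctions, iwriteaddr)
{
    auto fidx, oidx, funcname, hopcodes, opcodeslen;

    // Walk the stored functions from last to first, so they are written
    // back in reversed order starting at iwriteaddr.
    // NOTE: lengths are whole dwords (rounded up), so the written area can
    // extend up to 3 bytes past the original block; in watermark1.exe this
    // overwrites the first bytes of _pre_cpp_init.
    for(fidx = inumberoffunctions - 1; fidx >= 0; fidx = fidx - 1)
    {
        funcname    = GetArrayElement(AR_STR, hfunctionnames, 2*fidx);
        opcodeslen  = GetArrayElement(AR_LONG, hfunctionnames, 2*fidx+1);
        hopcodes    = GetArrayId(funcname);
        for(oidx = 0; oidx < opcodeslen; oidx = oidx + 1)
        {
            PatchDword(iwriteaddr, GetArrayElement(AR_LONG, hopcodes, oidx));
            iwriteaddr = iwriteaddr + 4;
        }
    }
}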

static main()
{
    auto inumberoffunctions, hfunctionnames;

    // This array is populated with names of functions and
    // the length of the functions in dwords in the following
    // way [name1,length1,name2,length2,...]
    hfunctionnames = CreateArray("FunctionNames");
     
    if(hfunctionnames == -1)
    {
        // If array already exist get the handle by GetArrayId
        Message("hfunctionnames is -1.\n");
        hfunctionnames = GetArrayId("FunctionNames");
    }
 
    // Enumerate functions and store them in a persistent array
    inumberoffunctions = EnumerateAndStoreFunctions(hfunctionnames);	

    // Print functions in IDA's output window
    PrintFunctions(hfunctionnames, inumberoffunctions);
    
    // Write back functions in reversed order
    WriteBackFunctions(hfunctionnames, inumberoffunctions, 0x401000);
     
}
Watermark1.exe original

Code: Select all

.text:00401000 ; =============== S U B R O U T I N E =======================================
.text:00401000
.text:00401000
.text:00401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:00401000 _main           proc near               ; CODE XREF: ___tmainCRTStartup+10Ap
.text:00401000                 call    sub_401010
.text:00401005                 call    sub_401020
.text:0040100A                 xor     eax, eax
.text:0040100C                 retn
.text:0040100C _main           endp
.text:0040100C
.text:0040100C ; ---------------------------------------------------------------------------
.text:0040100D                 align 10h
.text:00401010
.text:00401010 ; =============== S U B R O U T I N E =======================================
.text:00401010
.text:00401010
.text:00401010 sub_401010      proc near               ; CODE XREF: _mainp
.text:00401010                 push    offset Format   ; "Hello from object 1!\n"
.text:00401015                 call    ds:printf
.text:0040101B                 pop     ecx
.text:0040101C                 retn
.text:0040101C sub_401010      endp
.text:0040101C
.text:0040101C ; ---------------------------------------------------------------------------
.text:0040101D                 align 10h
.text:00401020
.text:00401020 ; =============== S U B R O U T I N E =======================================
.text:00401020
.text:00401020
.text:00401020 sub_401020      proc near               ; CODE XREF: _main+5p
.text:00401020                 push    offset aHelloFromObj_0 ; "Hello from object 2!\n"
.text:00401025                 call    ds:printf
.text:0040102B                 pop     ecx
.text:0040102C                 retn
.text:0040102C sub_401020      endp
Watermark1.exe modified with script

Code: Select all

.text:00401000 ; =============== S U B R O U T I N E =======================================
.text:00401000
.text:00401000
.text:00401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:00401000 _main           proc near               ; CODE XREF: ___tmainCRTStartup+10Ap
.text:00401000                 push    4020E0h
.text:00401005                 call    ds:printf
.text:0040100A                 add     [ecx-3Dh], bl
.text:0040100C                 retn
.text:0040100C _main           endp
.text:0040100C
.text:0040100C ; ---------------------------------------------------------------------------
.text:0040100D                 align 10h
.text:00401010
.text:00401010 ; =============== S U B R O U T I N E =======================================
.text:00401010
.text:00401010
.text:00401010 sub_401010      proc near               ; CODE XREF: _mainp
.text:00401010                 push    offset Format   ; "Hello from object 1!\n"
.text:00401015                 call    ds:printf
.text:0040101B                 pop     ecx
.text:0040101C                 retn
.text:0040101C sub_401010      endp
.text:0040101C
.text:0040101C ; ---------------------------------------------------------------------------
.text:0040101D                 align 10h
.text:00401020
.text:00401020 ; =============== S U B R O U T I N E =======================================
.text:00401020
.text:00401020
.text:00401020 sub_401020      proc near               ; CODE XREF: _main+5p
.text:00401020                 call    near ptr unk_4020A8-1078h ; "Hello from object 2!\n"
.text:00401025                 call    near ptr loc_40103C+4
.text:0040102B                 rol     bl, 0CCh
.text:0040102C                 retn
.text:0040102C sub_401020      endp
I have double-checked things in Hex-view

Code: Select all

Before (start 0x401000)
E8 0B 00 00 00 E8 16 00  00 00 33 C0 C3 CC CC CC
68 C8 20 40 00 FF 15 A0  20 40 00 59 C3 CC CC CC
68 E0 20 40 00 FF 15 A0  20 40 00 59 C3 68 12 14
After (start 0x401000)
68 E0 20 40 00 FF 15 A0  20 40 00 59 C3 68 12 14
68 C8 20 40 00 FF 15 A0  20 40 00 59 C3 CC CC CC
E8 0B 00 00 00 E8 16 00  00 00 33 C0 C3 CC CC CC
I don't understand the details of why for instance

Code: Select all

xor     eax, eax
in main becomes

Code: Select all

rol     bl, 0CCh
only that the addresses must be updated correspondingly in order to correct it. And this I think will be more difficult. I haven't yet thought about how to fix the addresses. One way maybe is to have a pre-processing stage where the addresses that need to be updated after the reordering are labeled and a post-processing stage after the actual reordering where the addresses are fixed...have to think some more about this :p
Kayaker
Posts: 4179
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

Hi niaren,

Nice start. If you edit the code it's always a good idea to get IDA to reanalyze. It will fix some things and point out errors in other sections. Try inserting something like the following at the end of main()

Code: Select all

auto text_start, text_end, size;

text_start = SegByBase(1);
text_end = SegEnd(text_start);
size = text_end - text_start;

Message("text_start %x \n", text_start);
Message("text_end %x \n", text_end);
Message("size %x \n", size);

MakeUnknown (text_start, size, 1);
AnalyzeArea (text_start, text_end+1);

or, if you prefer to cover the entire file, it's even simpler to write just:

Code: Select all

MakeUnknown (MinEA(), MaxEA() - MinEA(), 1);
AnalyzeArea (MinEA(), MaxEA());


You can test this manually as well. After applying your existing script, undefine('U') the affected sections and reanalyze with 'C'. You'll see that the rol bl, 0CCh is fixed back to xor eax, eax, but you'll also see that the middle proc was actually affected negatively, which you don't see if you don't reanalyze.

Cheers,
Kayaker
dELTA
Posts: 4209
Joined: Mon Oct 30, 2000 7:00 am
Location: Ring -1

Post by dELTA »

Kayaker wrote:Thanks for clarifying watermarking dELTA. I understood that it was to match a particular compilation to a particular person (so they might get the crap sued out of them as you say), but I was also envisioning it as being used as part of a "normal" protection scheme as well, which I guess doesn't necessarily have to be the case and obviously not part of this project.
Sure, it could of course be done, but it would be extremely stupid to reveal the watermark locations explicitly in the program's own code. :) A normal CRC will work just as well in that aspect, and be just as hard (easy) to patch out. You do of course understand this already, I'm just writing it here for reference. :)

Kayaker wrote:If that's not the case, then what's the benefit of removing such a watermark? If I've got IDA and I'm able to steal YOUR IDA, then I can swap watermarks and YOU get blamed for the release, is that it? :devil:
The benefit is that everyone will have their own copy of IDA for every new release, when people aren't afraid of leaking a cracked version of their own copy anymore, including you, when you (or whatever friend you're leeching it off :devil: ) get tired of paying the yearly fee. :D

Jokes aside (and before "someone" gets unnecessarily pissed at us ;) ), this thread and project is not about warezing IDA. Rather, it's about the theoretical challenge of defeating a more or less powerful "protection technique", for the pure hell (and learning experience) of it, just like all other discussions on this board. The IDA watermarks are one of the most highly regarded (and, above all, practically efficient!) protection systems out there today, so of course it's fun to try to break them! :)

Kayaker wrote:This reminded me of a paper I had posted before
http://www.woodmann.com/forum/showthrea ... sification

...

The paper is a thesis which discusses the idea of creating software which is distributed as polymorphised versions, in an effort to discourage automated or generic cracking of it. Specifically it suggests the use of Genetic Algorithm (GA) programming to create a diverse population of software for distribution to the masses.


This suggests GA could also be used to create individualised programs. Change a few parameters, fitness/crossover values, record some unique aspect of the offspring (compiled program), and give it to its adopted parent (registered owner). If you find it outside of its new home (leaked), do a DNA analysis.
(rant start) Just for the record, I think the inclusion of "Genetic Algorithms" in that paper is just a stupid excuse to include some buzzwords, and I don't at all see the practical use for it. The primary use of Genetic Algorithms is to find (semi-)optimal solutions to massively multidimensional problems, while the efficient polymorphing of code in order to effectively hide information is absolutely not that kind of problem. All it will result in, I think, is less efficient and less systematic information hiding, and much more easily corruptible watermarks. It is very much like "artificial intelligence", which people also often try to apply to completely unsuitable problems, just because it has a "cool ring to it". (rant stop)

niaren wrote:Kayaker, did you create those MAP files in IDA? (File->Produce File->Create MAP file...) I didn't think of creating MAP files, maybe because the files are so simple. Thanks for the tip :p
I suspect he simply let the linker produce them, which would be much more efficient for use as "reference material" in a case like this. You will find options for it in your linker.

niaren wrote:I have double-checked things in Hex-view

Code:

Before (start 0x401000)
E8 0B 00 00 00 E8 16 00  00 00 33 C0 C3 CC CC CC
68 C8 20 40 00 FF 15 A0  20 40 00 59 C3 CC CC CC
68 E0 20 40 00 FF 15 A0  20 40 00 59 C3 68 12 14
After (start 0x401000)
68 E0 20 40 00 FF 15 A0  20 40 00 59 C3 68 12 14
68 C8 20 40 00 FF 15 A0  20 40 00 59 C3 CC CC CC
E8 0B 00 00 00 E8 16 00  00 00 33 C0 C3 CC CC CC
The optimal visualization method for your results would probably be to configure your IDA to show full opcode bytes directly in the disassembly listing. Then you would not need complementary hex dumps like this, and the somewhat confusing coincidences between the relative offsets of the string pointers in your disassembly listings above would also be much easier to explain.
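To make the hex dumps easier to read: an E8 call stores a signed, little-endian 32-bit displacement counted from the end of the 5-byte instruction, which is why `E8 0B 00 00 00` at 0x401000 reaches sub_401010, and why the same bytes point somewhere else once they are moved. A minimal Python sketch (standalone, outside IDA; `call_target` is a hypothetical helper, not part of anyone's script) decoding the first two calls of the "Before" dump:

```python
import struct


def call_target(va, insn):
    """Return the destination of a 5-byte x86 'call rel32' (opcode E8) located at va.

    The displacement is a signed little-endian dword relative to the address
    of the next instruction, so target = va + 5 + rel32.
    """
    if len(insn) < 5 or insn[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (rel32,) = struct.unpack_from("<i", insn, 1)
    return (va + 5 + rel32) & 0xFFFFFFFF


# First ten bytes of the "Before" dump, which starts at 0x401000.
before = bytes.fromhex("E8 0B 00 00 00 E8 16 00 00 00")

print(hex(call_target(0x401000, before[0:5])))   # 0x401010 (sub_401010)
print(hex(call_target(0x401005, before[5:10])))  # 0x401020 (sub_401020)
# The same five bytes copied unchanged to another address, e.g. 0x40101D,
# no longer reach sub_401010:
print(hex(call_target(0x40101D, before[0:5])))   # 0x40102d
```

This is exactly why moving a function without touching its call displacements breaks it: the displacement is position-relative, so it must be recomputed for the new location.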

Kayaker wrote:Nice start. If you edit the code it's always a good idea to get IDA to reanalyze.

...

You can test this manually as well. After applying your existing script, undefine('U') the affected sections and reanalyze with 'C'. You'll see that the rol bl, 0CCh is fixed back to xor eax, eax, but you'll also see that the middle proc was actually affected negatively, which you don't see if you don't reanalyze.
When it comes to massive code permutations like this, I would never trust the results of a mere reanalysis of the live listing inside IDA. Rather, I would let the IDC script patch the raw mutated bytes right into a copy of the executable on disk, and load that one up in IDA individually. Otherwise, my guess is that you'll sooner or later be in a world of unnecessary pain and confusion.

niaren wrote:...only that the addresses must be updated correspondingly in order to correct it. And this I think will be more difficult. I haven't yet thought about how to fix the addresses. One way maybe is to have a pre-processing stage where the addresses that need to be updated after the reordering are labeled and a post-processing stage after the actual reordering where the addresses are fixed
Yes, you should definitely identify and keep track of all offsets and addresses in the code before starting to shuffle it around, and then adjust all of these accordingly after having chosen a new location for the function in question. I strongly advise you to make use of IDA's powerful analysis and metadata for this purpose, since it has already done most of the hard work for you in this regard, i.e. identifying all offsets, addresses and other constructs relevant to such an operation!

Finally, very nice start niaren, keep up the good work, it will be very interesting to follow this! :yay:
"Give a man a quote from the FAQ, and he'll ignore it. Print the FAQ, shove it up his ass, kick him in the balls, DDoS his ass and kick/ban him, and the point usually gets through eventually."
niaren
Member
Posts: 70
Joined: Thu Dec 10, 2009 3:16 pm

Post by niaren »

Thanks for all the feedback. It's a real pleasure :)

Kayaker, I have tried to insert

Code:

MakeUnknown (MinEA(), MaxEA() - MinEA(), 1);
AnalyzeArea (MinEA(), MaxEA());  
It does not really work, and I can't figure out why. I have to press 'U' and 'C' manually, as you said, to get the disassembly to look right. That is of course unfortunate if we depend on IDA showing the correct disassembly. However, the approach the script now uses only depends on IDA showing the correct disassembly initially, before anything is patched.

And yes you were absolutely right, there was a bug in the script :)
I asked about the MAP files because I had not foreseen that you would actually build the files yourself :)

The script seems to work now, including patching the call instructions. It works by:

- creating an address translation lookup table (LUT) [I made up that name myself, don't know what else to call it :) ]
- patching, in place, the instructions that need to be updated
- finally, doing the reordering

The address translation LUT takes an address in the original image as input and returns the corresponding address in the reordered image (full virtual addresses, not RVAs). For watermark1.exe the LUT looks like this:

Code:

Address 401000 mapped to 40101d
Address 401005 mapped to 401022
Address 40100a mapped to 401027
Address 40100c mapped to 401029
Address 401010 mapped to 40100d
Address 401015 mapped to 401012
Address 40101b mapped to 401018
Address 40101c mapped to 401019
Address 401020 mapped to 401000
Address 401025 mapped to 401005
Address 40102b mapped to 40100b
Address 40102c mapped to 40100c
The PatchInPlaceDebug function prints the LUT.
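The LUT above can be reproduced outside IDA in a few lines of Python. This is a sketch under stated assumptions, not niaren's script: each function's size is measured from its start to the next function's start (so the align padding moves with it), which gives 0x10, 0x10 and 0x0D bytes as implied by the mapping above; the instruction addresses are copied from the "before" listing; and `permutation[i]` is the original index of the function placed in slot i, like the script's hardcoded Permutation array. The function names are hypothetical.

```python
def new_function_starts(sizes, permutation, base):
    """Lay the functions out back to back in permutation order.

    permutation[slot] is the original index of the function placed in that
    slot. Returns {original_index: new_start_address}.
    """
    starts = {}
    addr = base
    for fidx in permutation:
        starts[fidx] = addr
        addr += sizes[fidx]
    return starts


def build_lut(old_starts, sizes, permutation, insn_addrs):
    """Map each instruction address to its address after reordering.

    An instruction keeps its offset inside its function:
    new = new_start[f] + (old - old_start[f]).
    """
    new_starts = new_function_starts(sizes, permutation, old_starts[0])
    lut = {}
    for fidx, old_start in enumerate(old_starts):
        for old in insn_addrs[fidx]:
            lut[old] = new_starts[fidx] + (old - old_start)
    return lut


# watermark1.exe: _main, sub_401010, sub_401020
old_starts = [0x401000, 0x401010, 0x401020]
sizes = [0x10, 0x10, 0x0D]
permutation = [2, 1, 0]
insn_addrs = [
    [0x401000, 0x401005, 0x40100A, 0x40100C],
    [0x401010, 0x401015, 0x40101B, 0x40101C],
    [0x401020, 0x401025, 0x40102B, 0x40102C],
]

lut = build_lut(old_starts, sizes, permutation, insn_addrs)
for old in sorted(lut):
    print(f"Address {old:x} mapped to {lut[old]:x}")
```

Keeping one entry per instruction (rather than per function) means any operand that points into moved code, whether a call target or the call itself, can be translated with a single lookup.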

This is the script

Code:

#include <idc.idc> // Mandatory include directive

static GetNumberOfFunctions()
{
    auto addr, name, fidx;
    addr = 0;
    fidx = 0; // function index
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        fidx = fidx + 1;        
    }
    return fidx;
}

static CreatePermutation(inumberoffunctions)
{
    auto hpermutation;
    
	hpermutation = CreateArray("Permutation");
	if(hpermutation == -1)
    {
        // If array already exist get the handle by GetArrayId
        hpermutation = GetArrayId("Permutation");
    }
    // Hardcoded permutation
    SetArrayLong(hpermutation, 0, 2);
    SetArrayLong(hpermutation, 1, 1);
    SetArrayLong(hpermutation, 2, 0);
    return hpermutation;
}
	
static GetFunctionAddresses()
{
    auto addr, name, fidx, hfunctionaddresses;
    addr = 0;
    fidx = 0; // function index
    
    hfunctionaddresses = CreateArray("FunctionAddresses");
	if(hfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hfunctionaddresses = GetArrayId("FunctionAddresses");
    }

    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return hfunctionaddresses;
        }
        SetArrayLong(hfunctionaddresses, 2*fidx, addr);
        SetArrayLong(hfunctionaddresses, 2*fidx+1, NextFunction(addr) - addr);
        
        fidx = fidx + 1;
    }
    return hfunctionaddresses;
}

static GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions)
{
	auto addr, pidx, fidx, hnewfunctionaddresses;
    addr = 0;
    pidx = 0;
	fidx = 0;
	
    hnewfunctionaddresses = CreateArray("NewFunctionAddresses");
	if(hnewfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hnewfunctionaddresses = GetArrayId("NewFunctionAddresses");
    }
    
	// Address of first function
    addr = NextFunction(addr);
    
    fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
    SetArrayLong(hnewfunctionaddresses, fidx, addr);
    addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    
    for(pidx=1; pidx < inumberoffunctions; pidx++)
    {
        fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
	    SetArrayLong(hnewfunctionaddresses, fidx, addr);
		addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    }
    return hnewfunctionaddresses;
}

static CreateAddressTranslationLUT(hnewfunctionaddresses)
{
	auto addr, haddresstranslationlut, name, end, inst, newaddr, fidx;
    addr = 0;
    fidx = 0;
    
    haddresstranslationlut = CreateArray("AddressTranslationLookupTable");
	if(haddresstranslationlut == -1)
    {
        // If array already exist get the handle by GetArrayId
        haddresstranslationlut = GetArrayId("AddressTranslationLookupTable");
    }
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return haddresstranslationlut;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        // Get new base address of function
        newaddr = GetArrayElement(AR_LONG, hnewfunctionaddresses, fidx);
        
        SetArrayLong(haddresstranslationlut, inst, newaddr);
        Message("haddresstranslationlut %x \n",haddresstranslationlut);
        inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        while(inst < end)
        {
			SetArrayLong(haddresstranslationlut, inst, newaddr + (inst-addr));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
        fidx = fidx + 1;
    }
    return haddresstranslationlut;
}

static PatchInPlaceDebug(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			Message("Address %x mapped to %x\n",inst,GetArrayElement(AR_LONG, haddresstranslationlut, inst));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static PatchInPlace(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr, opidx, optype, newrva, nearaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			opidx = 0;
			optype = GetOpType(inst,opidx);
			while(optype > 0)
			{
				if(optype == 7)
				{
					// Immediate Near Address
					
					// Only 5-byte E8 rel32 calls are handled
					if(GetMnem(inst) == "call")
					{
						Message("Instruction at %x being patched.\n", inst);
						nearaddr = LocByName(GetOpnd(inst, opidx));
						if(nearaddr == BADADDR)
						{
							Message("Fatal error, error processing instruction at %x\n", inst);
						}
						else
						{
							// rel32 is relative to the end of the call:
							// new rel32 = new target - (new address of call + 5)
							newrva = GetArrayElement(AR_LONG, haddresstranslationlut, nearaddr) - (GetArrayElement(AR_LONG, haddresstranslationlut, inst) + 5);
							PatchDword(inst + 1, newrva);
						}
					}
					else
					{
						Message("Unsupported! Unknown %s instruction needs to be patched.\n", GetMnem(inst));
					}
						
				}
				
				opidx++;
				optype = GetOpType(inst,opidx);
			}  
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static EnumerateAndStoreFunctions(hfunctionnames)
{
    auto addr, tmpaddr, name, fidx, widx, bsuccess, tmphandle, inextfunction;
    addr = 0;
    fidx = 0; // function index
    widx = 0; // word idx
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        
	    bsuccess = SetArrayString(hfunctionnames, 2*fidx, name);
	    if(bsuccess == 0)
        {
            Message("Saving name of function %s failed.",name); 
        }
	
        tmphandle = CreateArray(name);
        if(tmphandle == -1)
        {
            tmphandle = GetArrayId(name);
        }

        inextfunction = NextFunction(addr);
        if(inextfunction == BADADDR)
        {
            inextfunction = GetFunctionAttr(addr, FUNCATTR_END);
        }
        
        widx = 0;
        for(tmpaddr = addr; tmpaddr < inextfunction; tmpaddr = tmpaddr + 1)
        {
             SetArrayLong(tmphandle, widx, Byte (tmpaddr));
             widx = widx + 1;
        }
		bsuccess = SetArrayLong(hfunctionnames, 2*fidx+1, widx);
        fidx = fidx + 1;        
    }
    return fidx;
}

static PrintFunctions(hfunctionnames, inumberoffunctions)
{
    auto fidx;
    for(fidx = 0; fidx < inumberoffunctions; fidx = fidx + 1)
    {
        Message("Function: %s\n", GetArrayElement(AR_STR, hfunctionnames, 2*fidx));
    }
}

static WriteBackFunctions(hfunctionnames, inumberoffunctions, iwriteaddr)
{
    auto fidx, oidx, funcname, hopcodes, opcodeslen;

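    // Walk the stored functions backwards; this reversed order matches
    // the hardcoded permutation {2, 1, 0}
    for(fidx = inumberoffunctions - 1; fidx >= 0; fidx = fidx - 1)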
    {
        funcname    = GetArrayElement(AR_STR, hfunctionnames, 2*fidx); 
        opcodeslen  = GetArrayElement(AR_LONG, hfunctionnames, 2*fidx+1); 
	    hopcodes    = GetArrayId(funcname);
        for(oidx = 0; oidx < opcodeslen; oidx = oidx + 1)
        {
			PatchByte(iwriteaddr, GetArrayElement(AR_LONG, hopcodes, oidx));
            iwriteaddr = iwriteaddr + 1;
        }
    }
}

static main()
{
    auto didx, inumberoffunctions, hfunctionnames, hpermutation, hfunctionaddresses, hnewfunctionaddresses;
    auto haddresstranslationlut;
    
    // Get number of functions 
    inumberoffunctions = GetNumberOfFunctions();
    
    // DEBUG
    // Message("Number of functions %d\n",inumberoffunctions);

	// Create permutation array
	hpermutation = CreatePermutation(inumberoffunctions);
    
    // Get current function addresses
    hfunctionaddresses = GetFunctionAddresses();
    
    // Get addresses after permutation
    hnewfunctionaddresses = GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions);    

    // Pre-processing, create address translation lookup table
    haddresstranslationlut = CreateAddressTranslationLUT(hnewfunctionaddresses);
    
	PatchInPlace(haddresstranslationlut);
    
	//DEBUG  
    //for(didx = 0; didx<inumberoffunctions; didx++)
    //{
	//	Message("New Function address: %x\n", GetArrayElement(AR_LONG, hnewfunctionaddresses, didx));
    //}
    //return;
    
    // This array is populated with names of functions and
    // the length of the functions in bytes in the following
    // way [name1,length1,name2,length2,...]
    hfunctionnames = CreateArray("FunctionNames");
     
    if(hfunctionnames == -1)
    {
        // If array already exist get the handle by GetArrayId
        Message("hfunctionnames is -1.\n");
        hfunctionnames = GetArrayId("FunctionNames");
    }
 
    // Enumerate functions and store them in persistent arrays
    inumberoffunctions = EnumerateAndStoreFunctions(hfunctionnames);	

    // Print functions in IDA's output window
    PrintFunctions(hfunctionnames, inumberoffunctions);
    
	// Write Back functions in reversed order
	WriteBackFunctions(hfunctionnames, inumberoffunctions, 0x401000);
 
	MakeUnknown (MinEA(), MaxEA() - MinEA(), 1);
    AnalyzeArea (MinEA(), MaxEA());      
}
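The displacement arithmetic in PatchInPlace (LUT[target] - (LUT[inst] + 6), then + 1 when patching) reduces to: new rel32 = new target - (new address of the call + 5). A standalone Python sketch of that step, assuming only 5-byte E8 calls need fixing and using a hand-copied subset of the LUT printed earlier; `patch_call` is a hypothetical helper, not part of the script:

```python
import struct


def patch_call(code, offset, old_va, lut):
    """Re-target the E8 rel32 call at code[offset] (original address old_va).

    Both the callee and the call itself may have moved, so the new
    displacement is lut[old_target] - (lut[old_va] + 5).
    """
    if code[offset] != 0xE8:
        raise ValueError(f"no E8 call at {old_va:#x}")
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    old_target = (old_va + 5 + rel32) & 0xFFFFFFFF
    if old_target not in lut:
        raise KeyError(f"call target {old_target:#x} is not in the LUT")
    new_rel32 = lut[old_target] - (lut[old_va] + 5)
    struct.pack_into("<i", code, offset + 1, new_rel32)


# Subset of the LUT for watermark1.exe (original address -> new address).
lut = {0x401000: 0x40101D, 0x401005: 0x401022,
       0x401010: 0x40100D, 0x401020: 0x401000}

# Bytes of _main before reordering (two calls, xor eax, eax, retn).
main_code = bytearray.fromhex("E8 0B 00 00 00 E8 16 00 00 00 33 C0 C3")
patch_call(main_code, 0, 0x401000, lut)
patch_call(main_code, 5, 0x401005, lut)
print(main_code.hex(" "))  # e8 eb ff ff ff e8 d9 ff ff ff 33 c0 c3
```

Decoding the patched displacements at the new addresses gives 0x40101D + 5 - 0x15 = 0x40100D and 0x401022 + 5 - 0x27 = 0x401000, the same call targets that appear in the "after" listing.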
Watermark1.exe before

Code:

.text:00401000 ; int __cdecl main(int argc, const char **argv, const char **envp)
.text:00401000 _main           proc near               ; CODE XREF: ___tmainCRTStartup+10Ap
.text:00401000                 call    sub_401010
.text:00401005                 call    sub_401020
.text:0040100A                 xor     eax, eax
.text:0040100C                 retn
.text:0040100C _main           endp
.text:0040100C
.text:0040100C ; ---------------------------------------------------------------------------
.text:0040100D                 align 10h
.text:00401010
.text:00401010 ; =============== S U B R O U T I N E =======================================
.text:00401010
.text:00401010
.text:00401010 sub_401010      proc near               ; CODE XREF: _mainp
.text:00401010                 push    offset Format   ; "Hello from object 1!\n"
.text:00401015                 call    ds:printf
.text:0040101B                 pop     ecx
.text:0040101C                 retn
.text:0040101C sub_401010      endp
.text:0040101C
.text:0040101C ; ---------------------------------------------------------------------------
.text:0040101D                 align 10h
.text:00401020
.text:00401020 ; =============== S U B R O U T I N E =======================================
.text:00401020
.text:00401020
.text:00401020 sub_401020      proc near               ; CODE XREF: _main+5p
.text:00401020                 push    offset aHelloFromObj_0 ; "Hello from object 2!\n"
.text:00401025                 call    ds:printf
.text:0040102B                 pop     ecx
.text:0040102C                 retn
.text:0040102C sub_401020      endp
watermark1.exe after (had to press 'U' 'C' on the last routine)

Code:

.text:00401000 _main           proc near               ; CODE XREF: .text:00401022p
.text:00401000                                         ; .text:00401182p
.text:00401000                 push    offset aHelloFromObjec ; "Hello from object 2!\n"
.text:00401005                 call    ds:printf
.text:0040100B                 pop     ecx
.text:0040100C                 retn
.text:0040100C _main           endp
.text:0040100C
.text:0040100D
.text:0040100D ; =============== S U B R O U T I N E =======================================
.text:0040100D
.text:0040100D
.text:0040100D sub_40100D      proc near               ; CODE XREF: .text:0040101Dp
.text:0040100D                 push    offset aHelloFromObj_0 ; "Hello from object 1!\n"
.text:00401012                 call    ds:printf
.text:00401018                 pop     ecx
.text:00401019                 retn
.text:00401019 sub_40100D      endp
.text:00401019
.text:00401019 ; ---------------------------------------------------------------------------
.text:0040101A                 db 3 dup(0CCh)
.text:0040101D ; ---------------------------------------------------------------------------
.text:0040101D                 call    sub_40100D
.text:00401022                 call    _main
.text:00401027                 xor     eax, eax
.text:00401029                 retn
The acid test must be to get a working exe. I was a little surprised to learn that File->Produce File->Create EXE file... shows an 'Unsupported' messagebox. The Entry Point also needs to be changed.
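Whether the entry point itself has to change depends on whether it lies inside one of the moved functions. Separately, the CRT's call to _main (the ___tmainCRTStartup+10A cross-reference in the "before" listing) appears to sit in code the script does not walk, since it stops at _pre_cpp_init, so that call may need the same LUT treatment. When an entry point does move, the fix is a single dword: AddressOfEntryPoint, an RVA stored at offset 0x10 of the optional header (e_lfanew + 0x28 for PE32). A minimal Python sketch, assuming a PE32 image held in a bytearray and a LUT keyed by virtual addresses like the one above; `update_entry_point` and `synthetic_pe32` are hypothetical names, and the synthetic header is only a test fixture, not a real executable:

```python
import struct

PE_SIGNATURE = b"PE\0\0"


def update_entry_point(pe, lut):
    """Translate AddressOfEntryPoint through an address LUT (PE32 only).

    lut maps original virtual addresses to new ones; an entry point that
    is not in the LUT is left unchanged. Returns the new entry-point RVA.
    """
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("not a PE file")
    optional_header = e_lfanew + 0x18      # after signature + IMAGE_FILE_HEADER
    (image_base,) = struct.unpack_from("<I", pe, optional_header + 0x1C)
    ep_field = optional_header + 0x10      # AddressOfEntryPoint (an RVA)
    (ep_rva,) = struct.unpack_from("<I", pe, ep_field)
    old_va = image_base + ep_rva
    new_rva = lut.get(old_va, old_va) - image_base
    struct.pack_into("<I", pe, ep_field, new_rva)
    return new_rva


def synthetic_pe32(entry_rva, image_base=0x400000, e_lfanew=0x80):
    """Build just enough of a PE32 header to exercise update_entry_point."""
    pe = bytearray(0x200)
    struct.pack_into("<I", pe, 0x3C, e_lfanew)
    pe[e_lfanew:e_lfanew + 4] = PE_SIGNATURE
    struct.pack_into("<I", pe, e_lfanew + 0x18 + 0x1C, image_base)
    struct.pack_into("<I", pe, e_lfanew + 0x18 + 0x10, entry_rva)
    return pe


# Hypothetical case: suppose the entry point were the function at 0x401000.
pe = synthetic_pe32(entry_rva=0x1000)
print(hex(update_entry_point(pe, {0x401000: 0x40101D})))  # 0x101d
```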

I have also tested the script on watermark2.exe, and it also seems to work there after forcing IDA to show the correct disassembly with 'U' 'C'.

I will do some searching afterwards to see how I can write the changes made in IDA to a file on disk, as you mentioned, dELTA.
Another thing I'm seriously considering is switching to IDAPython. The IDC script is a little messy, there is little code reuse, and I can hardly find my way through the code myself. I hope all this will change with Python. This mini-project was also about learning IDC, but I think there is enough IDC code now.
Are you ready for Python? :)
Kayaker
Posts: 4179
Joined: Thu Oct 26, 2000 11:00 am

Post by Kayaker »

No I just used the IDA feature to produce the MAP file. Rumours to the contrary are unfounded :p
I mean, we couldn't relink to create a map file in real life, so one could hardly "cheat" in this mini project right? ;)


I see what you mean about AnalyzeArea not working very well. I tried adding a second instance after the first with

Code:

...
    Wait();  // Wait for the end of autoanalysis
    AnalyzeArea (MinEA(), MaxEA());
The second pass produced further changes, but still didn't get everything correct. As dELTA alluded to, I guess it's not perfect and a full redisassembly of a patched file would probably produce better results.


However, if you're interested in what else you can do to produce good (re)disassembly results using a script, you might want to look at the source of the very effective IDA_ExtraPass_PlugIn by Sirmabus. It handles things like 'align' blocks, stray blocks of code, undefined functions and such.

http://www.woodmann.com/collaborative/t ... /ExtraPass


If interested, you might also like to look at the IDC scripts I wrote for analyzing a malware:

http://www.woodmann.com/forum/entry.php ... ant-Part-1

I ended up doing several "clean-up" passes to make a readable disassembly. They are in 4 separate idc scripts just for clarity. The first was a standard AnalyzeArea reanalysis after doing some decrypting. The next step was a manual fix-up of embedded string pointers (I couldn't think of a "smart" script to handle that automatically).

Then came a script to convert operands of the form "[ebp+xxxxxxh]" to a real offset, another one to clean up unwanted operand prefix/suffix text the disassembly produced, and one more to resolve API addresses. Finally we read in a C header file containing some undefined function prototypes and structures. If you read the full blog post, I mention a few more details about some useful idc commands and a few quirks I found while working with reanalysing a disassembly.

It just goes to show that there are a fair number of things you can do to produce a "nice" looking and accurate disassembly using IDC/plw scripts.


Python? Are you a masochist? :D
niaren
Member
Posts: 70
Joined: Thu Dec 10, 2009 3:16 pm

Post by niaren »

Kayaker wrote: Python? Are you a masochist? :D
Hehe, had a good laugh :p

I will take a look at all the goodies you referenced in order to get the disassembly right. However, I can't wait to test the script on a real exe so I have prioritized that for today. And I think I'm almost there now. Made a quick test on watermark1.exe and it looked alright. Just need to change the entry point and do some testing :)
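The helpers in the script below read e_lfanew, ImageBase and one section header, and GetFileOffset then converts an address with va - ImageBase - VirtualAddress + PointerToRawData. The same conversion as a standalone Python sketch. As a design choice that is not in the script, it locates the section table via SizeOfOptionalHeader instead of the fixed 0xF8 and picks whichever section actually contains the address; `va_to_file_offset` and the synthetic header are hypothetical:

```python
import struct


def va_to_file_offset(pe, va):
    """Convert a virtual address to a raw file offset in a PE32 image."""
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    (num_sections,) = struct.unpack_from("<H", pe, e_lfanew + 0x06)
    (opt_size,) = struct.unpack_from("<H", pe, e_lfanew + 0x14)
    (image_base,) = struct.unpack_from("<I", pe, e_lfanew + 0x18 + 0x1C)
    rva = va - image_base
    table = e_lfanew + 0x18 + opt_size  # section headers follow the optional header
    for i in range(num_sections):
        # IMAGE_SECTION_HEADER: VirtualSize, VirtualAddress,
        # SizeOfRawData, PointerToRawData start at offset 0x08
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<4I", pe, table + i * 0x28 + 0x08)
        if vaddr <= rva < vaddr + (vsize or raw_size):
            if rva - vaddr >= raw_size:
                raise ValueError(f"{va:#x} is in zero-fill data, not in the file")
            return raw_ptr + (rva - vaddr)
    raise ValueError(f"{va:#x} is not inside any section")


# Synthetic header with a single .text section: RVA 0x1000, file offset 0x400.
pe = bytearray(0x200)
struct.pack_into("<I", pe, 0x3C, 0x80)
pe[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", pe, 0x80 + 0x06, 1)                 # NumberOfSections
struct.pack_into("<H", pe, 0x80 + 0x14, 0xE0)              # SizeOfOptionalHeader
struct.pack_into("<I", pe, 0x80 + 0x18 + 0x1C, 0x400000)   # ImageBase
struct.pack_into("<4I", pe, 0x80 + 0x18 + 0xE0 + 0x08, 0x1000, 0x1000, 0x200, 0x400)
print(hex(va_to_file_offset(pe, 0x401010)))  # 0x410
```

For this layout the result agrees with the script's formula: 0x401010 - 0x400000 - 0x1000 + 0x400 = 0x410.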

Code:

#include <idc.idc> // Mandatory include directive

static GetFileHandle(mode)
{
	auto hFile;
	
	hFile = fopen(GetInputFilePath(), mode);
	if (0 == hFile)
	{
		Message("Cannot open \"" + GetInputFile() + "\"");
	}
	return hFile;
}

static GetPointerToPEHeader(hfile)
{
	auto e_lfanew;
	
	// Seek to the e_lfanew field 
	if (0 != fseek(hfile, 0x3C, 0))
	{
		Message(" 1 Cannot seek in \"" + GetInputFile() + "\", handle: %x", hfile);
	}

	// Read the value of e_lfanew
	e_lfanew = readlong(hfile, 0);

	// Seek to IMAGE_NT_HEADERS
	if (0 != fseek(hfile, e_lfanew, 0))
	{
		Message(" 2 Cannot seek in \"" + GetInputFile() + "\", handle: %x, elfanew: %x\n", hfile, e_lfanew);
	}

	// Read the Signature
	if (0x00004550 != readlong(hfile, 0))
	{
		Message("Not a valid PE file");
	}
	return e_lfanew;
}

static GetImageBase(hfile, e_lfanew)
{
	auto imageBase;
	
	// Seek to the IMAGE_NT_HEADERS.OptionalHeader.ImageBase field
	if (0 != fseek(hfile, e_lfanew + 0x18 + 0x1C, 0))
	{
		Fatal(" 3 Cannot seek in \"" + GetInputFile() + "\"");
	}
	imageBase = readlong(hfile, 0);
	return imageBase;
}

static GetVirtualSectionOffset(hfile, e_lfanew, section)
{
	auto numberOfSections, sectionRva;
	
	// Seek to the IMAGE_FILE_HEADER.NumberOfSections field
	if (0 != fseek(hfile, e_lfanew + 0x06, 0))
	{
		Fatal(" 4 Cannot seek in \"" + GetInputFile() + "\"");
	}

	// Read the number of sections
	numberOfSections = readshort(hfile, 0);
	
	if (section >= numberOfSections)
	{
		Fatal("Invalid section");
	}

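	// Seek to the VirtualAddress field (+0x0C) of the desired section header;
	// 0xF8 = PE signature + IMAGE_FILE_HEADER + a standard PE32 optional header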
	if (0 != fseek(hfile, e_lfanew + 0xF8 + section * 0x28 + 0x0C, 0))
	{
		Fatal(" 5 Cannot seek in \"" + GetInputFile() + "\"");
	}

	sectionRva = readlong(hfile, 0);
	return sectionRva;
}

static GetRawSectionOffset(hfile, e_lfanew, section)
{
	auto pointerToRawData;
	
	// Seek to the PointerToRawData field (+0x14) of the desired section header
	if (0 != fseek(hfile, e_lfanew + 0xF8 + section * 0x28 + 0x14, 0))
	{
		Fatal(" 6 Cannot seek in \"" + GetInputFile() + "\"");
	}

	pointerToRawData = readlong(hfile, 0);
	return pointerToRawData;
}

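// Note: despite its name, 'rva' is a full virtual address (e.g. 0x401000).
// Subtracting the image base and the section's VirtualAddress gives the offset
// inside the section, which is then added to the section's PointerToRawData.
static GetFileOffset(rva, imagebase, virtualsectionoffset, rawsectionoffset)
{
	return rva - imagebase - virtualsectionoffset + rawsectionoffset;
}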

static GetNumberOfFunctions()
{
    auto addr, name, fidx;
    addr = 0;
    fidx = 0; // function index
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        fidx = fidx + 1;        
    }
    return fidx;
}

static CreatePermutation(inumberoffunctions)
{
    auto hpermutation;
    
	hpermutation = CreateArray("Permutation");
	if(hpermutation == -1)
    {
        // If array already exist get the handle by GetArrayId
        hpermutation = GetArrayId("Permutation");
    }
    // Hardcoded permutation
    SetArrayLong(hpermutation, 0, 2);
    SetArrayLong(hpermutation, 1, 1);
    SetArrayLong(hpermutation, 2, 0);
    return hpermutation;
}
	
static GetFunctionAddresses()
{
    auto addr, name, fidx, hfunctionaddresses;
    addr = 0;
    fidx = 0; // function index
    
    hfunctionaddresses = CreateArray("FunctionAddresses");
	if(hfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hfunctionaddresses = GetArrayId("FunctionAddresses");
    }

    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return hfunctionaddresses;
        }
        SetArrayLong(hfunctionaddresses, 2*fidx, addr);
        SetArrayLong(hfunctionaddresses, 2*fidx+1, NextFunction(addr) - addr);
        
        fidx = fidx + 1;
    }
    return hfunctionaddresses;
}

static GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions)
{
	auto addr, pidx, fidx, hnewfunctionaddresses;
    addr = 0;
    pidx = 0;
	fidx = 0;
	
    hnewfunctionaddresses = CreateArray("NewFunctionAddresses");
	if(hnewfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hnewfunctionaddresses = GetArrayId("NewFunctionAddresses");
    }
    
	// Address of first function
    addr = NextFunction(addr);
    
    fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
    SetArrayLong(hnewfunctionaddresses, fidx, addr);
    addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    
    for(pidx=1; pidx < inumberoffunctions; pidx++)
    {
        fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
	    SetArrayLong(hnewfunctionaddresses, fidx, addr);
		addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    }
    return hnewfunctionaddresses;
}

static CreateAddressTranslationLUT(hnewfunctionaddresses)
{
	auto addr, haddresstranslationlut, name, end, inst, newaddr, fidx;
    addr = 0;
    fidx = 0;
    
    haddresstranslationlut = CreateArray("AddressTranslationLookupTable");
	if(haddresstranslationlut == -1)
    {
        // If array already exist get the handle by GetArrayId
        haddresstranslationlut = GetArrayId("AddressTranslationLookupTable");
    }
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return haddresstranslationlut;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        // Get new base address of function
        newaddr = GetArrayElement(AR_LONG, hnewfunctionaddresses, fidx);
        
        SetArrayLong(haddresstranslationlut, inst, newaddr);
        Message("haddresstranslationlut %x \n",haddresstranslationlut);
        inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        while(inst < end)
        {
			SetArrayLong(haddresstranslationlut, inst, newaddr + (inst-addr));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
        fidx = fidx + 1;
    }
    return haddresstranslationlut;
}

static PatchInPlaceDebug(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			Message("Address %x mapped to %x\n",inst,GetArrayElement(AR_LONG, haddresstranslationlut, inst));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static PatchInPlace(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr, opidx, optype, newrva, nearaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			opidx = 0;
			optype = GetOpType(inst,opidx);
			while(optype > 0)
			{
				if(optype == 7)
				{
					// Immediate Near Address
					
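					// Maybe not necessary but check for call instruction
					if(GetMnem(inst) == "call")
					{
						// Resolve the target first so a failed lookup is never patched in
						nearaddr = LocByName(GetOpnd(inst, opidx));
						if(nearaddr == BADADDR)
						{
							Message("Fatal error, error processing instruction at %x\n", inst);
						}
						else
						{
							Message("Instruction at %x being patched.\n", inst);
							// Targets outside the moved functions (e.g. CRT code after
							// _pre_cpp_init) have no LUT entry and read back as 0,
							// so they keep their original address
							newaddr = GetArrayElement(AR_LONG, haddresstranslationlut, nearaddr);
							if(newaddr == 0)
							{
								newaddr = nearaddr;
							}
							// E8 rel32: displacement = target - (new address of call + 5)
							newrva = newaddr - (GetArrayElement(AR_LONG, haddresstranslationlut, inst) + 0x5);
							PatchDword(inst+0x1, newrva);
						}
					}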
					else
					{
						Message("Unsupported! Unknown %s instruction needs to be patched.\n", GetMnem(inst));
					}
						
				}
				
				opidx++;
				optype = GetOpType(inst,opidx);
			}  
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static EnumerateAndStoreFunctions(hfunctionnames)
{
    auto addr, tmpaddr, name, fidx, widx, bsuccess, tmphandle, inextfunction;
    addr = 0;
    fidx = 0; // function index
    widx = 0; // byte index
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        
	    bsuccess = SetArrayString(hfunctionnames, 2*fidx, name);
	    if(bsuccess == 0)
        {
            Message("Saving name of function %s failed.\n",name); 
        }
	
        tmphandle = CreateArray(name);
        if(tmphandle == -1)
        {
            tmphandle = GetArrayId(name);
        }

        inextfunction = NextFunction(addr);
        if(inextfunction == BADADDR)
        {
            inextfunction = GetFunctionAttr(addr, FUNCATTR_END);
        }
        
        widx = 0;
        for(tmpaddr = addr; tmpaddr < inextfunction; tmpaddr = tmpaddr + 1)
        {
             SetArrayLong(tmphandle, widx, Byte (tmpaddr));
             widx = widx + 1;
        }
		bsuccess = SetArrayLong(hfunctionnames, 2*fidx+1, widx);
        fidx = fidx + 1;        
    }
    return fidx;
}

static PrintFunctions(hfunctionnames, inumberoffunctions)
{
    auto fidx;
    for(fidx = 0; fidx < inumberoffunctions; fidx = fidx + 1)
    {
        Message("Function: %s\n", GetArrayElement(AR_STR, hfunctionnames, 2*fidx));
    }
}

static WriteBackFunctions(hfunctionnames, inumberoffunctions, iwriteaddr, writetofile, hfile)
{
    auto fidx, oidx, funcname, hopcodes, opcodeslen;
	auto imagebase, virtualsectionoffset, rawsectionoffset;
	auto writeerror, byte, hglobalvars, fileoffset;
	
	if(writetofile == 1)
	{
		hglobalvars          = GetArrayId("GlobalVars");
		imagebase            = GetArrayElement(AR_LONG, hglobalvars, 0);
		virtualsectionoffset = GetArrayElement(AR_LONG, hglobalvars, 1);
		rawsectionoffset     = GetArrayElement(AR_LONG, hglobalvars, 2);
	}
	
	// DEBUG
	Message("imagebase: %x, virtualsectionoffset: %x, rawsectionoffset: %x\n",imagebase,virtualsectionoffset,rawsectionoffset);
	
    // Reversed order, matching the reversal permutation
    for(fidx = inumberoffunctions - 1; fidx >= 0; fidx = fidx - 1)
    {
        funcname    = GetArrayElement(AR_STR, hfunctionnames, 2*fidx); 
        opcodeslen  = GetArrayElement(AR_LONG, hfunctionnames, 2*fidx+1); 
	    hopcodes    = GetArrayId(funcname);
        for(oidx = 0; oidx < opcodeslen; oidx = oidx + 1)
        {
            byte = GetArrayElement(AR_LONG, hopcodes, oidx);
			PatchByte(iwriteaddr, byte);
			if(writetofile == 1)
			{
			    fileoffset = GetFileOffset(iwriteaddr, imagebase, virtualsectionoffset, rawsectionoffset);
			    if(fseek(hfile, fileoffset, 0) != 0)
				{
					Message("Could not seek to file offset %x (VA %x)\n", fileoffset, iwriteaddr);
					return;
				}
				writeerror = fputc(byte, hfile);
				if(writeerror == -1)
				{
					Message("Could not write to file (VA %x)\n",iwriteaddr);
					return;
				}
				Message("Write byte %x to file offset %x\n", byte, fileoffset);
			}
			
            iwriteaddr = iwriteaddr + 1;
        }
    }
}

static main()
{
	auto hfile, e_lfanew, imagebase, virtualsectionoffset, rawsectionoffset, writetofile, section;
    auto didx, inumberoffunctions, hfunctionnames, hpermutation, hfunctionaddresses, hnewfunctionaddresses;
    auto haddresstranslationlut, hglobalvars;
    
	writetofile              = 1;
	
	// This is init stuff and should be wrapped into a separate init function
	if(writetofile == 1)
	{
		section              = 0;
		hfile                = GetFileHandle("rb");
		e_lfanew             = GetPointerToPEHeader(hfile);
		imagebase            = GetImageBase(hfile, e_lfanew);
		virtualsectionoffset = GetVirtualSectionOffset(hfile, e_lfanew, section);
		rawsectionoffset     = GetRawSectionOffset(hfile, e_lfanew, section);
	    
	    hglobalvars          = CreateArray("GlobalVars");
		if(hglobalvars == -1)
		{
			// If array already exist get the handle by GetArrayId
			hglobalvars = GetArrayId("GlobalVars");
		}
		SetArrayLong(hglobalvars, 0, imagebase);
		SetArrayLong(hglobalvars, 1, virtualsectionoffset);
		SetArrayLong(hglobalvars, 2, rawsectionoffset);
		fclose(hfile);
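		// Reopen for update in binary mode; with "r+" the Windows CRT may open
		// in text mode and turn each 0x0A byte written into 0x0D 0x0A
		hfile                = GetFileHandle("r+b");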
	}
    
    // Get number of functions 
    inumberoffunctions = GetNumberOfFunctions();
    
    // DEBUG
    // Message("Number of functions %d\n",inumberoffunctions);

	// Create permutation array
	hpermutation = CreatePermutation(inumberoffunctions);
    
    // Get current function addresses
    hfunctionaddresses = GetFunctionAddresses();
    
    // Get addresses after permutation
    hnewfunctionaddresses = GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions);    

    // Pre-processing, create address translation lookup table
    haddresstranslationlut = CreateAddressTranslationLUT(hnewfunctionaddresses);
    
	PatchInPlace(haddresstranslationlut);
    
	//DEBUG  
    //for(didx = 0; didx<inumberoffunctions; didx++)
    //{
	//	Message("New Function address: %x\n", GetArrayElement(AR_LONG, hnewfunctionaddresses, didx));
    //}
    //return;
    
    // This array is populated with names of functions and
    // the length of the functions in bytes in the following
    // way [name1,length1,name2,length2,...]
    hfunctionnames = CreateArray("FunctionNames");
     
    if(hfunctionnames == -1)
    {
        // If array already exist get the handle by GetArrayId
        Message("hfunctionnames is -1.\n");
        hfunctionnames = GetArrayId("FunctionNames");
    }
 
    //  Enumerate functions and store them in a persistent array
    inumberoffunctions = EnumerateAndStoreFunctions(hfunctionnames);	

    // Print functions in IDA's output window
    PrintFunctions(hfunctionnames, inumberoffunctions);
    
	// Write back functions in reversed order, starting at the first
	// function (the same base address GetNewFunctionAddresses uses)
	WriteBackFunctions(hfunctionnames, inumberoffunctions, NextFunction(0), writetofile, hfile);
 
	if(writetofile == 1)
	{
		fclose(hfile);
	}
 
	MakeUnknown (MinEA(), MaxEA() - MinEA(), 1);
    AnalyzeArea (MinEA(), MaxEA());      
}
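PatchInPlace's call fix-up writes LUT[target] - (LUT[inst] + 6) + 1, which is just the standard rule for an E8 rel32 call, displacement = target - (address of call + 5), evaluated at the new addresses. A minimal C sketch of that rule; struct move, translate and new_call_rel32 are hypothetical names, and leaving addresses outside every moved function unchanged (e.g. CRT code after _pre_cpp_init) is an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the script's AddressTranslationLookupTable:
   one entry per moved function. */
struct move {
    uint32_t old_start;
    uint32_t new_start;
    uint32_t size;
};

/* Translate an old address to its new location. Addresses outside every
   moved function are assumed not to move (sketch assumption). */
static uint32_t translate(const struct move *m, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= m[i].old_start && addr - m[i].old_start < m[i].size)
            return m[i].new_start + (addr - m[i].old_start);
    return addr;
}

/* New rel32 operand for an E8 call originally at call_at targeting target.
   The displacement is relative to the end of the 5-byte instruction. */
static int32_t new_call_rel32(const struct move *m, size_t n,
                              uint32_t call_at, uint32_t target)
{
    uint32_t new_call = translate(m, n, call_at);
    uint32_t new_target = translate(m, n, target);
    return (int32_t)(new_target - (new_call + 5));
}
```

For example, with three functions of 0x10, 0x20 and 0x30 bytes starting at 0x401000 written back in reversed order (so the last one moves to 0x401000 and the first to 0x401050), a call at old address 0x401035 to 0x401000 gets the new rel32 0x46.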

Post by niaren »

After running the script below on watermark1.exe, I learned about base relocations: they are what prevents the 'dewatermarked' exe from running :p

The script below now also corrects the entry point (the CRT's call to _main), so in theory it should work on an exe with a fixed base or stripped relocation info.
If I manually correct the relevant values in the relocation directory with a hex editor, the 'dewatermarked' exe runs without any problems. Since watermark1.exe and watermark2.exe were built with relocation information and a dynamic base, it would probably be considered cheating if the script weren't updated to correct the relocation information as well :)
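Fixing the relocation directory, as done by hand above, starts with knowing which addresses it covers. A minimal C sketch, assuming the raw .reloc data has already been read into memory, that decodes the IMAGE_BASE_RELOCATION blocks (a 4-byte page RVA, a 4-byte block size that includes the 8-byte header, then 16-bit entries with the type in the top 4 bits and the page offset in the low 12 bits) into a flat list of HIGHLOW fixup RVAs. decode_relocs is a made-up name; translating the RVAs through the old-to-new mapping and re-packing them into sorted per-page blocks is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0 /* IMAGE_REL_BASED_ABSOLUTE: padding, skipped */
#define REL_BASED_HIGHLOW  3 /* IMAGE_REL_BASED_HIGHLOW: 32-bit absolute fixup */

static uint32_t rd32(const uint8_t *p)
{
    return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode raw relocation data into the RVAs of its HIGHLOW fixups.
   Returns the number of RVAs stored in out, or -1 on malformed input,
   an unhandled relocation type, or too small an output buffer. */
static long decode_relocs(const uint8_t *buf, size_t len, uint32_t *out, size_t max_out)
{
    size_t pos = 0, n = 0;
    while (pos + 8 <= len) {
        uint32_t page = rd32(buf + pos);     /* VirtualAddress of the 4K page */
        uint32_t size = rd32(buf + pos + 4); /* SizeOfBlock, header included */
        if (size < 8 || size > len - pos)
            return -1;
        for (size_t e = pos + 8; e + 2 <= pos + size; e += 2) {
            uint16_t entry = rd16(buf + e);
            unsigned type = entry >> 12;
            if (type == REL_BASED_HIGHLOW) {
                if (n == max_out)
                    return -1;
                out[n++] = page + (entry & 0x0FFFu);
            } else if (type != REL_BASED_ABSOLUTE) {
                return -1;
            }
        }
        pos += size;
    }
    return (long)n;
}
```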

Hopefully there will be an Xmas version of the IDC script that takes care of the relocation information, but most likely it will be a New Year edition ;)

Code:

#include <idc.idc> // Mandatory include directive

static GetFileHandle(mode)
{
	auto hFile;
	
	hFile = fopen(GetInputFilePath(), mode);
	if (0 == hFile)
	{
		Message("Cannot open \"" + GetInputFile() + "\"\n");
	}
	return hFile;
}

static GetPointerToPEHeader(hfile)
{
	auto e_lfanew;
	
	// Seek to the e_lfanew field 
	if (0 != fseek(hfile, 0x3C, 0))
	{
		Message(" 1 Cannot seek in \"" + GetInputFile() + "\", handle: %x\n", hfile);
	}

	// Read the value of e_lfanew
	e_lfanew = readlong(hfile, 0);

	// Seek to IMAGE_NT_HEADERS
	if (0 != fseek(hfile, e_lfanew, 0))
	{
		Message(" 2 Cannot seek in \"" + GetInputFile() + "\", handle: %x, elfanew: %x\n", hfile, e_lfanew);
	}

	// Read the Signature
	if (0x00004550 != readlong(hfile, 0))
	{
		Message("Not a valid PE file\n");
	}
	return e_lfanew;
}

static GetImageBase(hfile, e_lfanew)
{
	auto imageBase;
	
	// Seek to the IMAGE_NT_HEADERS.OptionalHeader.ImageBase field
	if (0 != fseek(hfile, e_lfanew + 0x18 + 0x1C, 0))
	{
		Fatal(" 3 Cannot seek in \"" + GetInputFile() + "\"");
	}
	imageBase = readlong(hfile, 0);
	return imageBase;
}

static GetVirtualSectionOffset(hfile, e_lfanew, section)
{
	auto numberOfSections, sectionRva;
	
	// Seek to the IMAGE_FILE_HEADER.NumberOfSections field
	if (0 != fseek(hfile, e_lfanew + 0x06, 0))
	{
		Fatal(" 4 Cannot seek in \"" + GetInputFile() + "\"");
	}

	// Read the number of sections
	numberOfSections = readshort(hfile, 0);
	
	if (section >= numberOfSections)
	{
		Fatal("Invalid section");
	}

	// Seek to VirtualAddress of the desired section header
	// (0xF8 = 4 + 0x14 + 0xE0, i.e. assumes a standard PE32 optional header)
	if (0 != fseek(hfile, e_lfanew + 0xF8 + section * 0x28 + 0x0C, 0))
	{
		Fatal(" 5 Cannot seek in \"" + GetInputFile() + "\"");
	}

	sectionRva = readlong(hfile, 0);
	return sectionRva;
}

static GetRawSectionOffset(hfile, e_lfanew, section)
{
	auto pointerToRawData;
	
	// Seek to PointerToRawData of the desired section header
	// (0xF8 assumes a standard PE32 optional header of 0xE0 bytes)
	if (0 != fseek(hfile, e_lfanew + 0xF8 + section * 0x28 + 0x14, 0))
	{
		Fatal(" 6 Cannot seek in \"" + GetInputFile() + "\"");
	}

	pointerToRawData = readlong(hfile, 0);
	return pointerToRawData;
}

// Converts a virtual address (VA, not RVA) inside the given section to a file offset
static GetFileOffset(va, imagebase, virtualsectionoffset, rawsectionoffset)
{
	return va - imagebase - virtualsectionoffset + rawsectionoffset;
}

static GetNumberOfFunctions()
{
    auto addr, name, fidx;
    addr = 0;
    fidx = 0; // function index
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        fidx = fidx + 1;        
    }
    return fidx;
}

static CreatePermutation(inumberoffunctions)
{
    auto hpermutation, pidx;
    
	hpermutation = CreateArray("Permutation");
	if(hpermutation == -1)
    {
        // If array already exist get the handle by GetArrayId
        hpermutation = GetArrayId("Permutation");
    }
    // Reversal permutation: new position pidx holds old function
    // inumberoffunctions-1-pidx (gives [2,1,0] for three functions)
    for(pidx = 0; pidx < inumberoffunctions; pidx++)
    {
        SetArrayLong(hpermutation, pidx, inumberoffunctions - 1 - pidx);
    }
    return hpermutation;
}
	
static GetFunctionAddresses()
{
    auto addr, name, fidx, hfunctionaddresses;
    addr = 0;
    fidx = 0; // function index
    
    hfunctionaddresses = CreateArray("FunctionAddresses");
	if(hfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hfunctionaddresses = GetArrayId("FunctionAddresses");
    }

    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return hfunctionaddresses;
        }
        SetArrayLong(hfunctionaddresses, 2*fidx, addr);
        SetArrayLong(hfunctionaddresses, 2*fidx+1, NextFunction(addr) - addr);
        
        fidx = fidx + 1;
    }
    return hfunctionaddresses;
}

static GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions)
{
	auto addr, pidx, fidx, hnewfunctionaddresses;
    addr = 0;
    pidx = 0;
	fidx = 0;
	
    hnewfunctionaddresses = CreateArray("NewFunctionAddresses");
	if(hnewfunctionaddresses == -1)
    {
        // If array already exist get the handle by GetArrayId
        hnewfunctionaddresses = GetArrayId("NewFunctionAddresses");
    }
    
	// Address of first function
    addr = NextFunction(addr);
    
    fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
    SetArrayLong(hnewfunctionaddresses, fidx, addr);
    addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    
    for(pidx=1; pidx < inumberoffunctions; pidx++)
    {
        fidx = GetArrayElement(AR_LONG, hpermutation, pidx); 
	    SetArrayLong(hnewfunctionaddresses, fidx, addr);
		addr = addr + GetArrayElement(AR_LONG, hfunctionaddresses, 2*fidx+1);
    }
    return hnewfunctionaddresses;
}

static CreateAddressTranslationLUT(hnewfunctionaddresses)
{
	auto addr, haddresstranslationlut, name, end, inst, newaddr, fidx;
    addr = 0;
    fidx = 0;
    
    haddresstranslationlut = CreateArray("AddressTranslationLookupTable");
	if(haddresstranslationlut == -1)
    {
        // If array already exist get the handle by GetArrayId
        haddresstranslationlut = GetArrayId("AddressTranslationLookupTable");
    }
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return haddresstranslationlut;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        // Get new base address of function
        newaddr = GetArrayElement(AR_LONG, hnewfunctionaddresses, fidx);
        
        SetArrayLong(haddresstranslationlut, inst, newaddr);
        Message("haddresstranslationlut %x \n",haddresstranslationlut);
        inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        while(inst < end)
        {
			SetArrayLong(haddresstranslationlut, inst, newaddr + (inst-addr));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
        fidx = fidx + 1;
    }
    return haddresstranslationlut;
}

static PatchInPlaceDebug(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			Message("Address %x mapped to %x\n",inst,GetArrayElement(AR_LONG, haddresstranslationlut, inst));
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static PatchInPlace(haddresstranslationlut)
{
	auto addr, name, end, inst, newaddr, opidx, optype, newrva, nearaddr;
    addr = 0;
    
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return;
        }
		end  = GetFunctionAttr(addr, FUNCATTR_END);
        inst = addr;
        
        while(inst < end)
        {
			opidx = 0;
			optype = GetOpType(inst,opidx);
			while(optype > 0)
			{
				if(optype == 7)
				{
					// Immediate Near Address
					
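					// Maybe not necessary but check for call instruction
					if(GetMnem(inst) == "call")
					{
						// Resolve the target first so a failed lookup is never patched in
						nearaddr = LocByName(GetOpnd(inst, opidx));
						if(nearaddr == BADADDR)
						{
							Message("Fatal error, error processing instruction at %x\n", inst);
						}
						else
						{
							Message("Instruction at %x being patched.\n", inst);
							// Targets outside the moved functions (e.g. CRT code after
							// _pre_cpp_init) have no LUT entry and read back as 0,
							// so they keep their original address
							newaddr = GetArrayElement(AR_LONG, haddresstranslationlut, nearaddr);
							if(newaddr == 0)
							{
								newaddr = nearaddr;
							}
							// E8 rel32: displacement = target - (new address of call + 5)
							newrva = newaddr - (GetArrayElement(AR_LONG, haddresstranslationlut, inst) + 0x5);
							PatchDword(inst+0x1, newrva);
						}
					}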
					else
					{
						Message("Unsupported! Unknown %s instruction needs to be patched.\n", GetMnem(inst));
					}
						
				}
				
				opidx++;
				optype = GetOpType(inst,opidx);
			}  
			inst = FindCode(inst, SEARCH_DOWN | SEARCH_NEXT);
        }
    }
}

static EnumerateAndStoreFunctions(hfunctionnames)
{
    auto addr, tmpaddr, name, fidx, widx, bsuccess, tmphandle, inextfunction;
    addr = 0;
    fidx = 0; // function index
    widx = 0; // byte index
    for(addr = NextFunction(addr); addr != BADADDR; addr = NextFunction(addr))
    {
        name = Name(addr);
        
        // Stop if name of function is _pre_cpp_init
        // It is assumed that compiler/linker generated functions
        // are appended in the end of image and that they start with the
        // _pre_cpp_init function
        if(name == "_pre_cpp_init")
        {
            return fidx;
        }
        
	    bsuccess = SetArrayString(hfunctionnames, 2*fidx, name);
	    if(bsuccess == 0)
        {
            Message("Saving name of function %s failed.\n",name); 
        }
	
        tmphandle = CreateArray(name);
        if(tmphandle == -1)
        {
            tmphandle = GetArrayId(name);
        }

        inextfunction = NextFunction(addr);
        if(inextfunction == BADADDR)
        {
            inextfunction = GetFunctionAttr(addr, FUNCATTR_END);
        }
        
        widx = 0;
        for(tmpaddr = addr; tmpaddr < inextfunction; tmpaddr = tmpaddr + 1)
        {
             SetArrayLong(tmphandle, widx, Byte (tmpaddr));
             widx = widx + 1;
        }
		bsuccess = SetArrayLong(hfunctionnames, 2*fidx+1, widx);
        fidx = fidx + 1;        
    }
    return fidx;
}

static PrintFunctions(hfunctionnames, inumberoffunctions)
{
    auto fidx;
    for(fidx = 0; fidx < inumberoffunctions; fidx = fidx + 1)
    {
        Message("Function: %s\n", GetArrayElement(AR_STR, hfunctionnames, 2*fidx));
    }
}

static WriteBackFunctions(hfunctionnames, inumberoffunctions, iwriteaddr, writetofile, hfile)
{
    auto fidx, oidx, funcname, hopcodes, opcodeslen;
	auto imagebase, virtualsectionoffset, rawsectionoffset;
	auto writeerror, byte, hglobalvars, fileoffset;
	
	if(writetofile == 1)
	{
		hglobalvars          = GetArrayId("GlobalVars");
		imagebase            = GetArrayElement(AR_LONG, hglobalvars, 0);
		virtualsectionoffset = GetArrayElement(AR_LONG, hglobalvars, 1);
		rawsectionoffset     = GetArrayElement(AR_LONG, hglobalvars, 2);
	}
	
	// DEBUG
	Message("imagebase: %x, virtualsectionoffset: %x, rawsectionoffset: %x\n",imagebase,virtualsectionoffset,rawsectionoffset);
	
    // Reversed order, matching the reversal permutation
    for(fidx = inumberoffunctions - 1; fidx >= 0; fidx = fidx - 1)
    {
        funcname    = GetArrayElement(AR_STR, hfunctionnames, 2*fidx); 
        opcodeslen  = GetArrayElement(AR_LONG, hfunctionnames, 2*fidx+1); 
	    hopcodes    = GetArrayId(funcname);
        for(oidx = 0; oidx < opcodeslen; oidx = oidx + 1)
        {
            byte = GetArrayElement(AR_LONG, hopcodes, oidx);
			PatchByte(iwriteaddr, byte);
			if(writetofile == 1)
			{
			    fileoffset = GetFileOffset(iwriteaddr, imagebase, virtualsectionoffset, rawsectionoffset);
			    if(fseek(hfile, fileoffset, 0) != 0)
				{
					Message("Could not seek to file offset %x (VA %x)\n", fileoffset, iwriteaddr);
					return;
				}
				writeerror = fputc(byte, hfile);
				if(writeerror == -1)
				{
					Message("Could not write to file (VA %x)\n",iwriteaddr);
					return;
				}
				Message("Write byte %x to file offset %x\n", byte, fileoffset);
			}
			
            iwriteaddr = iwriteaddr + 1;
        }
    }
}

static main()
{
	auto hfile, e_lfanew, imagebase, virtualsectionoffset, rawsectionoffset, writetofile, section;
    auto didx, inumberoffunctions, hfunctionnames, hpermutation, hfunctionaddresses, hnewfunctionaddresses;
    auto haddresstranslationlut, hglobalvars, main, call2main, newrva, fileoffset, writeerror;
    
	writetofile              = 1;
	
	// This is init stuff and should be wrapped into a separate init function
	if(writetofile == 1)
	{
		section              = 0;
		hfile                = GetFileHandle("rb");
		e_lfanew             = GetPointerToPEHeader(hfile);
		imagebase            = GetImageBase(hfile, e_lfanew);
		virtualsectionoffset = GetVirtualSectionOffset(hfile, e_lfanew, section);
		rawsectionoffset     = GetRawSectionOffset(hfile, e_lfanew, section);
	    
	    hglobalvars          = CreateArray("GlobalVars");
		if(hglobalvars == -1)
		{
			// If array already exist get the handle by GetArrayId
			hglobalvars = GetArrayId("GlobalVars");
		}
		SetArrayLong(hglobalvars, 0, imagebase);
		SetArrayLong(hglobalvars, 1, virtualsectionoffset);
		SetArrayLong(hglobalvars, 2, rawsectionoffset);

	    // Get address of main
		main                 = LocByName("_main");
		if(main == BADADDR)
		{
			Message("Could not find _main. Aborting...\n");
			return;
		}
		call2main = RfirstB(main);
		if(GetMnem(call2main) != "call")
		{
			Message("Expecting to find call to _main. Unsuccessful. Aborting...\n");
			return;
		}
		
		fclose(hfile);
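		// Reopen for update in binary mode; with "r+" the Windows CRT may open
		// in text mode and turn each 0x0A byte written into 0x0D 0x0A
		hfile                = GetFileHandle("r+b");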
	}
    
    // Get number of functions 
    inumberoffunctions = GetNumberOfFunctions();
    
    // DEBUG
    // Message("Number of functions %d\n",inumberoffunctions);

	// Create permutation array
	hpermutation = CreatePermutation(inumberoffunctions);
    
    // Get current function addresses
    hfunctionaddresses = GetFunctionAddresses();
    
    // Get addresses after permutation
    hnewfunctionaddresses = GetNewFunctionAddresses(hfunctionaddresses, hpermutation, inumberoffunctions);    

    // Pre-processing, create address translation lookup table
    haddresstranslationlut = CreateAddressTranslationLUT(hnewfunctionaddresses);
    
	PatchInPlace(haddresstranslationlut);

	// Fix call to _main
	if(writetofile == 1)
	{
		if(GetOpType(call2main,0) != 7)
		{
			Message("Unexpected operand found at call2main. Aborting...\n");
			return;
		}
		newrva   = GetArrayElement(AR_LONG, haddresstranslationlut, main) - (call2main+0x6);
		PatchDword(call2main+0x1, newrva+0x1);  

	    fileoffset = GetFileOffset(call2main+1, imagebase, virtualsectionoffset, rawsectionoffset);
	    if(fseek(hfile, fileoffset, 0) != 0)
		{
			Message("Could not seek to file offset %x\n", fileoffset);
			fclose(hfile);
			return;
		}
		writeerror = writelong(hfile, newrva+0x1, 0);
		if(writeerror == -1)
		{
			Message("Could not patch call2main (rel32 %x)\n", newrva+0x1);
			fclose(hfile);
			return;
		}
		Message("Write long %x to file offset %x\n", newrva+0x1, fileoffset);
	}
	
	//DEBUG  
    //for(didx = 0; didx<inumberoffunctions; didx++)
    //{
	//	Message("New Function address: %x\n", GetArrayElement(AR_LONG, hnewfunctionaddresses, didx));
    //}
    //return;
    
    // This array is populated with names of functions and
    // the length of the functions in bytes in the following
    // way [name1,length1,name2,length2,...]
    hfunctionnames = CreateArray("FunctionNames");
     
    if(hfunctionnames == -1)
    {
        // If array already exist get the handle by GetArrayId
        Message("hfunctionnames is -1.\n");
        hfunctionnames = GetArrayId("FunctionNames");
    }
 
    //  Enumerate functions and store them in a persistent array
    inumberoffunctions = EnumerateAndStoreFunctions(hfunctionnames);	

    // Print functions in IDA's output window
    PrintFunctions(hfunctionnames, inumberoffunctions);
    
	// Write back functions in reversed order, starting at the first
	// function (the same base address GetNewFunctionAddresses uses)
	WriteBackFunctions(hfunctionnames, inumberoffunctions, NextFunction(0), writetofile, hfile);
 
	if(writetofile == 1)
	{
		fclose(hfile);
	}
 
	MakeUnknown (MinEA(), MaxEA() - MinEA(), 1);
    AnalyzeArea (MinEA(), MaxEA());      
}
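The script's GetFileOffset only handles section 0 and finds the section table at the fixed offset e_lfanew + 0xF8, which assumes a 0xE0-byte optional header. A minimal C sketch of a more general lookup, assuming the section headers have already been parsed into a hypothetical struct section (VirtualAddress, VirtualSize, SizeOfRawData, PointerToRawData); section_table_offset and va_to_file_offset are made-up names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed IMAGE_SECTION_HEADER fields. */
struct section {
    uint32_t virtual_address; /* VirtualAddress (RVA) */
    uint32_t virtual_size;    /* VirtualSize */
    uint32_t raw_size;        /* SizeOfRawData */
    uint32_t raw_offset;      /* PointerToRawData */
};

/* Section table = PE signature (4) + IMAGE_FILE_HEADER (0x14) + optional header. */
static uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + 4 + 0x14 + size_of_optional_header;
}

/* Map a VA to a file offset; returns -1 if the VA is outside every section
   or lies in a section's zero-filled tail that has no bytes in the file. */
static int64_t va_to_file_offset(uint32_t va, uint32_t image_base,
                                 const struct section *s, size_t n)
{
    if (va < image_base)
        return -1;
    uint32_t rva = va - image_base;
    for (size_t i = 0; i < n; i++) {
        uint32_t span = s[i].virtual_size ? s[i].virtual_size : s[i].raw_size;
        if (rva >= s[i].virtual_address && rva - s[i].virtual_address < span) {
            uint32_t delta = rva - s[i].virtual_address;
            return delta < s[i].raw_size ? (int64_t)s[i].raw_offset + delta : -1;
        }
    }
    return -1;
}
```

For a standard PE32 header, section_table_offset(e_lfanew, 0xE0) equals the script's e_lfanew + 0xF8.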
dELTA
Posts: 4209
Joined: Mon Oct 30, 2000 7:00 am
Location: Ring -1

Post by dELTA »

Very nice work niaren! :yay:

Next steps would be to update all locations that IDA classifies as addresses. If the executable has a relocation table, they should all be in there, so in that case you've already hit the jackpot; but relocations may be stripped from executables (unlike DLLs), which would make the script useless on them [first protector countermeasure, woo]. After that come the slightly more complicated inter-functional offsets (note: offsets != addresses). If you relocate code at function level rather than object-file level, you might even have to mutate the code in more complex ways than just patching addresses to fix these inter-functional offsets, since the new relocated offset might need more bits than the original one did. If not, I think you can ignore them completely, since there would not be any in the object-file-level case.
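The "more bits" problem in concrete terms: an x86 short jmp (EB rel8) only reaches -128..+127 bytes from the end of the 2-byte instruction, so if relocating functions pushes its target further away, it has to become the 5-byte near form (E9 rel32), and every byte after it shifts by 3. A minimal C sketch of that choice; the helper names are hypothetical and only the unconditional jmp is shown (conditional jumps have the analogous 7x rel8 / 0F 8x rel32 pair).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a 2-byte short jmp (EB rel8) at `at` can reach `target`;
   rel8 is measured from the end of the instruction (at + 2). */
static bool short_jmp_reaches(uint32_t at, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)at + 2);
    return disp >= -128 && disp <= 127;
}

/* Encode an unconditional jmp from `at` to `target` into buf (at least
   5 bytes), preferring the short form. Returns the encoded length. */
static size_t encode_jmp(uint8_t *buf, uint32_t at, uint32_t target)
{
    if (short_jmp_reaches(at, target)) {
        buf[0] = 0xEB;
        buf[1] = (uint8_t)(int8_t)((int64_t)target - ((int64_t)at + 2));
        return 2;
    }
    /* E9 rel32, little-endian, measured from at + 5 */
    uint32_t rel = (uint32_t)((int64_t)target - ((int64_t)at + 5));
    buf[0] = 0xE9;
    buf[1] = (uint8_t)rel;
    buf[2] = (uint8_t)(rel >> 8);
    buf[3] = (uint8_t)(rel >> 16);
    buf[4] = (uint8_t)(rel >> 24);
    return 5;
}
```

For example, a jmp at 0x401000 to 0x401010 encodes as EB 0E, but once its target moves to 0x401100 it needs E9 FB 00 00 00.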

After this, I guess testing on more and more complex executables is the way to go: when one crashes after your dewatermarking, analyze the crash/disassembly in IDA to see what kind of special case caused the script to fail, implement support for that special case, and iterate until the dewatermarking code shuffling produces a working IDA executable. ;)

After that, in order to make it a serious "generic dewatermarker", my suggested steps are probably these (as loosely mentioned in my previous posts too):
  • Import table shuffler (should be easy as long as all code that is statically linked to imports is correctly analyzed by IDA)
  • Export table shuffler (should be easy no matter what)
  • PE resource directory shuffler (should be easy under normal conditions, I think)
  • Relocation table shuffler (this might be covered by what you already mention as your planned next step, but I can't remember whether the contents of the relocation table may be in arbitrary order or must be ordered by relocated address; in the former case you should always randomly reshuffle the order of the relocations too, to eradicate any watermark entropy that might be hidden in this ordering).
  • Code-location-independent function diffing tool (checking for any differences within functions that are not related to their location, thus ignoring differences in call and jump addresses/offsets but detecting all other differences, e.g. to see whether different instructions are used in some parts of a function). Note, though, that not all addresses/offsets can or should be ignored during this process, only those related to jumps/calls in the code, since otherwise entropy can be hidden in e.g. the ordering of data in data sections!
  • Data area diffing tool (detecting differences in default data section contents).
  • Non-PE-section data diffing tool (diffing the contents of non-PE-section parts of executable files, e.g. PE headers, code caves, or data inserted between, before or after the PE section areas in the executable).
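The code-location-independent diff in the list above could start as simply as this: a minimal C sketch that compares two builds of the same function byte by byte while masking the 4-byte rel32 fields of calls/jumps. It assumes both builds have the same length and instruction layout, and that the operand offsets come from a disassembler such as IDA; masked_diff is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compare two builds of the same function, ignoring the 4-byte rel32
   operand fields that start at the offsets in skip. Returns the offset
   of the first remaining difference, or -1 if none is found. */
static long masked_diff(const uint8_t *a, const uint8_t *b, size_t len,
                        const size_t *skip, size_t nskip)
{
    for (size_t i = 0; i < len; i++) {
        bool masked = false;
        for (size_t k = 0; k < nskip && !masked; k++)
            masked = i >= skip[k] && i < skip[k] + 4;
        if (!masked && a[i] != b[i])
            return (long)i;
    }
    return -1;
}
```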
For any detected differences in code sections, you must mutate the affected functions with a code obfuscation algorithm in the hope of removing any watermarking entropy hidden in their original implementation. This algorithm should at least obfuscate/morph instruction ordering and substitute instructions or instruction sequences with semantically equivalent instructions or instruction sequences. Merely "changing their CRC" with simple antivirus-evasion-style obfuscation (inserting NOPs, adding an XOR decryption layer, etc.) won't remove much of the entropy that manual analysis could recover.
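As a deliberately tiny example of "semantically equivalent" substitution: xor r32, r32 and sub r32, r32 both zero the register and set CF = OF = 0, ZF = 1, SF = 0, PF = 1; the only difference is AF, which xor leaves undefined and sub clears. Both have 2-byte encodings (31/33 for xor, 29/2B for sub), so swapping them moves no offsets. A minimal C sketch, assuming the instruction start offsets come from a disassembler (scanning raw bytes would also hit 0x31/0x33 bytes inside other instructions); xor_to_sub is a hypothetical name, and a real morpher would need many such rules plus instruction reordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap "xor r32, r32" for the equivalent "sub r32, r32" at the given
   instruction starts. Handles both encodings: 31 /r -> 29 /r and
   33 /r -> 2B /r, only when ModRM has mod == 11 and reg == rm.
   Returns the number of instructions rewritten. */
static size_t xor_to_sub(uint8_t *code, size_t len,
                         const size_t *starts, size_t nstarts)
{
    size_t changed = 0;
    for (size_t i = 0; i < nstarts; i++) {
        size_t p = starts[i];
        if (p + 1 >= len)
            continue;
        uint8_t op = code[p], modrm = code[p + 1];
        int same_reg = (modrm & 0xC0) == 0xC0 && ((modrm >> 3) & 7) == (modrm & 7);
        if (!same_reg)
            continue;
        if (op == 0x31) {
            code[p] = 0x29;
            changed++;
        } else if (op == 0x33) {
            code[p] = 0x2B;
            changed++;
        }
    }
    return changed;
}
```

For example, 31 C0 (xor eax, eax) becomes 29 C0 (sub eax, eax), while 31 D8 (xor eax, ebx) is left alone because it does not zero a register.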

All code differences must also be analyzed manually by the reverser running the script/differ, though, since a final resort of the protector might be to generate semantically different code in each watermarked version (e.g. setting a register to a serial number in some stray instruction somewhere), which would not be removed by the above code obfuscation techniques. The reverser could then conclude that such code is superfluous to proper program operation and nop it out completely instead.

If you follow this advice in your implementation, you'll have a pretty damn capable (and unique) generic dewatermarker tool in your hands, I'd say. :yay:
"Give a man a quote from the FAQ, and he'll ignore it. Print the FAQ, shove it up his ass, kick him in the balls, DDoS his ass and kick/ban him, and the point usually gets through eventually."