PDA

View Full Version : Checksum theory check...sum


Silver
October 31st, 2005, 06:58
Hi guys,

I believe I know the answer to this question, but I'd like to run it by the more experienced people here in case I've missed something.

Assuming an app is protected against quick modification by a checksum system (say, SHA1/ MD5). The app is composed of one executable and one DLL. The executable loads the DLL through late binding (LoadLibrary() etc).

To implement a checksum in this situation, the first instinct would be to generate a SHA1 of the exe and a SHA1 of the DLL. Insert the exe's SHA1 output into the DLL, then at runtime have a function in the DLL create another run-time SHA1 of the exe and compare the 2 hashes. Next insert the SHA1 of the DLL into the exe, and at runtime have a function in the exe create another SHA1 of the DLL, then compare the 2 hashes.

However as far as I'm aware this is impossible, because you have a chicken and egg situation. You can insert the SHA1 of the exe into the DLL, but as soon as you insert the SHA1 of the DLL into the exe you've changed the SHA1 of the exe, thus making the SHA1 stored in the DLL invalid. This problem is apparently circular.

My question is: is there a formula, system or similar that allows you to do this? Obviously you can exclude the section of the exe & DLL that contains the cross-checked hashes from the overall hash, or store the hashes outside of the exe/dll, but the assumption here is that you want to perform a hash on the entire .exe and .dll using only code and resource inside them.

Thanks!

Maximus
October 31st, 2005, 11:41
...the only way I could think is:
use MD5 for both, then *hope* you can find a collision in time, and patch exe or dll to keep signature valid...
...not very useful, I know...

Extremist
October 31st, 2005, 13:05
Special hash functions and extra "corrector" data to set the hashes correctly despite circular checking:

Bill Horne, Lesley Matheson, Casey Sheehan, Robert E. Tarjan, "Dynamic Self-Checking Techniques for Improved Tamper Resistance"

http://link.springer.de/link/service/series/0558/bibs/2320/23200141.htm

nikolatesla20
October 31st, 2005, 16:14
For one thing, you have to make sure the MD5 is only done on the code section. Then I think you would be fine.

Now, this seems like a problem but really you must remember to only hash the code section. Since the hashes will go into the data section then adding a hash value will not affect the hash itself of the EXE or DLL. You could also hash code+resource I suppose and get away with it some other way. Just make sure you store the hash somewhere where it isn't included in the hash itself.

Another reason to hash only certain sections is because at runtime some sections will be completely different than their file counterparts. Remember IAT tables, and DLL's have reloc's too. You really don't want to hash those.

-nt20

Extremist
October 31st, 2005, 19:16
Hashing the hashes is the interesting part.

LLXX
October 31st, 2005, 21:31
Interesting problem. The most interesting solution would be to append extra data to both of them so that both the EXE and the DLL have the same (!) hash. Then either could hash itself and then hash the other and compare the two results. Of course, this is only applicable to MD5 where collisions are relatively easy to find, since you'd need the colliding set.

For reference, you may want to search for the following two files on the Internet:
md5_someday.pdf
md5-attack.pdf
which address how collisions can be easily found in MD5.

This circular checking is definitely an interesting idea. I've never seen it in an actual implementation though...

naides
October 31st, 2005, 21:55
This remainds me of an old proposition in Philosophy:

Can the human mind (or a computer) completely comprehend itself?
Some people argue that even if you undertood each and every process, thought, perception, you will still be short becuse you have to understand how the mind understands itself, but when you do, you are still short, you have yet another thought, the process of comprehension of the comprehension of the process to comprehend, and so on.

Cheap cocktail party philosophy, I know

More to the problem that Silver proposed. A hash function in which you could 'correct', predict or compensate the effect of introducing a change in the hashed file i.e. the newly calculated hash value, would esentially defeat the security purpose of the hash: You could intraduce ANY alteration you wish in the file, then you "compensate" to mantain the desired hash value.

Silver
November 1st, 2005, 08:08
Quote:
Another reason to hash only certain sections is because at runtime some sections will be completely different than their file counterparts. Remember IAT tables, and DLL's have reloc's too. You really don't want to hash those.


Sure, I understand that would be an "operational" problem. What I'm trying to get at is the theory behind it. 2 data files, each containing a hash of each other, able to confirm the hashes when the data is access (or the exe runs, or whatever).

My original thinking was, if you have 2 hashes that cross-check each other, you can never make a modification to the data they hash as you'll always be stuck with one hash out of sync. I can "kind of" see a principle similar to asymmetric encryption, where there is a private component to the hashes and a public component. The private component is stored and kept safe, the public component is freely distributed with the protected data.

Quote:
A hash function in which you could 'correct', predict or compensate the effect of introducing a change in the hashed file i.e. the newly calculated hash value, would esentially defeat the security purpose of the hash: You could intraduce ANY alteration you wish in the file


Naides, good point. This would be a definite issue, and would affect using a flaw in a system (such as MD5 collisions that LLXX mentioned).

Going back to my reference to asymmetric encryption. You need 3 keys to decrypt asymmetric encrypted data, and 4 keys to "respond" to it. One key must be a private key, 2 must be public keys.

I was thinking about how to substitute this system with hashes. I don't want to use the phrase "key" here as it implies too much. I'll call it a widget instead. Example: we have datafile1 and datafile2. We use the master widget and a public widget to create a hash of datafile1. We then use the master widget and public widget2 to create a hash of datafile2. We then keep the master widget safe and never distribute it. We include private widget1 and the hash of datafile1 inside datafile2. We include private widget2 and the hash of datafile2 inside datafile1.

The data is then released to the public. Now the hash in datafile1 can be de-widget-ed and used to check datafile2 hasn't been modified. The hash in datafile2 can also be de-widget-ed and used to check datafile1 hasn't been modified.

The hashes can never be altered or recreated, because the master private widget is never publicly distributed. So as far as I can see, in theory this prevents naides (very valid) issue with self-compensating hashes.

It does still leave the issue of how to get the hashes in the first place without going round in circles though.

Thoughts?

nikolatesla20
November 1st, 2005, 08:25
This "concept" is what is behind digital signatures. It's a asymetric crypto applied to a hash of a file. I don't see the point of trying to hash a hash this way of having the hash in the file itself - even in digital signatures the hash is not considered part of the file (it's in the file, but it's not part of the hashed content). Also in your implementation you have the widget and public key encrypt the hash of file 1, and then give that to file 2. So really that's the same thing as a digital signature, except another file contains the encrypted hash other than the file whose signature is being checked. It's still the same thing then as a digital signature.

-nt20

CoDe_InSiDe
November 1st, 2005, 10:07
A quick and maybe stupid reply

Can't you just allocate some space and store it there?
The .EXE allocates some memory and stores the pointer in the data section, then calls the .DLL which also allocates some memory and also stores the pointer in the DLL's data section.
Then the .DLL calculates the hash of the .EXE and stores it in the DLL_Memory.
And then the .EXE calculates the hash of the .DLL and stores it in the EXE_Memory.

After that the .EXE or .DLL calculates the hash from the other and compares it to the EXE_Memory (When the .EXE calculates the hash of the .DLL) or DLL_Memory (When the .DLL calculates the hash of the .EXE).

Maybe i'm wrong here somewhere but this sounds to me like the most simple solution

dELTA
November 2nd, 2005, 06:46
Quote:
Maybe i'm wrong here somewhere but this sounds to me like the most simple solution
Unfortunately, this misses the entire point of using checksums for modification checking, since the "reference checksum" will be re-calculated from the current state of the file each time, and hence, no pre-made changes to the files will be detected.

And Silver, I'd checksum only the entrypoint (well, entire PE header if you prefer) and the code section that the entry point is located in. Defeating this check is equally hard to removing the checksum protection altogether, and since a cracker will always choose the easiest way there is nothing to be won by making the theory behind the checksumming harder than the patching/disabling of it.

Also, the distinction of runtime checksums and checksums of the static files are quite important, and there seems to be some inconsistencies about which of these two people are talking about in this thread. Some different limitations apply to them both.

Silver
November 2nd, 2005, 07:23
Quote:
This "concept" is what is behind digital signatures. It's a asymetric crypto applied to a hash of a file.


Heh, I was mulling this over yesterday and came to the conclusion that someone was going to say what you said. my fault, let me explain again.

Let's talk about what I proposed in the context of asymmetric crypto. There are 3 keys needed for this operation. 2 private keys and one public key. Private key 1 and public key 1 are used to encrypt the data, and private key 2 with public key 1 are used to decrypt it.

Everyone without exception must have access to this data, whatever it is. Maybe it's a nag screen protected in an exe, or maybe it's some important document. The exercise is to protect the delivery of the data. The creator of the data encrypts it with his private key #1 and the public key. The creator then sends the encrypted data out to the wide world along with private key #2 and the public key.

At this point, anyone who downloads the data also gets private key #2 and the public key (in the same zip file or whatever). Therefore anyone who downloads the data can decrypt it. They CANNOT decrypt the data, modify it and then encrypt it again, because they don't have private key #1. So no matter what, the only person that can modify the data is the creator who owns the private key.

Now the data owner takes another step, and creates an MD5 checksum of the data before sending it out. This checksum is also encrypted with the same private key #1 and public key. This is what nikolatesla20 was referring to.

That's basic theory. Now when the creator applied asym crypto to the hashes, what happens? Really there's absolutely no difference. The hashes are not linked to the crypto in any way, they just have to be decrypted and checked.

So if we base a protection on this, what can happen? It's easy to crack. The cracker gets the exe and the publicly distributed private key #2 and public key. He runs the exe, it decrypts itself, he procdumps it then inserts the in-mem decrypted code over the top of the encrypted code in the on-disk exe. He can then create his own checksum of the on-disk exe. Now when the exe runs, it checks the checksum (which the cracker has just modified) and confirms it's valid. The cracker has easily and quickly made the checksum system redundant.

Now consider if we made the hashing an integral function of the encryption process. The problem that naides raised is a valid one - if you have self-correcting checksums, they will simply self-correct themselves right round a crack thus defeating the security mechanism. So what you need to do is to prevent them from correcting themselves after the protection has been applied.

Forget about asymmetric crypto for a second and consider asymmetric hashes. 2 hashes, created by the use of a third, private widget. The 2 hashes cross-check each other but cannot be modified or recreated without the private widget.

If this system was created it would be exceptionally difficult to defeat. Going back to my original DLL & EXE cross-check. You can't modify the exe, because there is a hash in the DLL that checks it. You can't modify the DLL, because there's a hash in the EXE that checks it. You can't modify (ie: create your own from your patched exe/dll) the hashes because they are a function of the widget system, and the private widget is needed to create them. This requirement of the private widget specifically for the hashes is what differentiates my thoughts from simply asymmetric encrypting the hashes then decrypting them (which requires that they be in a "vanilla" state after decryption).

Yes, it's the same principle as asymmetric crypto but with an extra step. If there is a way to create cross-checking hashes as per my original question, then this is definitely possible.

Hopefully I've explained that clearly....

nikolatesla20
November 2nd, 2005, 12:33
But since the end user needs the public key to decrypt the hashes, the cracker could just replace the public key with his own, and then create a hash and encrypt it with his private key, and put that in the file, after he modded it. It's just never secure I guess once it's on a user's system.

-nt20

LLXX
November 3rd, 2005, 00:28
Quote:
[Originally Posted by nikolatesla20]It's just never secure I guess once it's on a user's system.

Correct. That's what makes protections always crackable (unless the program itself does not contain the necessary processing code, and just communicates input/output with a central server). Once software in any form, source or binary, is released, the protection is essentially defeated.

Silver
November 3rd, 2005, 09:07
Quote:
But since the end user needs the public key to decrypt the hashes, the cracker could just replace the public key with his own, and then create a hash and encrypt it with his private key, and put that in the file, after he modded it. It's just never secure I guess once it's on a user's system.


Yes, but think that through for a second. The cracker would have to make every single patch to the exe in one hit.

The system we're talking about prevents any modification to either the exe or the dll because the cross-checking hashes would flag as out-of-synch.

So walking this through for a second. I start up the exe and kick sice into action. I break on what I think is the start of the code that gives all this trouble with cross checking hashes. Okay, I found a jne I want to patch to a jmp. If I patch it right now and resume execution, the next hash check is going to fail.

I would have to patch every single piece of code in the exe and dll in one single go to avoid a hash crosscheck failing.

Yes, your statement is absolutely correct - the cracker can replace the public key with their own pub key(/widget) and re-widgetise the entire thing with their private key(/widget). But the fact still remains that they *cannot* insert the keys or hashes into the exe or dll whilst the existing cross-check remains. They must remove all of the cross-checking protection in one go before doing this (but at that point, it's easier just to distribute the vanilla exe with the checks removed).

It's not 100% infallible, nothing is. Maybe I'm missing an essential principle here, and I know this is based on the assumption that you can generate cross-checking hashes. But IMO it is an incredibly difficult protection to defeat, the cracker would have to be skilled enough to trace the entire code in a deadlist, work out all the modifications by hand and apply them all at once. Even just a little obfuscated code in the original exe would make this a nightmare...

mike
November 23rd, 2005, 12:06
There are a few papers out there about self-decrypting code and hash checking. One paper proposed writing an interrupt handler to catch any read accesses to the program and redirect them to an unmodified version. Then you can tweak the program to your heart's content and the program can't tell that it changed. (Of course, there are ways to beat it, but it beats a large class of self-checkers.)

You can find MD5 collisions easily, and thus have a pair of programs each containing the other's hash, but so can an attacker, so you might as well use CRC32.