View Full Version : Collision vulnerabilities in MD5 Checksums

December 7th, 2004, 08:34
Collision vulnerabilities in MD5 Checksums - It is possible to create different executables which have the same md5 hash. The attacks remain limited, for now. The attack allows blocks in the checksumm'd file to be swapped out for other blocks without changing the final hash. This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build.


hxxp://www.astalavista.com//data/stripwire1.1.tar.gz (in perl )

December 7th, 2004, 17:49
Except that everyone knows MD5 is hosed. No one should be using it to verify the authenticity of anything.

December 8th, 2004, 12:25
hosiminh, MD5 collisions ("birthday attack" have been around for a very long time. The conceptual problem with them is that whilst you can find a "bad" string that produces a collision with the "good" string, actually making anything useful from the bad string is exceptionally difficult. SHA-1 is a better choice.

December 8th, 2004, 13:25
Um, Silver- - - You actually have to belive in the message of Christmas, not the commercial hype which surrounds it, and if one does, no one could actually steal it from them.

And who'd want to steal that lump of coal you are going to get anyway?


December 9th, 2004, 04:56

Even SHA-1 can be solved . Look for example hxxp://www.xbox-linux.org/docs/friday13th.html .
Some more stuff : searching for "Crypto 2004+sha-1" (without quotes) will bring up interesting results (Eli Biham @t work).

December 9th, 2004, 11:15
hosiminh, if I'm reading correctly SHA-1 hasn't been broken yet, but has been conceptually/theoretically shown to be vulnerable (ie: "matter of time".

JMI, that lump of coal keeps me warm. Hands off

December 13th, 2004, 10:51
Given all the FUD about MD5, I'd like to clear up a few facts:

1) The stripwire attacks are of limited use as they depend on the original
input vector for MD5 (as the published collisions use original input vector).
This implies that two files must have the collisions as first string in the
file, which makes the published collisions useless for backdooring w/o a
dedicated external file loader (which will raise all alarms).
The assertion made in the paper that collisions with different IV's can be
generated rings true, so in general it would be possible to create two
executables with identical MD5sum bit different behaviour. The impact for
security is IMO miniscule, as both executables need to be identical except
for the colliding block -- so any (even cursory) analysis of either executable
will raise all alarms (a self-modifying executable by definition can't be
trusted, a non-self modifying executable will carry all "evil" code bits in it).

We also have to keep in mind that the difference between colliding blocks
cannot be at arbitrary places in the colliding block -- the collisions need
the changes to occur in such a manner that the differences within the MD5
calculation stay in the MSB as to limit the avalanche effect of addition mod

2) MD5 for integrity checking is currently unproblematic. The degrees of
freedom you
have when generating an MD5 collision are severely limited, and second-
preimage attacks for MD5 aren't going to happen unless a major breakthrough
occurs. It is important to keep in mind that second preimage has _not_ been
solved even for the "simple" MD4. This doesn't mean you shouldn't replace
MD5, but no need for panicking. I challenge anyone to come up with a way
that given executable A he can create executable A' with the same MD5sum.
(the change doesn't have to leave the executable executable, but touching
executable A is not allowed) :-)

3) Trust in SHA-1 should not be placed significantly higher than in MD5.
In many places, the design of SHA-0 was extremely simplistic and did not
take into account lessons learnt from the MD4/MD5 attacks Dobbertin did.
SHA-0's message expansion is completely linear -- why ? The changes
between SHA-0 and SHA-1 are miniscule (a 1-bit rotation only), and some
promising attacks have been made either by linearisation of the addition in
SHA or by using OBDD-like structures for solving equation systems.
In general SHA-1 looks to have been designed with a small "security margin".

Something to always keep in mind: NSA can buy the worlds biggest
computers -- they thus have no interest in building a backdoored crypto algo,
but they do have an interest in limiting the complexity of an attack on a
crypto algo to less than 2^80 or so -- e.g. outside the reach of any
other government.

Halvar :-)

December 14th, 2004, 17:38
Given two semantically equivalent instructions, you can embed a bit of information: add x,y means 0, sub -x,y means 1. Given 160 bits of channel in two exes, one good, one bad, you can use van Oorschot's method (see birthday attack in wikipedia for a link) to produce a pair of exes semantically equivalent to the first but with the same hash. You can then submit the good one for signing.

Finding a preimage is far harder than finding a collision.