
Friday, January 04, 2008

Faster MD5 collisions

MD5 is looking less and less reliable as the days pass. Researchers have now demonstrated an attack that appends a couple of thousand bytes to each of two arbitrarily chosen files so that both files end up with the same MD5 hash, and these byte sequences can be computed on consumer-grade hardware in a relatively short time (a few days). Via the Symantec blog. However, I think that the final statement in the blog post is not correct:

Multiple checksums? Useless if the second algorithm is weaker than the first one (MD5 and CRC32, for example). If the second algorithm is strong, then it is sufficient on its own and the MD5 calculation becomes a waste of time.

This statement would be correct (in my humble opinion) only if the two hashes share the same algorithmic background (for example, members of the MD and SHA families). However, if you use two hashes from different families (for example, one from the SHA-2 family, such as SHA-224 or SHA-256, together with Whirlpool), the odds are that if an attack is found against one, the collisions it generates will not be effective against the other, and vice versa. Of course, mathematically speaking there will always be collisions, since it isn't possible to create a one-to-one mapping from a larger (possibly infinite) set (all possible bit sequences to be hashed) to a smaller set (the possible hash values). From a practical point of view, however, what matters is whether these collisions can be created in a useful amount of time from chosen starting points.
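The pigeonhole argument, and the idea that a collision for one hash need not carry over to an unrelated one, can be illustrated with a toy experiment. The sketch below is not the chosen-prefix attack described above. It is a plain birthday search on digests truncated to 3 bytes (24 bits, an arbitrary choice to keep the search to a few thousand inputs). SHA-256 stands in for "a hash from a different family" because Python's `hashlib` is not guaranteed to provide Whirlpool. The function names `truncated` and `find_collision` are hypothetical helpers written for this illustration.

```python
import hashlib

def truncated(name: str, data: bytes, nbytes: int) -> bytes:
    """First `nbytes` bytes of the digest of `data` under hash `name`."""
    return hashlib.new(name, data).digest()[:nbytes]

def find_collision(name: str, nbytes: int, max_tries: int = 1 << 20):
    """Generic birthday search: return two distinct inputs whose truncated
    digests match, or None if none is found within `max_tries` inputs."""
    seen = {}  # truncated digest -> first input that produced it
    for i in range(max_tries):
        msg = i.to_bytes(8, "big")  # distinct 8-byte inputs
        tag = truncated(name, msg, nbytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg
    return None

if __name__ == "__main__":
    # 3-byte (24-bit) truncation: expect a hit after a few thousand inputs.
    a, b = find_collision("md5", 3)
    print("inputs:", a.hex(), b.hex())
    print("truncated MD5 equal:    ", truncated("md5", a, 3) == truncated("md5", b, 3))
    # The same pair is checked against a hash of a different design.
    print("truncated SHA-256 equal:", truncated("sha256", a, 3) == truncated("sha256", b, 3))
```

The MD5 line is true by construction. The SHA-256 line should almost always print `False`, since an unrelated 24-bit value matches only about once in 2^24 pairs. This only illustrates the intuition; it says nothing about how hard it is to attack both functions at once.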


  1. i kinda don't understand why people keep looking at md5... as i understand it, the use of md5 in new systems has been deprecated since 1995...

    i recall a pre-2000 post to sci.crypt detailing an example of two files that differed by one bit but gave the same md5 value; i believe it was referred to as a pseudo-collision...

  2. One reason people still use MD5 is backwards compatibility. Many communications use MD5 to identify a particular piece of data (take a look at any malware-related posting, or at VirusTotal for that matter).

    Another reason is premature optimization: some people still think it is relevant that MD5 is a little faster to compute and requires a little less storage space than, for example, SHA-256.
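Both points can be checked directly. Below is a minimal sketch, assuming Python's standard `hashlib` and `timeit`. It computes MD5 (kept only for lookups against legacy identifiers) and SHA-256 in a single pass over the data, then reports digest sizes and a rough throughput figure. The helper names `digests` and `compare`, the 64 KiB chunk size, and the 16 MiB test buffer are illustrative assumptions. The digest sizes are fixed (16 bytes versus 32 bytes), but the throughput numbers depend entirely on the machine, so none are claimed here.

```python
import hashlib
import timeit

CHUNK = 64 * 1024  # arbitrary chunk size, as when streaming a large file

def digests(data: bytes) -> dict:
    """Compute MD5 (for legacy lookups) and SHA-256 in one pass over `data`."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    for start in range(0, len(data), CHUNK):
        chunk = data[start:start + CHUNK]
        md5.update(chunk)
        sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}

def compare(size: int = 16 * 1024 * 1024, repeat: int = 3) -> None:
    """Print digest size and best-of-`repeat` throughput for each hash."""
    data = bytes(size)  # zero-filled buffer; content does not affect speed
    for name in ("md5", "sha256"):
        secs = min(timeit.repeat(lambda: hashlib.new(name, data).digest(),
                                 number=1, repeat=repeat))
        size_bytes = hashlib.new(name).digest_size
        print(f"{name}: {size_bytes}-byte digest, {size / secs / 1e6:.0f} MB/s")

if __name__ == "__main__":
    print(digests(b"abc"))
    compare()
```

Because both hashes are updated from the same chunks, publishing the SHA-256 value next to the MD5 one costs one extra `update` call per chunk rather than a second read of the data.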