Just a short post: I won - finally! Rather than re-posting the whole answer I'll provide a link to it: Link. I won't be claiming the book, because I have rather bad experiences with the Romanian postal service, but it's nice to be recognized.
Two ideas which came after I submitted my answer:
It is possible to generate a dictionary to get from any CRC value to any CRC value using just 7-bit safe characters. One would do it the following way:
- Designate a 32 bit value as the
meeting ground. For example 0xFABCDE12
- Now start to generate all possible strings from the character set you have. Use the generated strings to go both forward and backward from the
meeting groundand record the results in a table (probably stored on the disk, due to the large volume of data needed). The idea is to calculate CRC32([meeting ground], [string]) and CRC32_Reverse([meeting ground], [string]) (which equals the value for which CRC32([value], [string]) == [meeting ground]). Both of these operations can be performed in linear time with respect to the string length as described in the linked paper, which means almost instantly in the case of such short string.
- Now when we want to go from one string to a given CRC value, we perform two lookups: the string which is needed to go from the initial CRC value to the
meeting pointand the second which is needed to get from the
meeting pointto the initial CRC. Concatenating these two strings to the modified string will result in an arbitrary (desired) CRC using only characters only from the selected alphabet (7-bit clean in this example).
The second thought: when choosing hash functions don't use error correction codes and vice-versa. Known the intended usage of the algorithm you decide to implement. Also, the possible value space of the the output should give you a rough idea on how likely it is that you will ave collisions. For example for CRC32 the output space is 2^32. This means that if you have more than 2^32 distinct output data (which is entirely possible these days), you are guaranteed to have a collision. Something like MD5 has in change an output space of 2^128, which is very unlikely to be insufficient.