begood Posted December 16, 2009 Report Posted December 16, 2009 The MD5 algorithm is quite possibly the most widely used digest algorithm out there. So of course, being the geek you are, you want to know how it works. Read on.First you must think of your message, not as a sequence of bytes, but as a sequence of bits (but only for a short while). The MD5 algorithm will accept messages that are any arbitrary number of bits long. However, so that the algorithm can process the data, it begins by padding the message to a length it can handle. This length just happens to be any number such that length mod 512 is equal to 448, or 64 bits short of being a multiple of 512. The message is padded by first appending a 1 bit to the message, and then enough 0 bits to make the message the proper length. The 1 bit is always added, so even if the message is already the proper length, it will be padded (a message can be padded with anywhere from 1 to 512 bits).The next step is to calculate the length of the message (before padding). This number is then appended as the last 64 bits of the message, making the message length a multiple of 512. If the message happens to be greater than 2^64 bits long, then the least significant 64 bits of the message are used (length mod 2^64).The MD5 algorithm uses 4 state variables, each of which is a 32 bit integer (an unsigned long on most systems). These variables are sliced and diced and are (eventually) the message digest. The variables are initialized as follows:A = 0x67452301B = 0xEFCDAB89C = 0x98BADCFED = 0x10325476Now on to the actual meat of the algorithm: the main part of the algorithm uses four functions to thoroughly goober the above state variables. Those functions are as follows:F(X,Y,Z) = (X & Y) | (~(X) & Z)G(X,Y,Z) = (X & Z) | (Y & ~(Z))H(X,Y,Z) = X ^ Y ^ ZI(X,Y,Z) = Y ^ (X | ~(Z))Where &, |, ^, and ~ are the bit-wise AND, OR, XOR, and NOT operators (respectively) that all C programmers should be familiar with.These functions, using the state variables and the message as input, are used to transform the state variables from their initial state into what will become the message digest. For each 512 bits of the message, the following is performed (this is only pseudo-code, don’t try to compile it):/* Group the 512 bit message into 16 different 32 bit chunks */For j = 0 to 15 do Set X[j] to MessageBits[j*32] through MessageBits[j*32 + 31]end/* Store the digest variables out of harms way for the time being */AA = ABB = BCC = CDD = D/* Round 1. *//* Let [abcd k s i] denote the operation: a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). *//* Do the following 16 operations. */ [ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4] [ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8] [ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12] [ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]/* Round 2. *//* Let [abcd k s i] denote the operation: a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). *//* Do the following 16 operations. */ [ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20] [ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24] [ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28] [ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]/* Round 3. *//* Let [abcd k s t] denote the operation a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). *//* Do the following 16 operations. */ [ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36] [ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40] [ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44] [ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]/* Round 4. *//* Let [abcd k s t] denote the operation a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). *//* Do the following 16 operations. */ [ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52] [ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56] [ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60] [ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]/* And finally update the state variables */A+=AAB+=BBC+=CCD+=DD(If you are confused on what ‘<<<’ does, let it be known that it represents a bit-wise rotation to the left, e.g. 110011 <<< 2 = 001111)After this step, the message digest is stored in the state variables (A, B, C, and D). To get it into the hexadecimal form you are used to seeing, output the hex values of each the state variables, least significant byte first. For example, if after the digest:A = 0x01234567;B = 0x89ABCDEF;C = 0x1337D00DD = 0xA5510101Then the message digest would be:67452301EFCDAB890DD03713010151A5And that’s how you calculate MD5 digests of, well, of whatever you want to calculate MD5 digests of.Additional information on the MD5 algorithm can be found in the MD5 RFC (#1321), which I used as my one and only reference in writing this article. In fact, my article’s sole purpose seems to be in slightly simplifying the RFC (that and to serve as a way to get my MD5 class out to the world...). Anyway, RFC #1321 can be found at RFC 1321 (rfc1321) - The MD5 Message-Digest Algorithm.Accompanying code for my article can be obtained by sending me an e-mail or pm asking for it (nicely). Although if it is possible, it would probably be easier on everyone if the source was hosted on the site somewhere. This code is also largely copied from the RFC on MD5, although I encapsulated it in a class and cleaned it up slightly for readability reasons. I also added a few comments on some of the stuff in the header, just to clarify the purpose. The class has been tested and is RFC compatible. I also included a small example program (code only) to demonstrate how to use the class. I hold no responsibility for anything you do with this program or for anything this program does to you. The code was created and compiled successfully using MSVC++ 6, for which I have included a project file and workspace. For the rest of you, some tweaking may be necessary for it to work. http://www.osix.net/modules/article/?id=507 Quote