Modern Cryptography: Cryptography Hashes

Message Digests

Overview • Cryptographic hash functions are functions that: • Map an arbitrary-length (but finite) input to a fixed-size output. • Are one-way (hard to invert). • Are collision-resistant (difficult to find two values that produce the same output). • Examples: • Message digest functions - protect the integrity of data by creating a fingerprint of a digital document. • Message Authentication Codes (MAC) - protect both the integrity and authenticity of data by creating a fingerprint based on both the digital document and a secret key.

Checksums vs. Message Digests • Checksums: • Used to produce a compact representation of a message. • If the message changes the checksum will probably not match. • Good: accidental changes to a message can be detected. • Bad: easy to purposely alter a message without changing the checksum. • Message digests: • Used to produce a compact representation (called the fingerprint or digest) of a message. • If the message changes the digest will probably not match. • Good: accidental changes to a message can be detected. • Good: difficult to alter a message without changing the digest.
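The contrast above can be sketched in a few lines of Python. The additive checksum (sum of bytes modulo 256) is a hypothetical stand-in for a simple checksum and is not taken from the slides; SHA-256 from the standard hashlib module stands in for a message digest function.

```python
import hashlib

def additive_checksum(data: bytes) -> int:
    """Toy checksum (an assumption for illustration): sum of all bytes modulo 256."""
    return sum(data) % 256

original = b"pay 100 to bob"
tampered = b"pay 001 to bob"   # same bytes, deliberately reordered

# The toy checksum cannot tell the two messages apart...
same_checksum = additive_checksum(original) == additive_checksum(tampered)

# ...but their message digests differ.
same_digest = hashlib.sha256(original).digest() == hashlib.sha256(tampered).digest()

print("checksums equal:", same_checksum)
print("digests equal:  ", same_digest)
```

Reordering bytes is only one way to defeat such a checksum; the point is that a deliberate change that preserves the checksum is easy to construct, while no comparably easy trick is known for the digest.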

Hash Functions • Message digest functions are hash functions: • A hash function, H(M)=h, takes an arbitrary-length input, M, and produces a fixed-length output, h. • Example hash function: • H = sum all the letters of an input word modulo 26. • Input: a word. • Output: a number between 0 and 25, inclusive. • Example: • H(“Elvis”) = ((‘E’ + ‘L’ + ‘V’ + ‘I’ + ‘S’) mod 26) • H(“Elvis”) = ((5+12+22+9+19) mod 26) • H(“Elvis”) = (67 mod 26) • H(“Elvis”) = 15
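The example hash function above can be written as a short Python sketch. The name toy_hash is hypothetical; the letter numbering (A=1 through Z=26) follows the worked example, where E=5 and S=19.

```python
def toy_hash(word: str) -> int:
    """Sum the alphabet positions (A=1 ... Z=26) of the letters, modulo 26."""
    total = 0
    for ch in word.upper():
        if "A" <= ch <= "Z":          # ignore anything that is not a letter
            total += ord(ch) - ord("A") + 1
    return total % 26

print(toy_hash("Elvis"))
```

This function is useful only for illustration: it is neither one-way nor collision-resistant.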

Collisions • For the hash function: • H = sum all the letters of an input word modulo 26. • There are more inputs (words) than possible outputs (numbers 0-25). • Some different inputs produce the same output. • A collision occurs when two different inputs produce the same output: • The values x and y are not the same, but H(x) and H(y) are the same.
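Collisions for the letter-sum hash are easy to find, as the following Python sketch suggests. The word list is an assumption chosen for illustration; any anagram of a word collides with it, because addition does not depend on letter order.

```python
from collections import defaultdict

def toy_hash(word: str) -> int:
    """Sum the alphabet positions (A=1 ... Z=26) of the letters, modulo 26."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper() if "A" <= ch <= "Z") % 26

# Hypothetical word list; the first five entries are anagrams of each other.
words = ["Elvis", "Lives", "Evils", "Levis", "Veils", "Hash", "Digest"]

# Group the words by their hash value.
buckets = defaultdict(list)
for word in words:
    buckets[toy_hash(word)].append(word)

# Any bucket holding more than one word is a set of colliding inputs.
collisions = {h: ws for h, ws in buckets.items() if len(ws) > 1}
print(collisions)
```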

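Collision-Resistant Hash Functions • Hash functions for which it is difficult to find collisions are called collision-resistant. • For a collision-resistant hash function, H(M)=h, it is difficult to find any two messages, M1 and M2, such that: • M1 and M2 are not the same. • H(M1) and H(M2) are the same. • A related, weaker requirement (second-preimage resistance): for a given message, M1, it is difficult to find another message, M2, that is different from M1 but has the same hash.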

One-Way Hash Functions • A function, H(M)=h, is one-way if: • Forward direction: given M it is easy to compute h. • Backward direction: given h it is difficult to compute M. • A one-way hash function: • Easy to compute the hash for a given message. • Hard to determine what message produced a given hash value.

Message Digest Functions Message digest functions are collision-resistant, one-way hash functions: • Given a message it is easy to compute its digest. • Hard to find any message that produces a given digest (one-way). • Hard to find any two messages that have the same digest (collision-resistant).
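A brief Python sketch of the "easy to compute" direction, using SHA-1 from the standard hashlib module as one example of a message digest function. The sample messages are assumptions for illustration.

```python
import hashlib

# Inputs of very different lengths all map to a fixed-size (160-bit) digest.
messages = [b"", b"a", b"a" * 1000, b"The quick brown fox"]
digests = [hashlib.sha1(m).digest() for m in messages]
for m, d in zip(messages, digests):
    print(len(m), "bytes in ->", len(d) * 8, "bits out:", d.hex())

# A one-character change in the input gives an unrelated-looking digest.
d1 = hashlib.sha1(b"The quick brown fox").hexdigest()
d2 = hashlib.sha1(b"The quick brown fix").hexdigest()
print(d1)
print(d2)
```

The reverse direction has no such shortcut: nothing in the library recovers a message from its digest.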

Using Message Digest Functions Message digest functions can be used to ascertain data integrity: • A company makes some software available for download over the World Wide Web. • Users want to be sure that they receive a copy that has not been tampered with. • Solution: • The company creates a message digest for its software. • The digest is transmitted (securely) to users. • Users compute their own digest for the software they receive. • If the digests match the software probably has not been altered.
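The download scenario above might look like the following in Python. This is a minimal sketch: the software is modeled as an in-memory byte string, the function names are hypothetical, and SHA-256 is an assumed choice of digest function.

```python
import hashlib
import hmac

def digest_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()

# Company side: compute the digest and publish it over a secure channel.
published_software = b"pretend these bytes are the installer"
published_digest = digest_of(published_software)

# User side: recompute the digest of what was received and compare.
def verify_download(received: bytes, expected_digest: str) -> bool:
    return hmac.compare_digest(digest_of(received), expected_digest)

print(verify_download(published_software, published_digest))
print(verify_download(published_software + b" plus malware", published_digest))
```

The check is only as trustworthy as the channel that delivered the digest; if an attacker can replace both the software and the digest, the comparison proves nothing.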

The Secure Hash Algorithm (SHA) • A Federal Information Processing Standard (FIPS 180-1) adopted by the U.S. government in 1995; the algorithm specified there is commonly called SHA-1. • Based on a message digest function called MD4 created by Ron Rivest. • Developed by NIST and the NSA. • Input: a message of b bits (b less than 2^64). • Output: a 160-bit message digest.

SHA - Padding • Input: a message of b bits. • Padding makes the message length a multiple of 512 bits. • The input is always padded (even if its length is already a multiple of 512). • Padding is accomplished by appending to the input: • A single bit, 1, • Enough additional bits, all 0, to bring the final block to exactly 448 bits, • A 64-bit integer representing the length of the original message in bits, which completes the final 512-bit block.

SHA – Padding Example • Consider the following message: • M = 01100010 11001010 1001 (20 bits) • To pad we append: • 1 (1 bit), • 427 0s (because 448-21 = 427 bits), • 64-bit binary representation of the number 20 (64 bits). • Result: • Pad(M) = 01100010 11001010 10011000 00000000 . . . 00000000 00010100 (512 bits). • 464 0s have been omitted above (denoted by the ellipsis).
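The padding rule can be checked against the 20-bit example with a small Python sketch. Representing the message as a string of '0' and '1' characters is an assumption made for readability, and the function name is hypothetical.

```python
def sha_pad_bits(message_bits: str) -> str:
    """Pad a message given as a string of '0'/'1' characters."""
    length = len(message_bits)
    padded = message_bits + "1"                 # the single 1 bit
    zeros = (448 - len(padded)) % 512           # 0 bits needed to reach 448 mod 512
    padded += "0" * zeros
    padded += format(length, "064b")            # 64-bit length of the original message
    return padded

m = "01100010" "11001010" "1001"                # the 20-bit message from the example
padded = sha_pad_bits(m)

print(len(padded))                              # total length in bits
print(padded[:32], "...", padded[-16:])         # the parts shown on the slide
```

A message that is already 512 bits long still gains a full extra block, as the padding rules require.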

SHA – Constant Initialization After padding, constants are initialized to the following hexadecimal values: • Five 32-bit words: • H0 = 67452301 • H1 = EFCDAB89 • H2 = 98BADCFE • H3 = 10325476 • H4 = C3D2E1F0 • Eighty 32-bit words: • K0 – K19 = 5A827999 • K20 – K39 = 6ED9EBA1 • K40 – K59 = 8F1BBCDC • K60 – K79 = CA62C1D6

SHA – Step 1 • The padded message contains a whole number of 512-bit blocks, denoted B1, B2, B3, . . ., Bn • Each 512-bit block, Bi, of the padded message is processed in turn: • Bi is divided into 16 32-bit words, W0, W1, . . ., W15 • W0 is composed of the leftmost 32 bits in Bi • W1 is composed of the second 32 bits in Bi … • W15 is composed of the rightmost 32 bits in Bi

SHA – Step 2 • W0, W1, . . ., W15 are used to compute 64 new 32-bit words (W16, W17, . . ., W79) • Wj (16 ≤ j ≤ 79) is computed by: • XORing words W[j-3], W[j-8], W[j-14], and W[j-16] together • Circularly left shifting the result one bit
for j = 16 to 79 do
  W[j] = Circular_Left_Shift_1(W[j-3] XOR W[j-8] XOR W[j-14] XOR W[j-16])
done
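Steps 1 and 2 can be sketched in Python as follows. The function names are hypothetical, and the block is assumed to arrive as 64 bytes with the leftmost bits first (big-endian), matching the "leftmost 32 bits" description in Step 1.

```python
def rotl32(x: int, n: int) -> int:
    """Circular left shift of a 32-bit word by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def expand_block(block: bytes):
    """Split one 512-bit block into W0..W15, then derive W16..W79."""
    if len(block) != 64:
        raise ValueError("a block must be exactly 64 bytes (512 bits)")
    # Step 1: sixteen 32-bit words, leftmost bits first.
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    # Step 2: each new word is the XOR of four earlier words, rotated left by 1.
    for j in range(16, 80):
        w.append(rotl32(w[j - 3] ^ w[j - 8] ^ w[j - 14] ^ w[j - 16], 1))
    return w

w = expand_block(bytes(range(64)))   # an arbitrary test block
print(len(w), hex(w[0]), hex(w[16]))
```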

SHA – Step 3 • The values of H0, H1, H2, H3, and H4 are copied into five words called A, B, C, D, and E: • A = H0 • B = H1 • C = H2 • D = H3 • E = H4

SHA – Step 4 • Four functions are defined as follows: • For (0 ≤ j ≤ 19): • fj(B,C,D) = (B AND C) OR ((NOT B) AND D) • For (20 ≤ j ≤ 39): • fj(B,C,D) = (B XOR C XOR D) • For (40 ≤ j ≤ 59): • fj(B,C,D) = ((B AND C) OR (B AND D) OR (C AND D)) • For (60 ≤ j ≤ 79): • fj(B,C,D) = (B XOR C XOR D)
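The four round functions translate directly into Python bitwise operators. This is a sketch with a hypothetical function name; the mask is needed because Python integers are unbounded and NOT would otherwise produce a negative number.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def f(j: int, b: int, c: int, d: int) -> int:
    """Round function f_j applied to three 32-bit words."""
    if 0 <= j <= 19:
        return ((b & c) | (~b & d)) & MASK       # each bit of B selects C or D
    if 20 <= j <= 39:
        return b ^ c ^ d                         # bitwise parity
    if 40 <= j <= 59:
        return (b & c) | (b & d) | (c & d)       # bitwise majority vote
    if 60 <= j <= 79:
        return b ^ c ^ d                         # parity again
    raise ValueError("j must be in the range 0..79")

print(hex(f(0, MASK, 0x12345678, 0x9ABCDEF0)))   # B all ones, so the result is C
print(bin(f(40, 0b1100, 0b1010, 0b0110)))        # majority of each bit column
```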

SHA – Step 4 (cont) • For each of the 80 words, W0, W1, . . ., W79, a 32-bit word called TEMP is computed. • The values of the words A, B, C, D, and E are updated as shown below (all additions are modulo 2^32):
for j = 0 to 79 do
  TEMP = Circular_Left_Shift_5(A) + fj(B,C,D) + E + Wj + Kj
  E = D; D = C; C = Circular_Left_Shift_30(B); B = A; A = TEMP
done

SHA – Step 5 • The values of H0, H1, H2, H3, and H4 are updated (additions are modulo 2^32): • H0 = H0 + A • H1 = H1 + B • H2 = H2 + C • H3 = H3 + D • H4 = H4 + E

SHA - Summary • Pad the message • Initialize constants • For each 512-bit block (B1, B2, B3, . . ., Bn): • Divide Bi into 16 32-bit words (W0 – W15) • Compute 64 new 32-bit words (W16, W17, . . ., W79) • Copy H0 – H4 into A, B, C, D, and E • For each Wj (W0 – W79) compute TEMP and update A – E • Update H0 – H4 • The 160-bit message digest is: H0 H1 H2 H3 H4
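Putting the steps together, the following Python sketch follows the summary above step by step. It assumes the message is a whole number of bytes (the standard also allows arbitrary bit lengths), and it is meant for study only; real applications should call a vetted library such as hashlib, which is used here to check the result.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def rotl32(x, n):
    """Circular left shift of a 32-bit word."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1(message: bytes) -> bytes:
    # Pad: a 1 bit (byte 0x80), 0 bits up to 448 mod 512, then the 64-bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)

    # Initialize constants.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    k = [0x5A827999] * 20 + [0x6ED9EBA1] * 20 + [0x8F1BBCDC] * 20 + [0xCA62C1D6] * 20

    for start in range(0, len(padded), 64):
        # Steps 1 and 2: W0..W15 from the block, then W16..W79.
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for j in range(16, 80):
            w.append(rotl32(w[j - 3] ^ w[j - 8] ^ w[j - 14] ^ w[j - 16], 1))
        # Step 3: copy H0..H4 into A..E.
        a, b, c, d, e = h
        # Step 4: 80 rounds.
        for j in range(80):
            if j <= 19:
                fj = ((b & c) | (~b & d)) & MASK
            elif j <= 39 or j >= 60:
                fj = b ^ c ^ d
            else:
                fj = (b & c) | (b & d) | (c & d)
            temp = (rotl32(a, 5) + fj + e + w[j] + k[j]) & MASK
            e, d, c, b, a = d, c, rotl32(b, 30), a, temp
        # Step 5: update H0..H4.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

    # The digest is H0 H1 H2 H3 H4 concatenated.
    return struct.pack(">5I", *h)

print(sha1(b"abc").hex())
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())
```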

Message Digests are not enough… • Example: We want to use a message digest function to protect files on our computer from intruders: • Calculate digests for important files and store them in a table. • Recompute and check from time to time to verify that the files have not been modified. • Good: if someone modifies a file the change will be detected since the digest of that file will be different. • Bad: the attacker could just compute new digests for modified files and install them in the table. • What is needed is a function that depends not only on the message, but also on some kind of secret.
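The weakness described above can be demonstrated with a short Python sketch. Files are modeled as an in-memory dictionary; the paths, contents, and function name are hypothetical, and SHA-256 is an assumed choice of digest function.

```python
import hashlib

def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical "file system" and the table of stored digests.
files = {"/etc/passwd": b"root:x:0:0", "/bin/login": b"original login binary"}
digest_table = {path: digest_of(data) for path, data in files.items()}

def modified_files(files, table):
    """Paths whose current digest no longer matches the stored one."""
    return [path for path, data in files.items() if digest_of(data) != table[path]]

# Good: a modified file is detected.
files["/bin/login"] = b"trojaned login binary"
detected_before = modified_files(files, digest_table)

# Bad: the intruder recomputes the digest and overwrites the table entry.
digest_table["/bin/login"] = digest_of(files["/bin/login"])
detected_after = modified_files(files, digest_table)

print(detected_before)
print(detected_after)
```

Because anyone can compute a digest, the table protects nothing once the attacker can write to it.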

Attacks on Message Digests • Brute-force: Let H be a message digest function (a one-way function) and M be some piece of data. Can you find a piece of data M’ such that H(M) = H(M’)? Say that you generate candidate values of M’ and compute H(M’) for each one until you find a match. How many M’ would you have to test? For an n-bit digest, on the order of 2^n. • Birthday Attack: Say that H(.) produces n bits. If you only need some pair of different messages with the same digest, rather than a match for one particular M, then hashing on the order of 2^(n/2) randomly chosen messages is enough for a greater than 50% chance of finding such a pair. (See the Birthday Paradox in probability theory textbooks.)
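The birthday effect is easy to observe on a deliberately weakened hash. The Python sketch below truncates SHA-256 to 16 bits (an assumption made so the search finishes instantly) and hashes counter-based messages until two of them collide. The function names are hypothetical.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, bits: int = 16) -> int:
    """The first `bits` bits of SHA-256, as an integer (a toy n-bit hash)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash the messages b"1", b"2", ... until two share a hash value."""
    seen = {}                                  # hash value -> first message seen
    for tries in count(1):
        message = str(tries).encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, tries
        seen[h] = message

m1, m2, tries = find_collision(16)
print(m1, m2, "collide after", tries, "messages")
```

With n = 16, a collision typically turns up after a few hundred messages, in line with 2^(n/2) = 256, whereas matching one specific 16-bit value would take on the order of 2^16 = 65,536 tries. This is why a 160-bit digest offers roughly 80 bits of resistance against generic collision search.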

Message Authentication Codes

Message Authentication Codes • A message authentication code (MAC) is a key-dependent message digest function: MAC(Key,Message) = h • The MAC can only be created or verified by someone who knows Key. • One can turn a one-way hash function into a MAC by encrypting the hash value with a symmetric-key cryptosystem.

Using a MAC MACs can be used to protect data integrity and authenticity: • Want to use a MAC to protect files on our computer against tampering: • Calculate MAC values for important files and store them in a table, • Recompute MACs from time to time and compare to stored values to verify that the files haven’t been modified. • Good: If someone modifies a file the MAC of that file will be different. • Good: As long as no one else knows the proper key, an intruder cannot compute valid new MACs to store in the table and cover their tracks.
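The file-protection scenario can be sketched with Python's standard hmac module. Note the swap: HMAC is a different keyed-hash construction from the "encrypt the hash value" approach mentioned earlier, used here because it is readily available. The key, paths, and file contents are hypothetical.

```python
import hashlib
import hmac

def mac(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 of the message under the given key."""
    return hmac.new(key, message, hashlib.sha256).digest()

key = b"secret known only to the administrator"   # hypothetical secret key

# Hypothetical "file system" and the table of stored MACs.
files = {"/etc/passwd": b"root:x:0:0", "/bin/login": b"original login binary"}
mac_table = {path: mac(key, data) for path, data in files.items()}

def modified_files(files, table, key):
    """Paths whose recomputed MAC does not match the stored one."""
    return [path for path, data in files.items()
            if not hmac.compare_digest(mac(key, data), table[path])]

# The intruder modifies a file and tries to cover their tracks,
# but without the real key the replacement table entry is invalid.
files["/bin/login"] = b"trojaned login binary"
mac_table["/bin/login"] = mac(b"intruder's guess at the key", files["/bin/login"])

detected = modified_files(files, mac_table, key)
print(detected)
```

The protection depends entirely on keeping the key somewhere the intruder cannot read it.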

Implementing a MAC [Figure not reproduced.] Question: Does this structure look familiar?

Libraries for MDs and MACs • mhash: Supports SHA1, GOST, HAVAL256, HAVAL224, HAVAL192, HAVAL160, HAVAL128, MD5, MD4, RIPEMD160, TIGER, TIGER160, TIGER128, CRC32B and CRC32 checksums. Free (GNU LGPL). http://mhash.sourceforge.net • java.security: Offers a number of classes for applications needing crypto primitives. MessageDigest, for instance, is a class that produces digests according to MD5 or SHA. http://java.sun.com/j2se/1.4.2/docs/api/ • OpenSSL: Secure sockets, MDs, MACs, ciphers (DES, AES, etc), big numbers, PRNGs, and lots of good stuff. http://www.openssl.org

Summary • Message digests: • Message digest functions are collision-resistant, one-way hash functions: • Collision-resistant: hard to find two values that produce the same output, • One-way: hard to determine what input produced a given output. • Protect the integrity of a digital document. • MACs: • A message authentication code is a key-dependent message digest function: • The output is a function of both the message and a secret key. • The MAC can only be created or verified by someone who knows the key. • Protect the integrity and the authenticity of a digital document.
