Universal Hash Families

Universal Hash Families

Universal hash families • Family of hash functions • Finite multiset H of string-valued functions, each h ∈ H having the same nonempty domain A ⊆{0,1}* and range B ⊆{0,1}*, for some constant b • [Definition] -almost universal2 (-AU2) & -almost XOR universal2 (-AXU2) • A family of hash functions H ={h: A{0,1}b} is -almost universal2 , written -AU2, if, for all distinct x, x’∈A, Prh∈ H [h(x)=h(x’)] . • The family of hash functions H is -almost XOR universal2 , written -AXU2, if, for all distinct x, x’∈A, and for all c∈{0,1}b, Prh∈ H [h(x)h(x’)=c] . •  =maxxx’{Prh [h(x)=h(x’)]} : collision probability • Principle measure of an AU2 : How small its collision probability is and how fast one can compute its functions

Composition of universal hash families • Make the domain of a hash family bigger • H ={h: {0,1}a{0,1}b} • Hm ={h: {0,1}am{0,1}bm} • its elements are the same as in H where h(x1 x2… xm), for | xi |=a, is defined by h(x1)|| h(x2)||…|| h(xm) • [Proposition] If H is -AU2, then Hm is -AU2 • Make the collision probability smaller • H1 ={h: A{0,1}b1}, H2 ={h: A{0,1}b2} • H1 & H2 ={h: A{0,1}b1+ b2} • its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is defined as h1(x)||h2(x) • [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H1 & H2 is 12-AU2

Composition of universal hash families(cont.) • Make the image of a hash function shorter • H1 ={h: {0,1}a{0,1}b}, H2 ={h: {0,1}b{0,1}c} • H2○H1 ={h: {0,1}a{0,1}c} • its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is defined as h2(h1(x)) • [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H2○H1 is (1+2)-AU2 • Turn an AU2 family H1 and AXU2 family H2 into an AXU2 family H2○H1 • [Proposition] Suppose H1 ={h: AB} is 1-AU2 and H2 ={h: BC} is 2-AXU2. Then H2○H1 ={h: AC} is (1+2)-AXU2

Related researches • Carter-Wegman(1979, 1981) • Efficient authentication code under strongly universal hash functions • Key observations • Long messages can be authenticated efficiently using short keys if the number of bits in the authentication tag is increased slightly compared to ‘perfect’ schemes • If a message is hashed to a short authentication tag, weaker properties are sufficient for the first stage of the compression • Under certain conditions, the hash function can remain the same for many plaintexts, provided that hash results is encrypted using a one-time pad • Stinson(1994) • Improves the works by Wegman-Carter and establishes an explicit link between authentication codes and strongly universal hash functions • Johansson-Kabatianskii-Smeets(1996) • Establish a relation between authentication codes and codes correcting independent errors

Related researches(cont.) • Krawczyk(1994, 1995) • Propose universal hash functions that are linear with respect to bitwise XOR • Makes it easier to reuse the authentication code • Encrypt the m-bit hash result for each new message using a one-time pad • Simple and efficient constructions based on polynomials and LFSR • Shoup(1996) • Propose and analyze the constructions based on polynomials over finite fields • Rogaway : Bucket hashing (1995) • Halevi-Krawczyk : MMH (1997) • Make optimal used of the multiply and accumulate instruction of the Pentium MMX processor • Black-Halevi-Krawczyk-Krovertz-Rogaway : UMAC (1999) • Further improved the performance on high end processors

Constructions • Bucket hashing • is an -AU introduced by Rogaway • Defining the Bucket Hash Family B • word size w(1), parameters n(1), N(3) • domain D={0,1}wn, range R={0,1}wN • Let h Band let X=X1Xnbe the string we want to hash, where each |Xi|=w. Then h(X) is defined by the following algorithm. First, for each j{1,,N}, initialize Yjto 0w. Then, for each i{1,, n} and k  hi, replace Ykby Yk  Xi. When done, set h(X) = Y1||Y2||||YN. • Pseudocode for j 1 to N do Yj 0w for i  1 to n do Yhi1 Yhi1 Xi Yhi2 Yhi2 Xi Yhi3 Yhi3 Xi return Y1||Y2||||YN

Constructions(cont.) • Bucket Hashing with Small Key Size • N=2s/L • Each hash function hB’[w,M,N] is specified by a list of length M • each entry contains L integers in the interval [0, N-1] • L arrays are introduced, each containing N buckets • Next, each array is compressed to s/L words, using a fixed primitive element GF(2s/L) • The hash result is equal to the concatenation of the L compressed arrays, each containing s/L words

Constructions(cont.) • Hash Family Based on Fast Polynomial Evaluation • is based on polynomial evaluation over a finite field • q = 2r, Q = 2m = 2r+s, n = 1+2s,  : a linear mapping from GF(Q) onto GF(q) • Q = q0m, q = q0r , q0 : a prime power • fa(x) =a0 + a1x +  + an-1xn-1 • x, y, a0, a1, , an-1  GF(Q), z  GF(q) • H = {hx,y,z : hx,y,z(a) = hx,y,z(a0, a1, , an-1) =  (y fa(x)) + z}

Constructions(cont.) • Hash Family Using Toeplitz Matrices • Toeplitz matrices are matrices with constant values on the left-to-right diagonals • A Toeplitz matrix of dimension n  m can be used to hash messages of length m to hash results of length n by vector-matrix multiplication • The Toeplitz construction uses matrices generated by sequences of length n + m - 1 drawn from -biased distributions • -biased distributions are a tool for replacing truly random sequences by more compact and easier to generate sequences • The lower , the more random the sequence is • Krawczyk proves that the family of hash functions associated with a family of Toeplitz-matrices corresponding to sequences selected from a -biased distribution is  -AXU with  = 2-n + 

Constructions(cont.) • Evaluation Hash Function • is one of the variants analyzed by Shoup • The input (of length  tn) : viewed as a polynomial M(x) of degree < t over GF(2n) • The key : a random element   GF(2n) • the hash result : equal to M() GF(2n) • This family of hash functions is -AXU with  = t/2n

Constructions(cont.) • Division Hash Function • represents the input as a polynomial M(x) of degree less than tn over GF(2) • The hash key : a random irreducible polynomial p(x) of degree n over GF(2) • The hash result : m(x) xn mod p(x) • This family of hash functions is -AXU with  = tn/2n • The total number of irreducible polynomials of degree n is roughly equal to 2n/n

Constructions(cont.) • MMH(Multilinear Modular Hashing) hashing • consists of a (modified) inner product between message and key modulo a prime p (close to 2w, with w the word length; below w = 32) • is an -AXU2, but with xor replaced by subtraction modulo p • The core hash function maps 32 32-bit message words and 32 32-bit key words to a 32-bit result • The key size is 1024 bits and  = 1.5/ 230 • For larger messages, a tree construction can be used • the value of  and the key length have to be multiplied by the height of the tree • This algorithm is very fast on the Pentium Pro, which has a multiply and accumulate instruction • On a 32-bit machine, MMH requires only 2 instructions per byte for a 32-bit result

Comparing the Hash Functions

Comparing the Hash Functions(cont.) • Scheme A • the input : divided into 32 blocks of 8 Kbyte • each block is hashed using the same bucket hash function with N = 160 • results in an intermediate string of 20480 bytes • Scheme B • the input : divided into 64 blocks of 4 Kbyte • each block is hashed using the same bucket hash function with short key(s=42, L=6, N=128) • results in an intermediate string of 10752 bytes • Scheme C • the input is divided into 64 blocks of 4 Kbyte • each block is hashed using a 331024 Toeplitz matrix, based on a -biased sequence of length 1056 generated using an 88-bit LFSR • The length of the intermediate string is 8448 bytes

Comparing the Hash Functions(cont.) • Scheme D • the input : hashed twice using the polynomial evaluation hash function with  = 2-15 resulting in a combined value of 2-30 • W = 5 • The performance is slightly key dependent. Therefore an average over a number of keys has been computed. • Scheme E • this is simply the evaluation hash function with t = 32768 • the resulting value of  is too small • However, choosing a smaller value of n that is not a multiple of 32 induces a performance penalty • Scheme F • the input : divided into 2048 blocks of 128 bytes • each block is hashed twice using MMH • the length of the intermediate string is 16384 bytes • It is not possible to obtain a value of closer to 2-32 in an efficient way

Universal Hashing MAC

Message authentication based on Universal hashing • Message authentication based on Universal hashing • Wegman-Carter approach • The parties share a secret key k=(h,P) • P : infinite random string • h : function drawn randomly from a strongly universal2 family of hash functions H • H is strongly universal2 if, for all xx’, the random variable h1(x)||h2(x), for h ∈ H, is uniformly distributed • To authenticate a message x, the sender transmits h(x) xored with the next piece of the pad P • Standard cryptographic technique • use of a pseudorandom function family, F • [Theorem] Assume H is -AXU2, and that F is replaced by the truly random function family R of functions. In this case, if an adversary makes q1 queries to the authentication algorithm S and q2 queries to the verification algorithm V, the probability of forging a MAC is at most q2

Universal hashing MAC • Why Universal hashing MAC? • The speed of a universal hashing MAC depends on the speed of the hashing step and encrypting step • The encryption does not take long • hash function compresses messages => the encrypting message is short • The combinatorial properties of the universal hash function family is mathematically proven • needs no “over-design” or “safe margin” the way a cryptographic primitive would • Universal hashing MAC makes for desirable security properties • can select a cryptographically conservative design for the encrypting step • can pay with only a minor impact on speed • the cryptographic primitive is applied only to the much shorter hashed image of the message • security and efficiency are not conflicting requirements

UMAC • The UMAC algorithm • species how the message, key, and nonce determine an authentication tag • The sender • will need to provide the receiver with the message, nonce, and tag • The receiver • can then compute what “should be” the tag for this particular message and nonce, and see if it matches the received tag • employs a subkey generation process in which the shared (convenient-length) key is mapped into UMAC's internal keys • subkey generation is done just once, at the beginning of a communication session during which the key does not change, and so subkey-generation is usually not performance-critical • UMAC depends on a few different parameters

UMAC(cont.) • An illustrative special case of UMAC • Subkey generation: • Using a PRG, map Key to K = K1K2  K1024 and to A • each Ki : a 32-bit word, |A| = 512 • Hashing the message Msg to HM = NHXKey(Msg): • Let Len be |Msg| mod 4096, encoded as a 2-byte string • Append to Msg the minimum number of 0 bits to make |Msg| divisible by 8 • Let Msg = Msg1 || Msg2 ||  || Msgt where each Msgi is 1024 words except for Msgt, which has between 2 and 1024 words • Let HM = NHK(Msg1) || NHK(Msg2) || || NHK(Msgt) || Len • Computing the authentication tag: • The tag is Tag = HMAC-SHA1A(HM || Nonce)

UMAC(cont.) • Definition of NH • blocksize n  2, wordsize w  1 • domain : A = {0, 1}2w {0, 1}2w  {0, 1}nw • range : B = {0, 1}2w • a random function in NH[n,w] is given by a random nw-bit string K • Uw : {0,  , 2w-1}, U2w : {0,  , 22w -1} • for integers x, ylet (x +wy) denote (x + y) mod 2w • M  A and M = M1  Ml, |M1| =  = |Ml|= w • K = {0, 1}nwandK = K1  Kn, |K1| =  = |Kn|= w

UMAC(cont.) • where miUw is the number that Mi represents (as an unsigned integer) • where kiUw is the number that Ki represents (as an unsigned integer) • the right-hand side of the above equation is understood to name the (unique) 2w-bit string which represents (as an unsigned integer) the U2w-valued integer result

Compression Function of MD4

Compression Function of MD5

Compression Function of RIPEMD-160

Compression Function of SHA-1

Conclusions

Universal Hash Families

Universal Hash Families

Presentation Transcript

Hash Tables

Hash table

Universal well-being assessment for families

Hash functions

Universal Hash Functions: Definition and Example

Hash Tables

Universal Hash Functions: Performance Guarantees (Chaining)

Hash Functions

Hash Table

Higher Order Universal One-Way Hash Functions

Hash Functions

Hash Tables

Hash Tables

Hash

Hashing - Hash Maps and Hash Functions

Hash Tables