1 / 28

# Universal Hash Families - PowerPoint PPT Presentation

Universal Hash Families. Universal hash families. Family of hash functions Finite multiset H of string-valued functions, each h ∈ H having the same nonempty domain A ⊆{0,1} * and range B ⊆{0,1} * , for some constant b

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Universal Hash Families' - armina

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Universal Hash Families

• Family of hash functions

• Finite multiset H of string-valued functions, each h ∈ H having the same nonempty domain A ⊆{0,1}* and range B ⊆{0,1}*, for some constant b

• [Definition] -almost universal2 (-AU2) & -almost XOR universal2 (-AXU2)

• A family of hash functions H ={h: A{0,1}b} is -almost universal2 , written -AU2, if, for all distinct x, x’∈A, Prh∈ H [h(x)=h(x’)] .

• The family of hash functions H is -almost XOR universal2 , written -AXU2, if, for all distinct x, x’∈A, and for all c∈{0,1}b, Prh∈ H [h(x)h(x’)=c] .

•  =maxxx’{Prh [h(x)=h(x’)]} : collision probability

• Principle measure of an AU2 : How small its collision probability is and how fast one can compute its functions

• Make the domain of a hash family bigger

• H ={h: {0,1}a{0,1}b}

• Hm ={h: {0,1}am{0,1}bm}

• its elements are the same as in H where h(x1 x2… xm), for | xi |=a, is defined by h(x1)|| h(x2)||…|| h(xm)

• [Proposition] If H is -AU2, then Hm is -AU2

• Make the collision probability smaller

• H1 ={h: A{0,1}b1}, H2 ={h: A{0,1}b2}

• H1 & H2 ={h: A{0,1}b1+ b2}

• its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is defined as h1(x)||h2(x)

• [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H1 & H2 is 12-AU2

• Make the image of a hash function shorter

• H1 ={h: {0,1}a{0,1}b}, H2 ={h: {0,1}b{0,1}c}

• H2○H1 ={h: {0,1}a{0,1}c}

• its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is defined as h2(h1(x))

• [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H2○H1 is (1+2)-AU2

• Turn an AU2 family H1 and AXU2 family H2 into an AXU2 family H2○H1

• [Proposition] Suppose H1 ={h: AB} is 1-AU2 and H2 ={h: BC} is 2-AXU2. Then H2○H1 ={h: AC} is (1+2)-AXU2

• Carter-Wegman(1979, 1981)

• Efficient authentication code under strongly universal hash functions

• Key observations

• Long messages can be authenticated efficiently using short keys if the number of bits in the authentication tag is increased slightly compared to ‘perfect’ schemes

• If a message is hashed to a short authentication tag, weaker properties are sufficient for the first stage of the compression

• Under certain conditions, the hash function can remain the same for many plaintexts, provided that hash results is encrypted using a one-time pad

• Stinson(1994)

• Improves the works by Wegman-Carter and establishes an explicit link between authentication codes and strongly universal hash functions

• Johansson-Kabatianskii-Smeets(1996)

• Establish a relation between authentication codes and codes correcting independent errors

• Krawczyk(1994, 1995)

• Propose universal hash functions that are linear with respect to bitwise XOR

• Makes it easier to reuse the authentication code

• Encrypt the m-bit hash result for each new message using a one-time pad

• Simple and efficient constructions based on polynomials and LFSR

• Shoup(1996)

• Propose and analyze the constructions based on polynomials over finite fields

• Rogaway : Bucket hashing (1995)

• Halevi-Krawczyk : MMH (1997)

• Make optimal used of the multiply and accumulate instruction of the Pentium MMX processor

• Black-Halevi-Krawczyk-Krovertz-Rogaway : UMAC (1999)

• Further improved the performance on high end processors

• Bucket hashing

• is an -AU introduced by Rogaway

• Defining the Bucket Hash Family B

• word size w(1), parameters n(1), N(3)

• domain D={0,1}wn, range R={0,1}wN

• Let h Band let X=X1Xnbe the string we want to hash, where each |Xi|=w. Then h(X) is defined by the following algorithm. First, for each j{1,,N}, initialize Yjto 0w. Then, for each i{1,, n} and k  hi, replace Ykby Yk  Xi. When done, set h(X) = Y1||Y2||||YN.

• Pseudocode

for j 1 to N do Yj 0w

for i  1 to n do

Yhi1 Yhi1 Xi

Yhi2 Yhi2 Xi

Yhi3 Yhi3 Xi

return Y1||Y2||||YN

• Bucket Hashing with Small Key Size

• N=2s/L

• Each hash function hB’[w,M,N] is specified by a list of length M

• each entry contains L integers in the interval [0, N-1]

• L arrays are introduced, each containing N buckets

• Next, each array is compressed to s/L words, using a fixed primitive element GF(2s/L)

• The hash result is equal to the concatenation of the L compressed arrays, each containing s/L words

• Hash Family Based on Fast Polynomial Evaluation

• is based on polynomial evaluation over a finite field

• q = 2r, Q = 2m = 2r+s, n = 1+2s,  : a linear mapping from GF(Q) onto GF(q)

• Q = q0m, q = q0r , q0 : a prime power

• fa(x) =a0 + a1x +  + an-1xn-1

• x, y, a0, a1, , an-1  GF(Q), z  GF(q)

• H = {hx,y,z : hx,y,z(a) = hx,y,z(a0, a1, , an-1) =  (y fa(x)) + z}

• Hash Family Using Toeplitz Matrices

• Toeplitz matrices are matrices with constant values on the left-to-right diagonals

• A Toeplitz matrix of dimension n  m can be used to hash messages of length m to hash results of length n by vector-matrix multiplication

• The Toeplitz construction uses matrices generated by sequences of length n + m - 1 drawn from -biased distributions

• -biased distributions are a tool for replacing truly random sequences by more compact and easier to generate sequences

• The lower , the more random the sequence is

• Krawczyk proves that the family of hash functions associated with a family of Toeplitz-matrices corresponding to sequences selected from a -biased distribution is  -AXU with  = 2-n + 

• Evaluation Hash Function

• is one of the variants analyzed by Shoup

• The input (of length  tn) : viewed as a polynomial M(x) of degree < t over GF(2n)

• The key : a random element   GF(2n)

• the hash result : equal to M() GF(2n)

• This family of hash functions is -AXU with  = t/2n

• Division Hash Function

• represents the input as a polynomial M(x) of degree less than tn over GF(2)

• The hash key : a random irreducible polynomial p(x) of degree n over GF(2)

• The hash result : m(x) xn mod p(x)

• This family of hash functions is -AXU with  = tn/2n

• The total number of irreducible polynomials of degree n is roughly equal to 2n/n

• MMH(Multilinear Modular Hashing) hashing

• consists of a (modified) inner product between message and key modulo a prime p (close to 2w, with w the word length; below w = 32)

• is an -AXU2, but with xor replaced by subtraction modulo p

• The core hash function maps 32 32-bit message words and 32 32-bit key words to a 32-bit result

• The key size is 1024 bits and  = 1.5/ 230

• For larger messages, a tree construction can be used

• the value of  and the key length have to be multiplied by the height of the tree

• This algorithm is very fast on the Pentium Pro, which has a multiply and accumulate instruction

• On a 32-bit machine, MMH requires only 2 instructions per byte for a 32-bit result

• Scheme A

• the input : divided into 32 blocks of 8 Kbyte

• each block is hashed using the same bucket hash function with N = 160

• results in an intermediate string of 20480 bytes

• Scheme B

• the input : divided into 64 blocks of 4 Kbyte

• each block is hashed using the same bucket hash function with short key(s=42, L=6, N=128)

• results in an intermediate string of 10752 bytes

• Scheme C

• the input is divided into 64 blocks of 4 Kbyte

• each block is hashed using a 331024 Toeplitz matrix, based on a -biased sequence of length 1056 generated using an 88-bit LFSR

• The length of the intermediate string is 8448 bytes

• Scheme D

• the input : hashed twice using the polynomial evaluation hash function with  = 2-15 resulting in a combined value of 2-30

• W = 5

• The performance is slightly key dependent. Therefore an average over a number of keys has been computed.

• Scheme E

• this is simply the evaluation hash function with t = 32768

• the resulting value of  is too small

• However, choosing a smaller value of n that is not a multiple of 32 induces a performance penalty

• Scheme F

• the input : divided into 2048 blocks of 128 bytes

• each block is hashed twice using MMH

• the length of the intermediate string is 16384 bytes

• It is not possible to obtain a value of closer to 2-32 in an efficient way

### Universal Hashing MAC

• Message authentication based on Universal hashing

• Wegman-Carter approach

• The parties share a secret key k=(h,P)

• P : infinite random string

• h : function drawn randomly from a strongly universal2 family of hash functions H

• H is strongly universal2 if, for all xx’, the random variable h1(x)||h2(x), for h ∈ H, is uniformly distributed

• To authenticate a message x, the sender transmits h(x) xored with the next piece of the pad P

• Standard cryptographic technique

• use of a pseudorandom function family, F

• [Theorem] Assume H is -AXU2, and that F is replaced by the truly random function family R of functions. In this case, if an adversary makes q1 queries to the authentication algorithm S and q2 queries to the verification algorithm V, the probability of forging a MAC is at most q2

• Why Universal hashing MAC?

• The speed of a universal hashing MAC depends on the speed of the hashing step and encrypting step

• The encryption does not take long

• hash function compresses messages => the encrypting message is short

• The combinatorial properties of the universal hash function family is mathematically proven

• needs no “over-design” or “safe margin” the way a cryptographic primitive would

• Universal hashing MAC makes for desirable security properties

• can select a cryptographically conservative design for the encrypting step

• can pay with only a minor impact on speed

• the cryptographic primitive is applied only to the much shorter hashed image of the message

• security and efficiency are not conflicting requirements

• The UMAC algorithm

• species how the message, key, and nonce determine an authentication tag

• The sender

• will need to provide the receiver with the message, nonce, and tag

• can then compute what “should be” the tag for this particular message and nonce, and see if it matches the received tag

• employs a subkey generation process in which the shared (convenient-length) key is mapped into UMAC's internal keys

• subkey generation is done just once, at the beginning of a communication session during which the key does not change, and so subkey-generation is usually not performance-critical

• UMAC depends on a few different parameters

• An illustrative special case of UMAC

• Subkey generation:

• Using a PRG, map Key to K = K1K2  K1024 and to A

• each Ki : a 32-bit word, |A| = 512

• Hashing the message Msg to HM = NHXKey(Msg):

• Let Len be |Msg| mod 4096, encoded as a 2-byte string

• Append to Msg the minimum number of 0 bits to make |Msg| divisible by 8

• Let Msg = Msg1 || Msg2 ||  || Msgt where each Msgi is 1024 words except for Msgt, which has between 2 and 1024 words

• Let HM = NHK(Msg1) || NHK(Msg2) || || NHK(Msgt) || Len

• Computing the authentication tag:

• The tag is Tag = HMAC-SHA1A(HM || Nonce)

• Definition of NH

• blocksize n  2, wordsize w  1

• domain : A = {0, 1}2w {0, 1}2w  {0, 1}nw

• range : B = {0, 1}2w

• a random function in NH[n,w] is given by a random nw-bit string K

• Uw : {0,  , 2w-1}, U2w : {0,  , 22w -1}

• for integers x, ylet (x +wy) denote (x + y) mod 2w

• M  A and M = M1  Ml, |M1| =  = |Ml|= w

• K = {0, 1}nwandK = K1  Kn, |K1| =  = |Kn|= w

• where miUw is the number that Mi represents (as an unsigned integer)

• where kiUw is the number that Ki represents (as an unsigned integer)

• the right-hand side of the above equation is understood to name the (unique) 2w-bit string which represents (as an unsigned integer) the U2w-valued integer result