
Huffman Coding

phillipse

Presentation Transcript


  1. Huffman Coding

  2. A simple example • Suppose we have a message built from 5 distinct symbols, e.g. [►♣♣♠☻►♣☼►☻] • How can we code this message using 0/1 so that the coded message has minimum length (for transmission or storage)? • 5 symbols → at least 3 bits per symbol • For a simple fixed-length encoding, the length of the coded message is 10*3 = 30 bits

  3. A simple example – cont. • Intuition: symbols that are more frequent should get shorter codes; since the code lengths then differ, there must be a way of telling where each code ends • For the Huffman code, the length of the encoded message ►♣♣♠☻►♣☼►☻ is 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits
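The two bit counts for the symbol message can be checked with a short Python sketch. The per-symbol code lengths are an assumption read off the terms of the sum on the slide (2 bits each for ►, ♣ and ☻, 3 bits each for ♠ and ☼); the variable names are illustrative only.

```python
from collections import Counter
from math import ceil, log2

message = "► ♣ ♣ ♠ ☻ ► ♣ ☼ ► ☻".split()
counts = Counter(message)  # how often each distinct symbol occurs

# Fixed-length code: every symbol needs ceil(log2(number of distinct symbols)) bits.
bits_per_symbol = ceil(log2(len(counts)))
fixed_bits = bits_per_symbol * len(message)

# Variable-length code: lengths assumed from the terms of the slide's sum.
code_length = {"►": 2, "♣": 2, "☻": 2, "♠": 3, "☼": 3}
huffman_bits = sum(counts[s] * code_length[s] for s in counts)

print(fixed_bits, huffman_bits)  # 30 22
```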

  4. Another Example • A = 0, B = 100, C = 1010, D = 1011, R = 11 • ABRACADABRA = 01001101010010110100110 • This is eleven letters in 23 bits • A fixed-width encoding would require 3 bits per letter for five different letters, or 33 bits for 11 letters • Notice that the encoded bit string can be decoded!
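A minimal Python sketch of this example, using the five codewords from the slide. The helper names `encode` and `decode` are illustrative, and the decoder relies on the fact that no codeword here is a prefix of another.

```python
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text, code):
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Read bits left to right, emitting a letter whenever a codeword completes."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # safe only because the code is prefix-free
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

encoded = encode("ABRACADABRA", CODE)
print(encoded, len(encoded))  # 01001101010010110100110 23
print(decode(encoded, CODE))  # ABRACADABRA
```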

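  5. Huffman codes • Binary character code: each character is represented by a unique binary string. • A data file of 100 characters (a:45, b:13, c:12, d:16, e:9, f:5) can be coded in two ways: The first way (a fixed-length code, 3 bits per character) needs 100×3 = 300 bits. The second way (a variable-length code) needs 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.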

  6. Variable-length code • A variable-length code must be read with care. • 001011101 (codewords: a=0, b=00, c=01, d=11.) • Where to cut? 00 can be read as either aa or b. • Prefixes of 0011: 0, 00, 001, and 0011. • Prefix codes: no codeword is a prefix of any other codeword (prefix-free). • Prefix codes are simple to encode and decode.
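The ambiguity can be made concrete with a small Python sketch. The functions `is_prefix_free` and `parses` are hypothetical helpers written for this illustration; `parses` simply enumerates every way of cutting a bit string into codewords.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def parses(bits, code):
    """Return every way of cutting `bits` into codewords of `code`."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):
            results += [ch + rest for rest in parses(bits[len(word):], code)]
    return results

ambiguous = {"a": "0", "b": "00", "c": "01", "d": "11"}
prefix_code = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

print(is_prefix_free(ambiguous.values()))    # False: 0 is a prefix of 00 and 01
print(is_prefix_free(prefix_code.values()))  # True
print(parses("00", ambiguous))               # ['aa', 'b']
print(parses("0011", ambiguous))             # ['aad', 'bd']
```

With a prefix-free code, `parses` never returns more than one result, which is why such codes can be decoded greedily from left to right.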

  7. Using the codewords in the table to encode and decode • Encode: abc = 0.101.100 = 0101100 • (just concatenate the codewords.) • Decode: 001011101 = 0.0.101.1101 = aabe

  8. [Figure: two binary trees over the leaves a:45, b:13, c:12, d:16, e:9, f:5, each with root weight 100 and edges labeled 0 and 1. Left: tree for the fixed-length codewords. Right: tree for the variable-length codewords.] • Encode: abc = 0.101.100 = 0101100 • (just concatenate the codewords.) • Decode: 001011101 = 0.0.101.1101 = aabe • (use the right-hand binary tree, the one for the variable-length codewords)
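Decoding by walking the code tree can be sketched as follows. The codewords for a, b, c and e are the ones used in the encode/decode lines above; the codewords for d and f are an assumption, chosen to be prefix-free and to match the code lengths 3 and 4 that the bit count for this file implies. The nested-dict tree representation is also just one possible choice.

```python
# a, b, c, e follow the slides; d and f are assumed (lengths 3 and 4).
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def build_tree(code):
    """Nested dicts: internal nodes map '0'/'1' to children, leaves are characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root

def decode(bits, root):
    """Walk down from the root bit by bit; on reaching a leaf, emit it and restart."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)

tree = build_tree(CODE)
encoded = "".join(CODE[ch] for ch in "abc")
print(encoded)                    # 0101100
print(decode("001011101", tree))  # aabe
```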

  9. Binary tree • Every nonleaf node has two children. • Why? • The fixed-length code in our example is not optimal. • The total number of bits required to encode a file is $B(T) = \sum_{c \in C} f(c) \, d_T(c)$ • f(c): the frequency (number of occurrences) of c in the file • d_T(c): the depth of c's leaf in the tree T
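As a quick check of this cost, the sketch below sums frequency times leaf depth for the six-character file. The depth of a leaf equals the length of its codeword, so the fixed-length tree has depth 3 everywhere and the variable-length tree has depths 1, 3, 3, 3, 4, 4. The function name `cost` is illustrative.

```python
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
fixed_depth = {c: 3 for c in freq}
variable_depth = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}

def cost(freq, depth):
    """Total bits: sum over all characters of frequency times leaf depth."""
    return sum(freq[c] * depth[c] for c in freq)

print(cost(freq, fixed_depth))     # 300
print(cost(freq, variable_depth))  # 224
```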

  10. Constructing an optimal coding scheme • Formal definition of the problem: • Input: a set of characters C = {c1, c2, …, cn}; each c ∈ C has frequency f[c]. • Output: a binary tree representing codewords so that the total number of bits required for the file is minimized. • Huffman proposed a greedy algorithm to solve the problem.

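  11. [Figure: (a) the initial queue, ordered by frequency: f:5, e:9, c:12, b:13, d:16, a:45. (b) f and e are merged into a node of weight 14, giving c:12, b:13, 14, d:16, a:45.]

  12. [Figure: (c) c and b are merged into a node of weight 25, giving 14, d:16, 25, a:45. (d) The node of weight 14 and d are merged into a node of weight 30, giving 25, 30, a:45.]

  13. [Figure: (e) the nodes of weight 25 and 30 are merged into a node of weight 55, giving a:45, 55. (f) a and the node of weight 55 are merged into the root of weight 100, which is the final tree.]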

  14. HUFFMAN(C)
      1 n := |C|
      2 Q := C
      3 for i := 1 to n-1 do
      4     z := ALLOCATE_NODE()
      5     x := left[z] := EXTRACT_MIN(Q)
      6     y := right[z] := EXTRACT_MIN(Q)
      7     f[z] := f[x] + f[y]
      8     INSERT(Q, z)
      9 return EXTRACT_MIN(Q)

  15. The Huffman Algorithm • This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. • C is a set of n characters, and each character c in C has a defined frequency f[c]. • Q is a priority queue, keyed on f, used to identify the two least-frequent objects to merge together. • The result of the merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.

  16. Time complexity • Lines 4-8 are executed n-1 times. • Each heap operation in Lines 4-8 takes O(lg n) time. • Total time required is O(n lg n). Note: The details of the heap operations will not be tested, but the O(n lg n) time complexity should be remembered.
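The HUFFMAN(C) pseudocode can be sketched in Python with the standard `heapq` module as the priority queue. This is one possible rendering, not the slides' own code: the tuple-based tree representation, the `tiebreak` counter and the function name `huffman` are assumptions made for the example. The comments map each step to the pseudocode line numbers.

```python
import heapq
from itertools import count

def huffman(freq):
    """Return {character: codeword} for a dict of character frequencies."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Heap entries are (frequency, tiebreak, tree); a tree is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                              # line 2: Q := C
    for _ in range(len(freq) - 1):                   # line 3: n-1 merges
        fx, _tx, x = heapq.heappop(heap)             # line 5: EXTRACT_MIN
        fy, _ty, y = heapq.heappop(heap)             # line 6: EXTRACT_MIN
        merged = (fx + fy, next(tiebreak), (x, y))   # lines 4 and 7
        heapq.heappush(heap, merged)                 # line 8: INSERT
    root = heap[0][2]                                # line 9: the last node left

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):                  # internal node
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                        # leaf
            codes[tree] = prefix or "0"              # one-character alphabet
    walk(root, "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman(freq)
total_bits = sum(freq[c] * len(codes[c]) for c in freq)
print(codes["a"], total_bits)  # 0 224
```

Each `heappop` and `heappush` costs O(lg n) and the loop runs n-1 times, which gives the O(n lg n) bound. When several frequencies tie, the exact codewords depend on the tie-breaking rule, but the total encoded length does not.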

  17. A Complete Example • Scan the original text: Eerie eyes seen near lake. • What characters are present? E e r i space y s n a l k .

  18. Building a Tree • Scan the original text: Eerie eyes seen near lake. • What is the frequency of each character in the text?

      Char   Freq.    Char   Freq.    Char   Freq.
      E      1        y      1        k      1
      e      8        s      2        .      1
      r      2        n      2
      i      1        a      2
      space  4        l      1
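The frequency scan can be reproduced with `collections.Counter` from the Python standard library; a minimal sketch:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)  # maps each character to its number of occurrences

# List the characters from least to most frequent.
for ch, n in sorted(freq.items(), key=lambda item: (item[1], item[0])):
    print(repr(ch), n)

print(len(freq), sum(freq.values()))  # 12 26
```

Upper-case E and lower-case e are counted as different characters, as in the table.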

  19-20. Building a Tree • The array after inserting all nodes, ordered by frequency: E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8

  21-22. Building a Tree • [Figure: E 1 and i 1 are removed from the queue and merged into a new node of weight 2, which is put back into the queue.]

  23-24. Building a Tree • [Figure: y 1 and l 1 are merged into a new node of weight 2, which is put back into the queue.]

  25-26. Building a Tree • [Figure: k 1 and . 1 are merged into a new node of weight 2, which is put back into the queue.]

  27-28. Building a Tree • [Figure: r 2 and s 2 are merged into a new node of weight 4, which is put back into the queue.]

  29-30. Building a Tree • [Figure: n 2 and a 2 are merged into a new node of weight 4, which is put back into the queue.]

  31-32. Building a Tree • [Figure: the weight-2 nodes over (E, i) and (y, l) are merged into a new node of weight 4, which is put back into the queue.]

  33-34. Building a Tree • [Figure: the weight-2 node over (k, .) and sp 4 are merged into a new node of weight 6, which is put back into the queue.] • What is happening to the characters with a low number of occurrences?

  35-36. Building a Tree • [Figure: the weight-4 nodes over (n, a) and (r, s) are merged into a new node of weight 8, which is put back into the queue.]

  37-38. Building a Tree • [Figure: the weight-4 node over (E, i, l, y) and the weight-6 node over (sp, k, .) are merged into a new node of weight 10, which is put back into the queue.]

  39-40. Building a Tree • [Figure: e 8 and the weight-8 node over (n, a, r, s) are merged into a new node of weight 16, which is put back into the queue.]

  41-42. Building a Tree • [Figure: the nodes of weight 10 and 16 are merged into the root of weight 26.] • After enqueueing this node there is only one node left in the priority queue.
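The sequence of node weights created while building this tree, and the resulting message length, can be checked without storing the tree at all. The sketch below keeps only the weights in a heap; it relies on the observation that every merge pushes all characters beneath it one level deeper, so the sum of the internal-node weights equals the total number of encoded bits. The comparison with a fixed-length code is an added illustration, not taken from the slides.

```python
import heapq
from collections import Counter
from math import ceil, log2

text = "Eerie eyes seen near lake."
weights = list(Counter(text).values())
heapq.heapify(weights)

merged = []  # weights of the internal nodes, in creation order
while len(weights) > 1:
    w = heapq.heappop(weights) + heapq.heappop(weights)
    merged.append(w)
    heapq.heappush(weights, w)

huffman_bits = sum(merged)  # each merge adds one bit per character below it
fixed_bits = len(text) * ceil(log2(len(set(text))))

print(merged)                    # [2, 2, 2, 4, 4, 4, 6, 8, 10, 16, 26]
print(huffman_bits, fixed_bits)  # 84 104
```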

  43-48. Using heap: [Figure sequence: the six characters a:45, b:13, c:12, d:16, e:9, f:5 stored as heap nodes, each with fields labeled P, L and R (presumably parent, left-child and right-child links), shown over successive rearrangement steps.] CS3335 Design and Analysis of Algorithms/WANG Lusheng

  49-50. Using heap: [Figure sequence: a new node g of weight 14 (the merge of f:5 and e:9) appears among the heap nodes alongside a:45, b:13, c:12, d:16.]
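The first merge on a heap can be sketched with Python's `heapq` module. Note the assumption: `heapq` stores the heap in a plain list, so the parent and child links are implicit in the indices (the parent of index i is at (i - 1) // 2) rather than stored as explicit fields as on the slides. The name g for the merged node follows the slides.

```python
import heapq

heap = [(45, "a"), (13, "b"), (12, "c"), (16, "d"), (9, "e"), (5, "f")]
heapq.heapify(heap)  # builds a min-heap in place in O(n)

x = heapq.heappop(heap)  # smallest frequency
y = heapq.heappop(heap)  # second smallest frequency
heapq.heappush(heap, (x[0] + y[0], "g"))  # new internal node g = f + e

print(x, y)     # (5, 'f') (9, 'e')
print(heap[0])  # (12, 'c') is now the minimum

# Heap property: every entry is at least as large as its parent.
print(all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap))))  # True
```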
