
Topic 20: Huffman Coding



Presentation Transcript


  1. Topic 20: Huffman Coding
"The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass." (Sydney Smith, Edinburgh Review)

  2. Compression
• Storage of digital information is measured in bits and bytes.
• Storing the same information with less memory = compression
• lossless and lossy compression
• Is it really necessary? Huffman Coding

  3. Compression - Why Bother?
• Apostolos "Toli" Lerios, Facebook engineer, heads the image storage group
• JPEG images are already compressed; his group looks for ways to compress them even more
• 1% less space = millions of dollars in savings

  4. Little Pipes and Big Pumps
CPU capability:
• $1,500 for a good computer with an Intel i7 processor; assume it lasts 3 years
• Memory bandwidth: 25.6 GB/sec = 2.6 × 10^10 bytes/sec
• 7.6 × 10^10 instructions/second
Internet access cost:
• 40 Mbps, roughly $40 per month
• 12 months × 3 years × $40 = $1,440
• 10,000,000 bits/second = 1.25 × 10^6 bytes/sec

  5. Little Pipes and Big Pumps
[diagram: data flowing in to the CPU]

  6. Encoding
• UTCS → ASCII values 85 84 67 83
• In binary: 0101 0101  0101 0100  0100 0011  0101 0011
• As voltages on a wire, each bit is a low or high level: 0101 0101 → 0V 5V 0V 5V 0V 5V 0V 5V, and so on for the remaining bytes
• What is a file? What is a file extension?
• Open a bitmap in a text editor; open a PDF in Word
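The byte values above can be checked with a short snippet (Python is used here for illustration; it is not part of the original slides):

```python
# Each character of "UTCS" maps to an 8-bit ASCII value.
text = "UTCS"
print([ord(c) for c in text])                  # [85, 84, 67, 83]
print([format(ord(c), "08b") for c in text])   # the bit patterns shown above
```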

  7. Purpose of Huffman Coding
• Proposed by Dr. David A. Huffman in "A Method for the Construction of Minimum-Redundancy Codes"
• Written in 1952
• Applicable to many forms of data transmission
• Our example: text files
• Still used in fax machines, MP3 encoding, and others

  8. The Basic Algorithm
• Huffman coding is a form of statistical coding
• Not all characters occur with the same frequency!
• Yet in ASCII all characters are allocated the same amount of space
• 1 char = 1 byte, be it e or x

  9. The Basic Algorithm
• Any savings in tailoring codes to the frequency of characters?
• Code word lengths are no longer fixed, as in ASCII or Unicode
• Code word lengths vary and will be shorter for the more frequently used characters

  10. The Basic Algorithm
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize the characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create the new file using the Huffman codes.
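Steps 1 and 2 can be sketched in a few lines of Python (an illustration only; the course assignment uses Java):

```python
from collections import Counter
from heapq import heapify

text = "Eerie eyes seen near lake."

# Step 1: tally occurrences of every character.
freq = Counter(text)
print(freq["e"], freq[" "])   # 8 4

# Step 2: prioritize by count. A min-heap serves as the priority queue,
# so the lowest-count (highest-priority) characters come off first.
heap = [(count, ch) for ch, count in freq.items()]
heapify(heap)
print(heap[0][0])             # 1 -- a character that occurs once is at the front
```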

  11. Building a Tree: Scan the Original Text
• Consider the following short text:
Eerie eyes seen near lake.
• Count up the occurrences of all characters in the text

  12. Building a Tree: Scan the Original Text
Eerie eyes seen near lake.
• What characters are present?
E e r i space y s n a l k .

  13. Building a Tree: Scan the Original Text
Eerie eyes seen near lake.
• What is the frequency of each character in the text?
Char  Freq.    Char   Freq.    Char  Freq.
E     1        y      1        k     1
e     8        s      2        .     1
r     2        n      2        i     1
a     2        space  4        l     1

  14. Building a Tree: Prioritize Characters
• Create a binary tree node with the character and frequency of each character
• Place the nodes in a priority queue
• The lower the occurrence, the higher the priority in the queue

  15. Building a Tree
• The queue after inserting all nodes
• Null pointers are not shown
[diagram: priority queue of leaf nodes: E 1, i 1, l 1, . 1, k 1, y 1, a 2, n 2, r 2, s 2, sp 4, e 8]

  16. Building a Tree
• While the priority queue contains two or more nodes:
  • Create a new node
  • Dequeue a node and make it the left subtree
  • Dequeue the next node and make it the right subtree
  • The frequency of the new node equals the sum of the frequencies of the left and right children
  • Enqueue the new node back into the queue
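The loop above can be sketched with Python's heapq as a stand-in for the priority queue (an illustration; tie-breaking here is arbitrary, so the exact tree shape may differ from the one in the slides):

```python
from collections import Counter
import heapq
import itertools

text = "Eerie eyes seen near lake."
tiebreak = itertools.count()  # keeps tuples comparable when frequencies tie

# Leaf nodes: (frequency, tie_breaker, node)
heap = [(f, next(tiebreak), {"ch": ch}) for ch, f in Counter(text).items()]
heapq.heapify(heap)

# While two or more nodes remain, merge the two lowest-frequency nodes
# under a new parent whose frequency is the sum of the two.
while len(heap) >= 2:
    f1, _, left = heapq.heappop(heap)
    f2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (f1 + f2, next(tiebreak), {"left": left, "right": right}))

root_freq, _, root = heap[0]
print(root_freq)   # 26 -- equals the number of characters in the text
```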

  17.-38. Building a Tree
[Slides 17-38 each show one step of the tree under construction. At every step the two lowest-frequency nodes are dequeued, joined under a new parent whose frequency is the sum of the two, and the parent is enqueued:
E(1) + i(1) → 2
k(1) + l(1) → 2
y(1) + .(1) → 2
a(2) + n(2) → 4
r(2) + s(2) → 4
(E i)(2) + (k l)(2) → 4
(y .)(2) + sp(4) → 6
(a n)(4) + (r s)(4) → 8
(E i k l)(4) + (y . sp)(6) → 10
e(8) + (a n r s)(8) → 16
(E i k l y . sp)(10) + (e a n r s)(16) → 26]
Slide 31 asks: What is happening to the characters with a low number of occurrences?

  39. Building a Tree
• After enqueueing this node there is only one node left in the priority queue.
[diagram: the completed tree, root frequency 26]

  40. Building a Tree
• Dequeue the single node left in the queue.
• This tree contains the new code words for each character.
• The frequency of the root node should equal the number of characters in the text: Eerie eyes seen near lake. has 4 spaces, 26 characters total.
[diagram: final Huffman tree with root frequency 26]

  41. Encoding the File: Traverse Tree for Codes
• Perform a traversal of the tree to obtain the new code words
• Going left, append a 0 to the code word; going right, append a 1
• A code word is completed only when a leaf node is reached
[diagram: the final Huffman tree]
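A traversal like this can be sketched as a small recursive function. The three-character tree below is a made-up miniature, not the slides' full tree:

```python
# Internal nodes are (left, right) pairs; leaves are single characters.
# Tiny illustrative tree: e on the left, (a, n) on the right.
tree = ("e", ("a", "n"))

def codes(node, prefix=""):
    if isinstance(node, str):               # leaf: code word is complete
        return {node: prefix}
    left, right = node
    out = codes(left, prefix + "0")         # left edge appends a 0
    out.update(codes(right, prefix + "1"))  # right edge appends a 1
    return out

print(codes(tree))   # {'e': '0', 'a': '10', 'n': '11'}
```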

  42. Encoding the File: Traverse Tree for Codes
Char   Code
E      0000
i      0001
k      0010
l      0011
y      0100
.      0101
space  011
e      10
a      1100
n      1101
r      1110
s      1111

  43. Encoding the File
• Rescan the text and encode the file using the new code words (the table from slide 42)
Eerie eyes seen near lake. becomes:
000010111000011001110010010111101111111010110101111011011001110011001111000010100101

  44. Encoding the File: Results
• Have we made things any better?
• 84 bits to encode the text; ASCII would take 8 × 26 = 208 bits
000010111000011001110010010111101111111010110101111011011001110011001111000010100101
• If a modified code with 4 bits per character were used instead, the total would be 4 × 26 = 104 bits. Savings, but not as great.
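The counts can be recomputed from the slide-42 code table (note the encoded bit string shown above is 84 bits long):

```python
# Code table transcribed from slide 42.
codes = {"E": "0000", "i": "0001", "k": "0010", "l": "0011",
         "y": "0100", ".": "0101", " ": "011",  "e": "10",
         "a": "1100", "n": "1101", "r": "1110", "s": "1111"}

text = "Eerie eyes seen near lake."
encoded = "".join(codes[ch] for ch in text)

print(len(encoded))    # 84 bits with the Huffman codes
print(8 * len(text))   # 208 bits in 8-bit ASCII
print(4 * len(text))   # 104 bits with a fixed 4-bit code
```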

  45. Decoding the File
• How does the receiver know what the codes are?
• One option: a tree is constructed for each text file
  • considers the frequencies for that particular file
  • big hit on compression, especially for smaller files
• Another option: the tree is predetermined
  • based on statistical analysis of text files or file types

  46. Decoding the File
• Once the receiver has the tree, it scans the incoming bit stream
• 0 → go left; 1 → go right
• Decode this bit stream:
1010001001111000111111 11011100001010
• Is it: elk nay sir / eek a snake / eek kin sly / eek snarl nil / eel a snarl?
[diagram: the final Huffman tree]
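The bit-by-bit walk can be sketched without an explicit tree by matching prefixes against the code table (codes transcribed from slide 42; this is an illustration, not the assignment's required approach):

```python
codes = {"E": "0000", "i": "0001", "k": "0010", "l": "0011",
         "y": "0100", ".": "0101", " ": "011",  "e": "10",
         "a": "1100", "n": "1101", "r": "1110", "s": "1111"}
decode_table = {bits: ch for ch, bits in codes.items()}

def decode(bitstream):
    out, word = [], ""
    for bit in bitstream:
        word += bit                  # follow one edge per bit
        if word in decode_table:     # reached a leaf: emit the character
            out.append(decode_table[word])
            word = ""
    # Huffman codes are prefix-free, so this greedy match is unambiguous.
    return "".join(out)

# The quiz bit stream from this slide (shown wrapped across two groups):
print(decode("1010001001111000111111" + "11011100001010"))   # eek a snake
```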

  47. Assignment Hints
• reading chunks, not chars
• header format
• the pseudo-eof character
• the GUI

  48. Assignment Example
• "Eerie eyes seen near lake." will result in different codes than those shown in the slides due to:
  • adding elements in order to the PriorityQueue
  • the required pseudo-eof character (PEOF)

  49. Assignment Example
Char  Freq.    Char   Freq.    Char  Freq.
E     1        y      1        k     1
e     8        s      2        .     1
r     2        n      2        PEOF  1
i     1        a      2        space 4
l     1

  50. Assignment Example
[diagram: priority queue after inserting all nodes: e 8, . 1, E 1, i 1, k 1, l 1, y 1, PEOF 1, a 2, n 2, r 2, s 2, SP 4]
