
Greedy Algorithms (Huffman Coding)


Presentation Transcript


  1. Greedy Algorithms (Huffman Coding)

  2. Huffman Coding • A technique to compress data effectively • Usually achieves 20%-90% compression • Lossless compression • No information is lost • When you decompress, you get the original file back (Diagram: original file ↔ compressed file)

  3. Huffman Coding: Applications • Saving space • Store compressed files instead of original files • Transmitting files or data • Send compressed data to save transmission time and power • Encryption and decryption • Cannot read the compressed file without knowing the “key”

  4. Main Idea: Frequency-Based Encoding • Assume only 6 characters appear in this file • E, A, C, T, K, N • Their frequencies are given in a table (14,700 characters in total) • Option 1 (No Compression) • Each character = 1 Byte (8 bits) • Total file size = 14,700 * 8 = 117,600 bits • Option 2 (Fixed-size compression) • We have 6 characters, so we need 3 bits to encode them • Total file size = 14,700 * 3 = 44,100 bits

  5. Main Idea: Frequency-Based Encoding (Cont’d) • Assume only 6 characters appear in this file • E, A, C, T, K, N • Their frequencies are as before • Option 3 (Huffman compression) • Variable-length compression • Assign shorter codes to more frequent characters and longer codes to less frequent characters • Total file size: (10,000 x 1) + (4,000 x 2) + (300 x 3) + (200 x 4) + (100 x 5) + (100 x 5) = 20,700 bits
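
The three totals above are easy to verify. A minimal Python sketch, assuming the slide's table assigns the frequencies to E, A, C, T, K, N in that order (E being the most frequent):

```python
# Assumed frequency table (E, A, C, T, K, N in decreasing order of frequency)
freq = {'E': 10_000, 'A': 4_000, 'C': 300, 'T': 200, 'K': 100, 'N': 100}
total_chars = sum(freq.values())                    # 14,700 characters

option1 = total_chars * 8                           # 1 byte per character
option2 = total_chars * 3                           # 3-bit fixed-length code for 6 symbols

# Code lengths used by the variable-length (Huffman) option on the slide
code_len = {'E': 1, 'A': 2, 'C': 3, 'T': 4, 'K': 5, 'N': 5}
option3 = sum(freq[c] * code_len[c] for c in freq)

print(option1, option2, option3)                    # 117600 44100 20700
```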

  6. Huffman Coding • A variable-length coding for characters • More frequent characters → shorter codes • Less frequent characters → longer codes • It is not like ASCII coding, where all characters have the same code length (8 bits) • Two main questions • How to assign codes (the encoding process)? • How to decode, i.e., generate the original file from the compressed file (the decoding process)?

  7. Decoding fixed-length codes is much easier • Encoded bits: 010001100110111000 • Divide into groups of 3: 010 001 100 110 111 000 • Decode each group: C A T K N E
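
Fixed-length decoding is just chunking and table lookup. A small sketch using the 3-bit table implied by the example above (E=000, A=001, C=010, T=100, K=110, N=111):

```python
# 3-bit fixed-length table read off the slide's example
table = {'000': 'E', '001': 'A', '010': 'C', '100': 'T', '110': 'K', '111': 'N'}

bits = "010001100110111000"
# Split into groups of 3 bits and look each group up
decoded = ''.join(table[bits[i:i + 3]] for i in range(0, len(bits), 3))
print(decoded)  # CATKNE
```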

  8. Decoding variable-length codes is not that easy… • What does 000001 mean? It could be AEC, EAC, or EEEC • Huffman encoding guarantees to avoid this ambiguity: there is always a single decoding
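
To see why a non-prefix-free variable-length code is ambiguous, here is a small sketch; the assignment E=0, A=00, C=001 is hypothetical, chosen only because it reproduces the three readings on the slide:

```python
# Hypothetical non-prefix-free code that reproduces the slide's ambiguity
codes = {'E': '0', 'A': '00', 'C': '001'}

def decodings(bits, prefix=""):
    """Return every possible way to split `bits` into codewords."""
    if not bits:
        return [prefix]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            results += decodings(bits[len(code):], prefix + ch)
    return results

print(decodings("000001"))  # ['EEEC', 'EAC', 'AEC'] -- three different readings
```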

  9. Huffman Algorithm • Step 1: Get Frequencies • Scan the file to be compressed and count the occurrence of each character • Sort the characters based on their frequency • Step 2: Build Tree & Assign Codes • Build a Huffman-code tree (binary tree) • Traverse the tree to assign codes • Step 3: Encode (Compress) • Scan the file again and replace each character by its code • Step 4: Decode (Decompress) • Huffman tree is the key to decompress the file

  10. Step 1: Get Frequencies Input File: Eerie eyes seen near lake.
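
The frequency table itself appears as an image on the slide; counting the characters of the input string reproduces it. A minimal sketch:

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)

# Print characters sorted by frequency, least frequent first
for ch, count in sorted(freq.items(), key=lambda kv: kv[1]):
    print(repr(ch), count)

# 'e' occurs 8 times, the space 4 times; 26 characters in total
print("total characters:", sum(freq.values()))  # 26
```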

  11. Step 2: Build Huffman Tree & Assign Codes • It is a binary tree in which each character is a leaf node • Initially, each node is a separate root • At each step • Select the two roots with the smallest frequencies and connect them to a new parent (break ties arbitrarily) [the greedy choice] • The parent gets the sum of the frequencies of its two child nodes • Repeat until only one root remains (a code sketch follows below)
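
A sketch of this greedy loop using a min-heap, under an assumed node layout (leaves are (char, None, None), internal nodes are (None, left, right)); ties are broken by insertion order, which is one of the arbitrary choices the slide allows:

```python
import heapq
from collections import Counter

def build_huffman_tree(text):
    """Repeatedly merge the two lowest-frequency roots until one root remains."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, node)
    heap = [(f, i, (ch, None, None)) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # smallest frequency
        f2, _, right = heapq.heappop(heap)    # second smallest
        # The new parent's frequency is the sum of its two children
        heapq.heappush(heap, (f1 + f2, tie, (None, left, right)))
        tie += 1                              # keeps heap tuples comparable
    return heap[0][2]                         # the single remaining root

tree = build_huffman_tree("Eerie eyes seen near lake.")
```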

  12. Example Each char. has a leaf node with its frequency

  13. Find the smallest two frequencies… Replace them with their parent

  14. Find the smallest two frequencies… Replace them with their parent

  15. Find the smallest two frequencies… Replace them with their parent

  16. Find the smallest two frequencies… Replace them with their parent

  17. Find the smallest two frequencies… Replace them with their parent

  18. Find the smallest two frequencies… Replace them with their parent

  19. Find the smallest two frequencies… Replace them with their parent

  20. Find the smallest two frequencies… Replace them with their parent

  21. Find the smallest two frequencies… Replace them with their parent

  22. Find the smallest two frequencies… Replace them with their parent

  23. Find the smallest two frequencies… Replace them with their parent

  24. Now we have a single root… This is the Huffman Tree

  25. Let's Analyze the Huffman Tree • All characters are at the leaf nodes • The number at the root = the number of characters in the file (26 in this example) • High-frequency characters (e.g., “e”) are near the root • Low-frequency characters are far from the root

  26. Let's Assign Codes • Traverse the tree • Any left edge → add label 0 • Any right edge → add label 1 • The code for each character is its root-to-leaf label sequence

  27. Let's Assign Codes • Traverse the tree • Any left edge → add label 0 • Any right edge → add label 1 • The code for each character is its root-to-leaf label sequence • (Tree diagram with every edge labeled 0 or 1)

  28. Let's Assign Codes: Coding Table • Traverse the tree • Any left edge → add label 0 • Any right edge → add label 1 • The code for each character is its root-to-leaf label sequence
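
Continuing the hypothetical sketch from Step 2 above, the coding table falls out of a depth-first walk that labels left edges 0 and right edges 1:

```python
def assign_codes(node, prefix="", table=None):
    """A leaf's code is the sequence of 0/1 edge labels on its root-to-leaf path."""
    if table is None:
        table = {}
    ch, left, right = node
    if ch is not None:                  # leaf: record its code
        table[ch] = prefix or "0"       # degenerate single-symbol case
        return table
    assign_codes(left, prefix + "0", table)   # left edge -> 0
    assign_codes(right, prefix + "1", table)  # right edge -> 1
    return table

codes = assign_codes(tree)
# Frequent characters such as 'e' get short codes, rare ones get long codes;
# the exact bit patterns depend on how ties were broken while building the tree.
```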

  29. Huffman Algorithm • Step 1: Get Frequencies • Scan the file to be compressed and count the occurrence of each character • Sort the characters based on their frequency • Step 2: Build Tree & Assign Codes • Build a Huffman-code tree (binary tree) • Traverse the tree to assign codes • Step 3: Encode (Compress) • Scan the file again and replace each character by its code • Step 4: Decode (Decompress) • The Huffman tree is the key to decompress the file

  30. Step 3: Encode (Compress) the File • Input file: Eerie eyes seen near lake. + coding table → generate the encoded file: 0000 10 1100 0001 10 …. • Notice that no code is a prefix of any other code → this ensures the decoding will be unique (unlike Slide 8)
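
Step 3 is then a single pass that replaces each character with its code. A sketch continuing from the table above; the resulting bit pattern may differ from the slide's “0000 10 1100 …” because tie-breaking during tree construction is arbitrary, but the total number of bits comes out the same:

```python
def encode(text, codes):
    """Concatenate the code of every character in the file."""
    return "".join(codes[ch] for ch in text)

encoded = encode("Eerie eyes seen near lake.", codes)
print(len(encoded))   # far fewer bits than 26 * 8 = 208 with one byte per character
```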

  31. Step 4: Decode (Decompress) • Must have the encoded file + the coding tree • Scan the encoded file • For each 0 → move left in the tree • For each 1 → move right • When a leaf node is reached → emit that character and go back to the root • Example: 0000 10 1100 0001 10 …. + tree → generate the original file: Eerie …
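
And Step 4 walks the tree bit by bit, again using the node layout assumed in the earlier sketches:

```python
def decode(bits, root):
    """0 -> go left, 1 -> go right; emit a character at every leaf and restart."""
    out, node = [], root
    for bit in bits:
        _, left, right = node
        node = left if bit == "0" else right
        ch = node[0]
        if ch is not None:              # reached a leaf
            out.append(ch)
            node = root                 # back to the root for the next character
    return "".join(out)

assert decode(encoded, tree) == "Eerie eyes seen near lake."
```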

  32. Huffman Algorithm • Step 1: Get Frequencies • Scan the file to be compressed and count the occurrence of each character • Sort the characters based on their frequency • Step 2: Build Tree & Assign Codes • Build a Huffman-code tree (binary tree) • Traverse the tree to assign codes • Step 3: Encode (Compress) • Scan the file again and replace each character by its code • Step 4: Decode (Decompress) • The Huffman tree is the key to decompress the file
