
Huffman Compression Algorithm: Efficient Data Compression Technique

Huffman compression is a lossless encoding technique that reduces the number of bits needed to represent data, saving valuable resources such as bandwidth and disk space. The algorithm assigns shorter codes to frequently occurring characters, optimizing the compression ratio.

Presentation Transcript


  1. A Data Compression Algorithm: Huffman Compression (Gordon College)

  2. Compression • Definition: the process of encoding data using fewer bits • Reason: to save valuable resources such as communication bandwidth or hard disk space (Figure: many repeated copies of the same text, with Compress reducing them to a compact form and Uncompress restoring the original)

  3. Compression Types • Lossy • Loses some information during compression, which means the exact original cannot be recovered (e.g., JPEG) • Normally provides better compression • Used when loss is acceptable - image, sound, and video files

  4. Compression Types • Lossless • The exact original can be recovered • Usually exploits statistical redundancy • Used when loss is not acceptable - data files Basic term: Compression Ratio - the ratio of the number of bits in the original data to the number of bits in the compressed data For example, 3:1 means the original file was 3000 bytes and the compressed file is now only 1000 bytes.

  5. Variable-Length Codes • Recall that ASCII, EBCDIC, and Unicode use the same-size representation for every character • Contrast Morse code • It uses variable-length sequences • Huffman compression is a variable-length encoding scheme

  6. Variable-Length Codes • Each character in such a code • Has a weight (probability) and a length • The expected length is the sum of the products of the weights and lengths over all characters, e.g. 0.2 x 2 + 0.1 x 4 + 0.1 x 4 + 0.15 x 3 + 0.45 x 1 = 2.1 • Goal: minimize the expected length
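
  A minimal sketch of this calculation, using the weights and code lengths from the slide's example (the variable names are only illustrative):

    # Expected length = sum of (weight x code length) over all characters.
    weights_and_lengths = [(0.2, 2), (0.1, 4), (0.1, 4), (0.15, 3), (0.45, 1)]
    expected_length = sum(w * l for w, l in weights_and_lengths)
    print(round(expected_length, 2))  # 2.1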

  7. Huffman Compression • Uses prefix codes (no codeword is a prefix of another) • Uses a greedy algorithm - at each step it makes the locally best choice based only on the data at hand.

  8. Huffman Compression Basic algorithm • Generates a table that contains the frequency of each character in a text. • Using the frequency table - assign each character a “bit code” (a sequence of bits to represent the character) • Write the bit code to the file instead of the character.
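
  A small sketch of the first step, building the frequency table; the sample text and names below are illustrative, not part of the slides:

    from collections import Counter

    # Count how often each character occurs in the text.
    text = "this is an example of huffman compression"
    frequencies = Counter(text)

    for character, count in frequencies.most_common():
        print(repr(character), count)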

  9. Immediate Decodability • Definition: no bit sequence that represents a character is a prefix of a longer sequence for another character • Purpose: each character can be decoded as soon as its bits arrive, without waiting for the remaining bits • For example, the scheme {0, 01, 11} is not immediately decodable, but {0, 10, 11} is
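
  A sketch of the prefix check behind immediate decodability; the two sample code sets mirror the contrast above and are illustrative only:

    # A code is immediately decodable when no codeword is a prefix of another.
    def is_immediately_decodable(codes):
        for a in codes:
            for b in codes:
                if a != b and b.startswith(a):
                    return False
        return True

    print(is_immediately_decodable(["0", "10", "11"]))  # True
    print(is_immediately_decodable(["0", "01", "11"]))  # False: "0" is a prefix of "01"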

  10. Huffman Compression • Huffman (1951) • Uses the frequencies of symbols in a string to build a variable-rate prefix code • Each symbol is mapped to a binary string • More frequent symbols have shorter codes • No code is a prefix of another (Figure: examples of codes that are not Huffman codes)

  11. Huffman Codes • We seek codes that are • Immediately decodable • Of minimal expected code length • For a set of n characters { C1 .. Cn } with weights { w1 .. wn } • We need an algorithm that generates the n bit strings representing the codes

  12. Cost of a Huffman Tree • Let p1, p2, ... , pm be the probabilities for the symbols a1, a2, ... , am, respectively. • Define the cost of the Huffman tree T to be HC(T) = p1 r1 + p2 r2 + ... + pm rm, where ri is the length of the path from the root to ai. • HC(T) is the expected length of the code of a symbol coded by the tree T. HC(T) is the bit rate of the code.

  13. Example of Cost • Example: a 1/2, b 1/8, c 1/8, d 1/4 • HC(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75 (Figure: the corresponding tree with leaves a, b, c, d)
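
  The same cost computed directly from the definition HC(T) = p1 r1 + ... + pm rm, as a sketch using the probabilities and root-to-leaf path lengths of this example:

    # Probabilities and root-to-leaf path lengths from the example above.
    probabilities = {"a": 1/2, "b": 1/8, "c": 1/8, "d": 1/4}
    depths        = {"a": 1,   "b": 3,   "c": 3,   "d": 2}

    cost = sum(probabilities[s] * depths[s] for s in probabilities)
    print(cost)  # 1.75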

  14. Huffman Tree • Input: Probabilities p1, p2, ... , pm for symbols a1, a2, ... , am, respectively. • Output: A tree that minimizes the average number of bits (bit rate) needed to code a symbol. That is, it minimizes HC(T) = p1 r1 + p2 r2 + ... + pm rm, where ri is the length of the path from the root to ai. This is a Huffman tree, or Huffman code.

  15. Recursive Algorithm - Huffman Codes • Initialize a list of one-node binary trees, each containing the weight of one character • Repeat the following n – 1 times: a. Find the two trees T' and T" in the list with minimal weights w' and w" b. Replace these two trees with a binary tree whose root has weight w' + w" and whose subtrees are T' and T", and label the edges to these subtrees 0 and 1
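
  A literal sketch of these two steps; the tuple representation of trees is an assumption made only for illustration:

    # Trees are (weight, symbol_or_None, left, right) tuples.
    def build_huffman_tree(weights):
        trees = [(w, symbol, None, None) for symbol, w in weights.items()]
        while len(trees) > 1:                       # repeat n - 1 times
            trees.sort(key=lambda t: t[0])          # step a: two minimal weights
            t1, t2 = trees.pop(0), trees.pop(0)
            # step b: new root w' + w"; the edge to t1 is labeled 0, to t2 is 1
            trees.append((t1[0] + t2[0], None, t1, t2))
        return trees[0]

    root = build_huffman_tree({"a": 0.45, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.1})
    print(root[0])   # total weight at the root (approximately 1.0)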

  16. Huffman's Algorithm • The code for character Ci is the bit string labeling the path in the final binary tree from the root to Ci (Figure: a set of characters with their weights, and the resulting codes read off the finished tree)
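
  A sketch of reading the codes off a finished tree; the small hand-built tree below is only an illustration, where internal nodes are (left, right) pairs and leaves are characters:

    # Record the 0/1 path from the root to every leaf.
    def assign_codes(node, prefix="", table=None):
        if table is None:
            table = {}
        if isinstance(node, str):                  # leaf: the path so far is its code
            table[node] = prefix
        else:
            left, right = node
            assign_codes(left, prefix + "0", table)
            assign_codes(right, prefix + "1", table)
        return table

    tree = ("a", (("b", "c"), "d"))
    print(assign_codes(tree))   # {'a': '0', 'b': '100', 'c': '101', 'd': '11'}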

  17. Huffman Decoding Algorithm • Initialize pointer p to the root of the Huffman tree • While the end of the message string has not been reached, repeat the following: a. Let x be the next bit in the string b. If x = 0, set p to the left child pointer; else set p to the right child pointer c. If p points to a leaf i. Output the character at that leaf ii. Reset p to the root of the Huffman tree
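
  A sketch of this decoder over the same illustrative tree representation used above (internal nodes are (left, right) pairs, leaves are characters):

    # Walk the tree bit by bit; emit a character at each leaf, then restart at the root.
    def huffman_decode(bits, tree):
        decoded = []
        node = tree
        for bit in bits:
            node = node[0] if bit == "0" else node[1]
            if isinstance(node, str):              # reached a leaf
                decoded.append(node)
                node = tree                        # reset to the root
        return "".join(decoded)

    tree = ("a", (("b", "c"), "d"))                # a=0, b=100, c=101, d=11
    print(huffman_decode("0100101110", tree))      # "abcda"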

  18. Huffman Decoding Algorithm • For the message string 0101011010 • Decode it using the Huffman tree and the decoding algorithm

  19. Iterative Huffman Tree Algorithm • Form a node for each symbol ai with weight pi; • Insert the nodes in a min priority queue ordered by probability; • While the priority queue has more than one element do • min1 := delete-min; • min2 := delete-min; • create a new node n; • n.weight := min1.weight + min2.weight; • n.left := min1; also associate this link with bit 0 • n.right := min2; also associate this link with bit 1 • insert(n) • Return the last node in the priority queue.
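
  A sketch of this iterative version using Python's heapq module as the min priority queue; the tie-breaking counter is an implementation detail added so that nodes with equal weights are never compared directly:

    import heapq
    import itertools

    def build_huffman_tree(probabilities):
        counter = itertools.count()                # tie breaker for equal weights
        heap = [(p, next(counter), symbol) for symbol, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w1, _, min1 = heapq.heappop(heap)      # min1 := delete-min
            w2, _, min2 = heapq.heappop(heap)      # min2 := delete-min
            # new node n; the left child carries bit 0, the right child bit 1
            heapq.heappush(heap, (w1 + w2, next(counter), (min1, min2)))
        return heap[0][2]                          # the last node in the queue

    # Probabilities from the example on the next slide.
    print(build_huffman_tree({"a": .4, "b": .1, "c": .3, "d": .1, "e": .1}))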

  20. Example of Huffman Tree Algorithm (1) • P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1

  21. Example of Huffman Tree Algorithm (2)

  22. Example of Huffman Tree Algorithm (3)

  23. Example of Huffman Tree Algorithm (4)

  24. Huffman Code

  25. In class example • I will praise you and I will love you Lord

    Index  Sym    Freq
    0      space  9
    1      I      2
    2      L      1
    3      a      2
    4      d      2
    5      e      2
    6      i      3
    7      l      5
    8      n      1
    9      o      4
    10     p      1
    11     r      2
    12     s      1
    13     u      2
    14     v      1
    15     w      2
    16     y      2

  26. In class example • I will praise you and I will love you Lord

    Index  Sym    Freq  Parent  Left  Right  Nbits  Bits
    0      space  9     30      -1    -1     2      01
    1      I      2     23      -1    -1     5      11010
    2      L      1     17      -1    -1     5      00010
    3      a      2     20      -1    -1     5      11110
    4      d      2     22      -1    -1     5      11101
    5      e      2     21      -1    -1     4      0000
    6      i      3     25      -1    -1     4      1100
    7      l      5     28      -1    -1     3      101
    8      n      1     17      -1    -1     5      00011
    9      o      4     26      -1    -1     3      001
    10     p      1     18      -1    -1     6      100110
    11     r      2     23      -1    -1     5      11011
    12     s      1     18      -1    -1     6      100111
    13     u      2     24      -1    -1     4      1000
    14     v      1     19      -1    -1     5      10010
    15     w      2     20      -1    -1     5      11111
    16     y      2     22      -1    -1     5      11100
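
  A small sketch that checks the table's arithmetic: under the Nbits column above, the 42-character message needs 160 bits rather than 336 bits at 8 bits per character. The dictionary below simply restates the Freq and Nbits columns:

    # (frequency, code length) pairs copied from the table above.
    freq_and_nbits = {
        " ": (9, 2), "I": (2, 5), "L": (1, 5), "a": (2, 5), "d": (2, 5),
        "e": (2, 4), "i": (3, 4), "l": (5, 3), "n": (1, 5), "o": (4, 3),
        "p": (1, 6), "r": (2, 5), "s": (1, 6), "u": (2, 4), "v": (1, 5),
        "w": (2, 5), "y": (2, 5),
    }

    total_chars = sum(f for f, _ in freq_and_nbits.values())
    total_bits  = sum(f * n for f, n in freq_and_nbits.values())
    print(total_chars, total_bits, total_chars * 8)   # 42 160 336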
