CSC 2300 Data Structures & Algorithms

Presentation Transcript


  1. CSC 2300 Data Structures & Algorithms • April 27, 2007 • Chap. 10. Algorithm Design Techniques

  2. Today • File Compression • Huffman Code

  3. ASCII • What does ASCII stand for? American Standard Code for Information Interchange. • The ASCII character set consists of about 100 “printable” characters. • How many bits are needed to represent these characters? Seven, since 2^7 = 128 ≥ 100. • The set also includes some “nonprintable” characters. • An 8th bit is added as a parity bit.
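The bit count asked about here is the smallest b with 2^b ≥ n, where n is the number of symbols. Below is a minimal Python sketch. The helper name `bits_needed` is hypothetical.

```python
def bits_needed(n: int) -> int:
    """Smallest b such that 2**b >= n: bits per symbol for a fixed-length code over n symbols."""
    if n < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() is the number of bits needed to write the largest index, n - 1
    return (n - 1).bit_length()

print(bits_needed(100))  # 7: about 100 printable characters
print(bits_needed(128))  # 7: the full 7-bit ASCII range
print(bits_needed(7))    # 3: a seven-character alphabet
```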

  4. Example • A file contains only the characters a, e, i, s, t, blank (space), and newline. • There are seven characters, so three bits per character are sufficient. • i see a seat (followed by a newline: 13 characters × 3 bits) • 010101011001001101000101011001000100110 (39 bits) • How can we do better?
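A fixed 3-bit code reproduces the 39-bit string. The code table itself is not part of this text. This sketch assumes the assignment a=000, e=001, i=010, s=011, t=100, blank=101, newline=110, which follows the order in which the characters are listed. That assignment was inferred by matching the bit string. The sketch also assumes the message ends with a newline.

```python
# Fixed-length code: each of the seven characters gets a distinct 3-bit pattern.
# Assumed assignment, inferred from the 39-bit string (a=000, ..., newline=110).
SYMBOLS = ["a", "e", "i", "s", "t", " ", "\n"]
FIXED = {ch: format(i, "03b") for i, ch in enumerate(SYMBOLS)}

def encode(message: str, code: dict) -> str:
    """Concatenate the codeword of every character in the message."""
    return "".join(code[ch] for ch in message)

bits = encode("i see a seat\n", FIXED)
print(bits)       # 010101011001001101000101011001000100110
print(len(bits))  # 39
```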

  5. Binary Tree • Binary tree: • The data reside only at the leaves. • Can you improve this representation?

  6. Example • In the improved tree, newline becomes 11 • i see a seat (plus newline) • 01010101100100110100010101100100010011 (38 bits) • A reduction of 1 bit. • We want a more significant improvement. • How?
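Newline occurs exactly once in the message, so shortening its codeword by one bit saves exactly one bit. Below is a minimal Python sketch of this accounting. It reuses the 3-bit assignment inferred from the bit strings, which is an assumption because the code table figure is not reproduced here.

```python
from collections import Counter

SYMBOLS = ["a", "e", "i", "s", "t", " ", "\n"]
FIXED = {ch: format(i, "03b") for i, ch in enumerate(SYMBOLS)}  # assumed 3-bit code
BETTER = {**FIXED, "\n": "11"}  # same code, but newline shortened to 11

def cost(message: str, code: dict) -> int:
    """Total bits: each character's frequency times its codeword length."""
    freq = Counter(message)
    return sum(f * len(code[ch]) for ch, f in freq.items())

msg = "i see a seat\n"
print(cost(msg, FIXED))   # 39
print(cost(msg, BETTER))  # 38: newline occurs once, so one bit is saved
```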

  7. The Two Trees • What can you say about the structure of the better tree? • It is a full tree: • every node is either a leaf or has two children. • An optimal code will always have this property. • Why? • A node with only one child can be replaced by that child, which moves its subtree up one level and shortens every code in that subtree by one bit.
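The "move up one level" argument can be made concrete. Below is a minimal Python sketch with a hypothetical tree representation. A leaf is a character, and an internal node is a (left, right) pair that uses None for a missing child. Edge 0 goes left and edge 1 goes right. The example tree is the 3-bit tree under the assignment assumed above.

```python
# A code tree: a leaf is a character (str); an internal node is a (left, right)
# pair, where a missing child is None. Edge 0 goes left, edge 1 goes right.

def collapse(node):
    """Replace every node that has exactly one child by that child."""
    if node is None or isinstance(node, str):
        return node
    left, right = collapse(node[0]), collapse(node[1])
    if left is None:
        return right   # only a right child: move it up one level
    if right is None:
        return left    # only a left child: move it up one level
    return (left, right)

def codes(node, prefix=""):
    """Map each leaf character to its 0/1 path from the root."""
    if isinstance(node, str):
        return {node: prefix}
    result = {}
    for bit, child in zip("01", node):
        if child is not None:
            result.update(codes(child, prefix + bit))
    return result

# 3-bit tree: newline is the only child of its parent.
tree = ((("a", "e"), ("i", "s")), (("t", " "), ("\n", None)))
print(codes(tree)["\n"])            # 110
print(codes(collapse(tree))["\n"])  # 11
```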

  8. Prefix Code • If the characters are placed only at the leaves, any sequence of bits can be decoded unambiguously. • Prefix code: no character code is a prefix of another character code. • Example (using the better tree, where newline is 11): 0100111100010110001000111 • What is it? • is a tie (i, s, newline, a, blank, t, i, e, newline)
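A prefix code needs no separators between codewords. The decoder reads bits until the buffer matches a codeword, emits that character, and starts over. Below is a minimal Python sketch. The codeword assignment is the 3-bit code with newline shortened to 11, inferred from the earlier slides. The 25-bit test string decodes to "is a tie" with a newline after "is" and after "tie".

```python
def decode(bits: str, code: dict) -> str:
    """Decode a bit string under a prefix code by growing a buffer until it is a codeword."""
    lookup = {word: ch for ch, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:  # prefix property: the first match is the only possible match
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits {buf!r} are not a complete codeword")
    return "".join(out)

# Assumed code: 3-bit assignment from the earlier slides, with newline shortened to 11.
CODE = {"a": "000", "e": "001", "i": "010", "s": "011",
        "t": "100", " ": "101", "\n": "11"}
print(repr(decode("0100111100010110001000111", CODE)))  # 'is\na tie\n'
```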

  9. Optimal Prefix Code • Binary tree: • How to find optimal code?
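Before looking for an optimal code, it helps to state the quantity being minimized. In the following notation (introduced here, not on the slide), character $c_i$ occurs $f_i$ times and its leaf sits at depth $d_i$. The encoded length is the sum below. The second line works it out for "i see a seat" plus newline. The 3-bit tree has every $d_i = 3$ over 13 characters. Moving newline ($f = 1$) from depth 3 to depth 2 saves one bit.

```latex
\text{cost} = \sum_{i=1}^{C} d_i \, f_i
\qquad
3 \cdot 13 = 39,
\qquad
39 - 1 \cdot (3 - 2) = 38
```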

  10. Our Example • i see a seat • 1011000000101110011100000010010001 (34 bits) • The code in the table is not optimal for our example. • Why not? • Exercise: find the optimal code for our example.

  11. Huffman’s Algorithm • Assume that there are C characters. • Maintain a forest of trees. • The weight of a tree is the sum of the frequencies of its leaves. • Repeat C − 1 times: select the two trees T1 and T2 of smallest weight, breaking ties arbitrarily, and form a new tree with subtrees T1 and T2. • At the beginning, there are C single-node trees; at the end, there is a single tree, which is an optimal Huffman coding tree.
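The steps above map directly onto a binary heap. Below is a minimal Python sketch that uses the standard `heapq` module. It makes two assumptions of its own. First, each heap entry carries an extra counter so that ties never force Python to compare two trees. Second, leaves are characters and internal nodes are (left, right) pairs.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(freq: dict) -> dict:
    """Build a Huffman tree for a non-empty {symbol: frequency} and return {symbol: bitstring}."""
    tiebreak = count()
    # Forest of C single-node trees, ordered by weight
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    # C - 1 merges: remove the two lightest trees, insert their combination
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the path as its codeword
            codes[node] = prefix or "0"  # a one-symbol alphabet still needs one bit
    walk(root, "")
    return codes

message = "i see a seat\n"
codes = huffman_codes(Counter(message))
encoded = "".join(codes[ch] for ch in message)
print(len(encoded))  # 35
```

Ties may be broken differently and yield different codewords. Every Huffman tree for these frequencies still gives the same 35-bit total, compared with 39 and 38 bits for the earlier trees. Without the trailing newline, the same sketch encodes "i see a seat" in 30 bits.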

  12. Example • Initial stage: • After first merge:

  13. Example • After first merge: • After second merge: • After third merge:

  14. Example • After third merge: • After fourth merge:

  15. Example • After fourth merge: • After fifth merge:

  16. Example • After fifth merge: • After final merge:
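The merge figures are not reproduced here. As a stand-in, this minimal Python sketch records the weight of each new tree. It uses the frequencies of "i see a seat" plus newline, not the frequencies in the slide figures. A leaf at depth d lies inside d of the merged trees, so the merged weights add up to the total encoded length.

```python
import heapq

def merge_weights(weights):
    """Run Huffman's merges on bare weights and return the weight of each new tree, in order."""
    heap = list(weights)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # combine the two smallest weights
        merged.append(w)
        heapq.heappush(heap, w)
    return merged

# Frequencies of i, blank, s, e, a, t, newline in "i see a seat" plus newline
steps = merge_weights([1, 3, 2, 3, 2, 1, 1])
print(steps)       # [2, 3, 4, 6, 7, 13]
print(sum(steps))  # 35
```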

  17. Implementation • If we maintain the trees in a priority queue, ordered by weight, what is the running time? • O(C log C). • Huffman’s method is a two-pass algorithm. What are the two passes? • The first pass collects the frequency data and the second pass performs the encoding.
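The O(C log C) bound comes from counting heap operations. The heap holds at most C trees. Each of the C − 1 merges performs two deleteMin operations and one insert.

```latex
T(C) = \underbrace{O(C)}_{\text{build heap}}
     + (C-1)\Bigl(\underbrace{2\,O(\log C)}_{\text{two deleteMins}}
     + \underbrace{O(\log C)}_{\text{one insert}}\Bigr)
     = O(C \log C)
```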
