
Bits and Huffman Encoding


dorie

Presentation Transcript


  1. Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn it in after class without losing your notes.

  2. How is data stored in a computer? • Several different ways are possible • In an ASCII-encoded text file (henceforth known as a “text file”), each character is stored as one 8-bit chunk • What’s a bit?

  3. Bits Mean What You Want Them To Mean 01001101 01001001 01001011 01000101 M I K E But why 8-bit chunks? Couldn’t we interpret the same string as 4-bit chunks? 0100 1101 0100 1001 0100 1011 0100 0101 4 13 4 9 4 11 4 5 With 8-bit chunks there are 2^8 (256) possible characters. With 4-bit chunks there are 2^4 (16) possible characters.
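The same reinterpretation can be tried in a few lines of Python. This is a minimal sketch: the helper name `chunks` is made up for illustration, and the 8-bit values are read as ASCII codes.

```python
# The 32 bits from the slide, written without spaces.
bits = "01001101010010010100101101000101"

def chunks(bitstring, width):
    """Split a bit string into fixed-width pieces and read each as an unsigned integer."""
    return [int(bitstring[i:i + width], 2) for i in range(0, len(bitstring), width)]

as_bytes = chunks(bits, 8)                   # four 8-bit values
as_text = "".join(chr(v) for v in as_bytes)  # interpret each value as an ASCII code
as_nibbles = chunks(bits, 4)                 # eight 4-bit values

print(as_bytes)    # [77, 73, 75, 69]
print(as_text)     # MIKE
print(as_nibbles)  # [4, 13, 4, 9, 4, 11, 4, 5]
print(2 ** 8, 2 ** 4)  # 256 16
```

Nothing in the bits themselves says which reading is correct; the chunk width is a convention that the writer and the reader have to share.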

  4. What if we want to save memory? • Could just use fewer bits per chunk, but that limits the number of characters we can represent • But my document contains a lot more ‘a’s than ‘%’s. What if the code for ‘a’ was “01” and the code for ‘%’ was “10000101”?

  5. A Problem Mike decides he wants to compress his vast collection of Harry Potter fanfic (all stored as text files). He notices that the character ‘H’ occurs much more frequently in the text files than does the character ‘%’, but they both use 8 bits. So he changes the encoding so that ‘H’ is represented by ‘01’ (2 bits) and ‘%’ is represented by ‘1010101010101010’ (16 bits). Assuming all other characters stay at 8 bits (a bit unlikely…but just pretend)…how much more frequent do ‘H’s need to be than ‘%’s for there to be a space savings? • There must be more than 3 Hs for every % • There must be more than 3 Hs for every 2 %s • There must be more than 4 Hs for every 3 %s • As long as there are more Hs than %s you get a savings
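One way to check an answer to this question is to count bits directly. The sketch below uses only the numbers given in the problem (8 bits originally, 2 bits for ‘H’, 16 bits for ‘%’); the function and constant names are made up for illustration.

```python
from fractions import Fraction

ORIGINAL_BITS = 8   # every character before the change
H_BITS = 2          # new code length for 'H'
PERCENT_BITS = 16   # new code length for '%'

def bits_saved(h_count, percent_count):
    """Net bits saved by the new encoding (zero or negative means no savings)."""
    gained = h_count * (ORIGINAL_BITS - H_BITS)            # each 'H' saves 6 bits
    lost = percent_count * (PERCENT_BITS - ORIGINAL_BITS)  # each '%' costs 8 extra bits
    return gained - lost

# Break-even ratio of Hs to %s: savings start only above this value.
break_even = Fraction(PERCENT_BITS - ORIGINAL_BITS, ORIGINAL_BITS - H_BITS)

print(break_even)        # 4/3
print(bits_saved(4, 3))  # 0  (exactly break-even)
print(bits_saved(5, 3))  # 6
print(bits_saved(1, 1))  # -2
```

Each ‘H’ saves 6 bits and each ‘%’ costs 8, so the file shrinks only when 6 × (number of Hs) is greater than 8 × (number of %s).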

  6. We can get savings if we know what characters occur frequently • So imagine you look at some set of data with character frequencies. How much can you save?

  7. Consider the string: • AACCCAAABAADAE What’s the best encoding?

  8. Why will this encoding not work? What is the encoding for AAE? What is the encoding for ADC?

  9. Consider the string: • AACCCAAABAADAE The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01 Work with your row to find the most efficient possible encoding for each character. I was able to encode the entire string in 25 bits. See if you can do it that efficiently – but be careful with prefixes. Write your encoding, plus your computation of its total length, on your handin sheet.

  10. My Encoding • AACCCAAABAADAE The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01 There are several possible encodings that can give you 25 bits. I’m pretty sure there’s no more efficient encoding, at least as far as compressing individual characters.
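The encoding table from the slide is not reproduced in this transcript, so the sketch below uses an assumed table (A=0, C=10, E=110, B=1110, D=1111) that happens to reach 25 bits. It checks the prefix rule, totals the encoded length, and confirms that the bit string decodes back unambiguously. All function names are made up for illustration.

```python
from collections import Counter

text = "AACCCAAABAADAE"

# Assumed encoding for illustration; not necessarily the one shown in class.
codes = {"A": "0", "C": "10", "E": "110", "B": "1110", "D": "1111"}

def is_prefix_free(codes):
    """True if no code word is a prefix of a different symbol's code word."""
    words = list(codes.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

def encoded_length(text, codes):
    """Total number of bits needed to encode text with the given table."""
    counts = Counter(text)
    return sum(counts[ch] * len(codes[ch]) for ch in counts)

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol as soon as a code word matches."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

print(is_prefix_free(codes))                       # True
print(encoded_length(text, codes))                 # 25
print(decode(encode(text, codes), codes) == text)  # True
print(is_prefix_free({"A": "01", "B": "011"}))     # False
```

For comparison, a fixed-width code for these five distinct characters needs 3 bits each, or 42 bits for the 14-character string.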

  11. Given a set of frequencies, how should you select the character encoding for maximum efficiency? Huffman Encoding

  12. The basic idea • We will make a “Huffman tree” by repeatedly combining the two nodes with the smallest frequencies into a “meta node” whose frequency is their sum • A letter’s position in the tree will tell us what its encoded form is
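The basic idea can be sketched with a priority queue. This is a minimal illustration, not the assignment's required implementation: it assumes a left branch means "0" and a right branch means "1", and the function name `huffman_codes` is made up. Ties between equal frequencies can be broken in any order, so the exact code words may differ from a hand-built tree even though the total length is the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman tree for text and return a {character: bit string} table."""
    freq = Counter(text)
    tiebreak = count()  # unique counter so the heap never has to compare trees
    # Each heap entry is (frequency, tiebreaker, tree).
    # A tree is either a single character (leaf) or a (left, right) tuple (meta node).
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone character still needs one bit

    # Repeatedly combine the two lowest-frequency nodes into a meta node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        # A letter's position in the tree is its code: the path from the root.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

text = "AACCCAAABAADAE"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print(codes)
print(total_bits)  # 25
```

Because letters only ever sit at the leaves of the tree, no code word can be a prefix of another.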

  13. Generate a Huffman tree and encodings for this circumstance. Follow the tutorial linked off the resources section in Sakai. Write the Huffman tree and the resultant encoding on your handin sheet.

  14. The Header • The Magic Number • Counts • Or a tree for extra credit

  15. The Pseudo-EOF character • A made up character to write to your file
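The slide does not spell out why the made-up character is needed. One common reason is that files are written in whole bytes, so the last byte is usually padded with extra bits that could be misread as real characters. The sketch below illustrates that under stated assumptions: the code table is hypothetical (a real one would come from the Huffman tree), `PSEUDO_EOF` is a made-up sentinel name, and the padding uses zero bits.

```python
PSEUDO_EOF = "<EOF>"  # hypothetical sentinel; it never appears in the real text

# Assumed prefix-free table for illustration only.
codes = {"A": "0", "C": "10", "B": "110", PSEUDO_EOF: "111"}

def encode(text, codes):
    """Encode text, append the pseudo-EOF code, then pad to a whole number of bytes."""
    bits = "".join(codes[ch] for ch in text) + codes[PSEUDO_EOF]
    padding = (-len(bits)) % 8
    return bits + "0" * padding

def decode(bits, codes):
    """Decode until the pseudo-EOF is seen; any padding bits after it are ignored."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        symbol = reverse.get(current)
        if symbol is None:
            continue  # not a complete code word yet
        if symbol == PSEUDO_EOF:
            return "".join(out)
        out.append(symbol)
        current = ""
    raise ValueError("ran out of bits before reaching the pseudo-EOF")

encoded = encode("AACB", codes)
print(encoded)                 # 0010110111000000
print(decode(encoded, codes))  # AACB
```

Without the sentinel, the six padding zeros in this example would decode as six extra ‘A’s.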

  16. Please turn your sheet in at the back of the room (try for a neat pile)
