
Dictionary Techniques




    1. Dictionary Techniques. Mei-Chen Yeh

    2. Introduction. Huffman codes and arithmetic codes assume a sequence of independent symbols. Dictionary methods identify frequently (and infrequently) occurring patterns and encode the two groups with different methods.

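    3. Example. A32 = {26 lowercase letters , . ! ? ; :}, an alphabet of 32 symbols, so 5 bits per symbol. Uncompressed case: a 4-symbol block takes 5 * 4 = 20 bits. Dictionary method: select the 256 most frequent patterns. Average bits per block: 9p + 21(1 - p) = 21 - 12p, where p = P(encountering a pattern from the dictionary). The dictionary method wins when 21 - 12p < 20, that is, when p > 1/12 ≈ 0.0833.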
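
The break-even arithmetic above can be checked with a short sketch. It assumes the usual reading of the two costs: a 1-bit flag precedes every block, followed by either an 8-bit index into the 256-pattern dictionary (9 bits total) or the 20 raw bits (21 bits total). The function name and parameters are illustrative, not from the slides.

```python
def avg_bits_per_block(p, dict_bits=8, raw_bits=20):
    """Expected bits per 4-symbol block when a fraction p of blocks is in the dictionary.

    Assumption: each block is prefixed with a 1-bit flag that says whether
    a dictionary index or the raw 20-bit block follows.
    """
    in_dictionary = 1 + dict_bits      # 1 flag bit + 8-bit index = 9 bits
    not_in_dictionary = 1 + raw_bits   # 1 flag bit + 20 raw bits = 21 bits
    return p * in_dictionary + (1 - p) * not_in_dictionary


# Break-even point: 21 - 12p = 20 gives p = 1/12.
BREAK_EVEN = 1 / 12

if __name__ == "__main__":
    for p in (0.0, BREAK_EVEN, 0.5, 1.0):
        print(f"p = {p:.4f}: {avg_bits_per_block(p):.2f} bits per block")
```

With p = 0.5 the average drops to 15 bits per block, against 20 bits uncompressed.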

    4. Static Dictionary. Application-specific. Digram coding: the dictionary holds the single letters plus frequently occurring letter pairs (digrams). Example: A = {a, b, c, d, r}; encode abracadabra.
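
A minimal sketch of digram coding for this example. The slide's code table is not reproduced here, so the dictionary below is a hypothetical one: the five letters of A plus three assumed digrams (ab, ac, ad), each given a 3-bit code. The encoder tries the next two symbols first and falls back to a single letter.

```python
# Hypothetical 8-entry static dictionary: 5 single letters + 3 assumed digrams.
DICTIONARY = ["a", "b", "c", "d", "r", "ab", "ac", "ad"]
CODES = {entry: format(index, "03b") for index, entry in enumerate(DICTIONARY)}


def digram_encode(text):
    """Encode text with the static dictionary, preferring a digram over a single letter."""
    output = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in CODES:
            output.append(CODES[pair])     # the next two symbols form a known digram
            i += 2
        else:
            output.append(CODES[text[i]])  # fall back to the single-letter code
            i += 1
    return output


if __name__ == "__main__":
    print(digram_encode("abracadabra"))
    # ['101', '100', '110', '111', '101', '100', '000']
```

Under this assumed dictionary, abracadabra is parsed as ab, r, ac, ad, ab, r, a: seven 3-bit codes, or 21 bits.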

    6. Adaptive Dictionary. LZ77 (1977); LZ78 (1978); LZW, used in UNIX compress and GIF.

    7. LZ77. Each step emits a triplet <o, l, c>. o: offset, the distance of the pointer (the start of the match in the search buffer) from the look-ahead buffer. l: length of the match. c: codeword for the symbol in the look-ahead buffer that follows the match. The number of bits required to code the triplet is ? Why encode c? In case there is no match in the search buffer!

    8. LZ77: Encode
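
A minimal sketch of LZ77 encoding into <o, l, c> triplets. The buffer sizes (`search_size`, `lookahead_size`) are illustrative assumptions, and the brute-force longest-match search is chosen for clarity, not speed. A match may run past the search buffer into the look-ahead buffer, and one symbol is always held back to serve as c.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Return a list of (offset, length, next_symbol) triplets for the string data."""
    triplets = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        start = max(0, pos - search_size)
        # Keep one symbol in reserve so that c always exists.
        max_length = min(lookahead_size - 1, len(data) - pos - 1)
        for candidate in range(start, pos):
            length = 0
            while (length < max_length
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        # With no match this emits (0, 0, c), which is why c must be coded.
        triplets.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triplets


if __name__ == "__main__":
    print(lz77_encode("abracadabra"))
    # [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (5, 1, 'd'), (7, 3, 'a')]
```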

    9. LZ77: Decode
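
A minimal sketch of LZ77 decoding, assuming the input is a list of (o, l, c) triplets in which o counts backwards from the current end of the output. Symbols are copied one at a time so that a match overlapping the text being produced still decodes correctly.

```python
def lz77_decode(triplets):
    """Rebuild the original string from (offset, length, next_symbol) triplets."""
    output = []
    for offset, length, symbol in triplets:
        start = len(output) - offset
        for k in range(length):
            # Copy symbol by symbol: the source may overlap what is being written.
            output.append(output[start + k])
        output.append(symbol)
    return "".join(output)


if __name__ == "__main__":
    triplets = [(0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
                (3, 1, "c"), (5, 1, "d"), (7, 3, "a")]
    print(lz77_decode(triplets))                      # abracadabra
    print(lz77_decode([(0, 0, "a"), (1, 3, "b")]))    # aaaab (overlapping copy)
```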

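    10. LZ78. Motivation: the worst-case situation for LZ77, where a pattern recurs only outside the search buffer and is never matched. No search buffer; instead, build a dictionary. The pair <o, l> is replaced by an index i, so the output becomes <i, c>. i: index in the dictionary of the matched entry. c: codeword for the symbol that follows the matched portion.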
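
A minimal sketch of the <i, c> scheme, assuming dictionary indices start at 1 and index 0 stands for "no match". Both directions are shown because the decoder rebuilds the same dictionary from the pairs alone. The function names are illustrative.

```python
def lz78_encode(data):
    """Return (index, symbol) pairs; index 0 means the symbol follows an empty match."""
    dictionary = {}   # pattern -> index, indices start at 1
    pairs = []
    current = ""
    for symbol in data:
        if current + symbol in dictionary:
            current += symbol                      # keep extending the match
        else:
            pairs.append((dictionary.get(current, 0), symbol))
            dictionary[current + symbol] = len(dictionary) + 1
            current = ""
    if current:
        # Input ended inside a known pattern: send its prefix index plus its last symbol.
        pairs.append((dictionary.get(current[:-1], 0), current[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the string; each pair adds one entry, exactly as the encoder did."""
    entries = [""]    # entries[0] is the empty pattern
    output = []
    for index, symbol in pairs:
        pattern = entries[index] + symbol
        entries.append(pattern)
        output.append(pattern)
    return "".join(output)


if __name__ == "__main__":
    pairs = lz78_encode("abababa")
    print(pairs)               # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
    print(lz78_decode(pairs))  # abababa
```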

    11. LZ78: Example. Monster song from Sesame Street.

    12. In case the dictionary is full… Freeze it; delete the least used items; progressively double its size; or erase the dictionary (reset).

    13. Variation on LZ78: LZW (Encode)
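
A minimal sketch of LZW encoding. It assumes the dictionary is preloaded with every symbol of a known alphabet (indices starting at 1) and that the input contains only those symbols. Unlike LZ78, only indices are sent; the symbol that breaks a match starts the next one.

```python
def lzw_encode(data, alphabet):
    """Return a list of dictionary indices for the string data."""
    # Assumption: the initial dictionary holds each alphabet symbol, indexed from 1.
    dictionary = {symbol: i for i, symbol in enumerate(alphabet, start=1)}
    codes = []
    current = ""
    for symbol in data:
        if current + symbol in dictionary:
            current += symbol                      # keep extending the match
        else:
            codes.append(dictionary[current])      # emit the longest known pattern
            dictionary[current + symbol] = len(dictionary) + 1
            current = symbol                       # the breaking symbol starts the next match
    if current:
        codes.append(dictionary[current])
    return codes


if __name__ == "__main__":
    print(lzw_encode("abababa", "ab"))  # [1, 2, 3, 5]
```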

    14. LZW (Decode)
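
A minimal sketch of LZW decoding under the same assumptions as the encoder side: a known alphabet preloaded at indices 1, 2, and so on. The decoder runs one entry behind the encoder, so it can receive an index it has not defined yet; in that case the pattern must be the previous pattern followed by its own first symbol.

```python
def lzw_decode(codes, alphabet):
    """Rebuild the string from LZW indices, growing the dictionary along the way."""
    entries = {i: symbol for i, symbol in enumerate(alphabet, start=1)}
    if not codes:
        return ""
    previous = entries[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in entries:
            pattern = entries[code]
        elif code == len(entries) + 1:
            # Special case: the index refers to the entry that is about to be added.
            pattern = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(pattern)
        entries[len(entries) + 1] = previous + pattern[0]
        previous = pattern
    return "".join(output)


if __name__ == "__main__":
    print(lzw_decode([1, 2, 3, 5], "ab"))  # abababa
```

In the usage line, index 5 arrives while the decoder only knows entries 1 to 4, which exercises the special case.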

    15. Applications (1): UNIX compress command. Based on LZW. Adaptive dictionary size: 512 entries in the beginning (9 bits for transmitting an index); the size doubles each time the dictionary fills up (512 → 1024 → 2048 …). Once the maximal size is reached, compress either flushes the dictionary or does nothing (the dictionary becomes static), depending on the compression ratio.
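
The growth rule above can be sketched as a function that returns how many bits an index needs for a given dictionary size. The 9-bit start comes from the slide; the 16-bit ceiling is an assumption (it matches the usual default of compress), and the function name is illustrative.

```python
def index_width(dictionary_size, start_bits=9, max_bits=16):
    """Bits used to transmit an index when the dictionary holds dictionary_size entries.

    Starts at 9 bits (512 entries) and adds one bit per doubling,
    up to max_bits (assumed ceiling, not stated on the slide).
    """
    bits = start_bits
    while (1 << bits) < dictionary_size and bits < max_bits:
        bits += 1
    return bits


if __name__ == "__main__":
    for size in (512, 513, 1024, 1025, 2048):
        print(size, index_width(size))
    # 512 -> 9, 513 -> 10, 1024 -> 10, 1025 -> 11, 2048 -> 11
```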

    16. Applications (2): the Graphics Interchange Format (GIF), for graphical images. First byte: the number of bits b per pixel in the image (example: 8 for grayscale images). Clear code: the binary number 2^b; it resets the compression/decompression parameters. Initial dictionary size: 2^(b+1). The size doubles when the dictionary fills up, until it reaches 4096 entries, after which the dictionary becomes static.
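
A small sketch that works out these GIF parameters from b. The function and key names are illustrative; the initial index width of b + 1 bits is derived here from the initial dictionary size of 2^(b+1), not quoted from the slide.

```python
def gif_lzw_parameters(bits_per_pixel):
    """Derive the LZW start-up parameters described above from b (bits per pixel)."""
    b = bits_per_pixel
    return {
        "clear_code": 2 ** b,                     # resets compressor and decompressor
        "initial_dictionary_size": 2 ** (b + 1),
        "initial_index_bits": b + 1,              # derived: log2 of the initial size
        "max_dictionary_size": 4096,              # the dictionary is static after this
    }


if __name__ == "__main__":
    print(gif_lzw_parameters(8))
    # {'clear_code': 256, 'initial_dictionary_size': 512,
    #  'initial_index_bits': 9, 'max_dictionary_size': 4096}
```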

    17. Applications (3)

    18. Dennis Ritchie (Sep. 9, 1941 – Oct. 12, 2011). Co-creator of Unix and creator of C. Received the Turing Award in 1983. Co-wrote the book “The C Programming Language”. He named his creation C because the programming language that came before it was called B. Did his doctoral work at Harvard and worked at Bell Labs for over four decades. He died one week after Steve Jobs, who died Oct. 5, 2011.

    19. Dennis Ritchie (Sep. 9, 1941 – Oct. 12, 2011). Quotes: “C is quirky, flawed, and an enormous success.” “UNIX is very simple, it just needs a genius to understand its simplicity.”
