Algorithm Programming 1 89-210 Some Topics in Compression

Bar-Ilan University

2007-2008 (Hebrew year 5768, תשס"ח)

by Moshe Fresko


Huffman Coding

  • Variable-length encoding

  • Works on probabilities of symbols (characters, words, etc.)

  • Build a tree

    • Take the two least frequent symbols/nodes

    • Join them under a new parent node

    • The parent node’s frequency is the sum of its children’s frequencies

    • Repeat until a single tree contains all the symbols

    • The path from the root to a leaf gives that symbol’s code (for example, 0 for each left branch and 1 for each right branch)

  • Frequent symbols end up near the root, which gives them short codes
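
The tree-building steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference solution: the function name `huffman_codes`, the use of a binary heap, the 0/1 labelling of left and right branches, and the one-bit code for a single-symbol input are all assumptions made for this example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a {symbol: bit string} table built from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Assumption: a lone symbol still gets a one-bit code.
        return {next(iter(freq)): "0"}

    tie = count()  # unique tie-breaker, so the heap never compares nodes
    # A node is either a symbol (leaf) or a (left, right) tuple.
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Take the two least frequent nodes and join them under a parent
        # whose frequency is the sum of the two.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:
            codes[node] = path  # the path to a leaf is its code

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    for symbol, code in sorted(huffman_codes("aaaabbc").items()):
        print(symbol, code)
```

For the input "aaaabbc" the most frequent symbol, a, receives a one-bit code and the two rarer symbols receive two-bit codes.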


LZ77

  • Introduced in 1977 by Abraham Lempel and Jacob Ziv

  • Dictionary based

  • Works within a sliding window of size n

  • Decoding is easy and fast; encoding is slower because it must search the window for the longest match

  • Produces a list of tuples (Pos,Len,C)

    • Pos : Position backwards from the current position

    • Len : Number of symbols to be taken

    • C : Next character


LZ77

  • Based on strings that repeat themselves

    An outcry in Spain is an outcry in vain

    An outcry in Spa(6,3)is a(22,12)v(21,3)

    aaaaaaaaaa

    a(1,9)
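
The two strings above can be expanded mechanically. The sketch below assumes a token list in which a plain string is literal text and a tuple is a (Pos,Len) back-reference; that representation and the name `expand` are choices made for this example only. Copying one symbol at a time is what lets a reference such as (1,9) overlap the text it is still producing.

```python
def expand(tokens):
    """Expand literal strings and (pos, length) back-references."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)  # literal text is copied as-is
        else:
            pos, length = tok
            for _ in range(length):
                # Copy symbol by symbol: when pos < length the reference
                # reads symbols that this same loop has just written.
                out.append(out[-pos])
    return "".join(out)


if __name__ == "__main__":
    print(expand(["An outcry in Spa", (6, 3), "is a", (22, 12), "v", (21, 3)]))
    print(expand(["a", (1, 9)]))
```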


LZ77 - Example

  • Window size : 5

  • ABBABCABBBBC

    NextSeq   Code
    A         (0,0,A)
    B         (0,0,B)
    BA        (1,1,A)
    BC        (3,1,C)
    ABB       (3,2,B)
    BBC       (2,2,C)
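
The table above can be reproduced with a brute-force encoder. The sketch below is only one possible reading of the slide: it assumes that "window size 5" means a match may start at most 5 symbols back, that a match may run into the text being encoded, that ties between equally long matches go to the farthest one, and that one symbol is always kept back to serve as C. The function names are hypothetical.

```python
def lz77_encode(text, window):
    """Encode text as a list of (pos, length, next_char) triples."""
    triples = []
    i = 0
    while i < len(text):
        best_pos, best_len = 0, 0
        max_len = len(text) - i - 1  # keep one symbol back for next_char
        for start in range(max(0, i - window), i):
            length = 0
            while length < max_len and text[start + length] == text[i + length]:
                length += 1
            if length > best_len:  # strict '>': the farthest match wins ties
                best_pos, best_len = i - start, length
        triples.append((best_pos, best_len, text[i + best_len]))
        i += best_len + 1
    return triples


def lz77_decode(triples):
    """Rebuild the text; this is the easy and fast direction."""
    out = []
    for pos, length, char in triples:
        for _ in range(length):
            out.append(out[-pos])
        out.append(char)
    return "".join(out)


if __name__ == "__main__":
    for triple in lz77_encode("ABBABCABBBBC", 5):
        print(triple)
```

The nested search is the reason encoding is slower than decoding: every position is compared against every candidate start in the window, while the decoder only copies.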


LZ77 - Some Variations

  • LZSS - Uses a flag bit to distinguish pointers from literal symbols.

  • LZR - Places no limit on the pointer size.

  • LZH - Compresses the pointers with Huffman coding.


LZ78

  • Instead of a window onto previously seen text, a dictionary of phrases is built

  • Both encoding and decoding are simple

    • From the current position in the text, find the longest phrase that is in the dictionary

    • Output the pair (Index,NextChar)

      • Index : The index of that phrase in the dictionary (0 if no phrase matches)

      • NextChar : The next character after that phrase

    • Add the phrase extended by NextChar to the dictionary as a new entry
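
A minimal Python sketch of these encoding steps follows. It assumes that index 0 stands for the empty phrase and that a phrase still pending when the input ends is emitted with `None` in place of the next character; both conventions, and the name `lz78_encode`, are choices made for this example.

```python
def lz78_encode(text):
    """Encode text as a list of (index, next_char) pairs."""
    dictionary = {"": 0}  # index 0 is the empty phrase
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known phrase
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # next free index
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: no next character is left.
        pairs.append((dictionary[phrase], None))
    return pairs


if __name__ == "__main__":
    print(lz78_encode("ABBABCABBBBC"))
```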


LZ78 - Example

  • ABBABCABBBBC

    Input   Output     Add to dictionary
    A       (0,A)      1 = “A”
    B       (0,B)      2 = “B”
    BA      (2,A)      3 = “BA”
    BC      (2,C)      4 = “BC”
    AB      (1,B)      5 = “AB”
    BB      (2,B)      6 = “BB”
    BC      (4,EOLN)

  • Dictionary size
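
Decoding the Output column is just as direct, because the decoder rebuilds the same dictionary while it reads the pairs. In the sketch below the pairs are written as Python tuples and the table's EOLN marker is represented by `None`; both are assumptions of this example, as is the name `lz78_decode`.

```python
def lz78_decode(pairs):
    """Rebuild the text from (index, next_char) pairs."""
    phrases = [""]  # index 0 is the empty phrase
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + (ch or "")
        out.append(phrase)
        if ch is not None:
            phrases.append(phrase)  # same entry the encoder added
    return "".join(out)


if __name__ == "__main__":
    table = [(0, "A"), (0, "B"), (2, "A"), (2, "C"), (1, "B"), (2, "B"), (4, None)]
    print(lz78_decode(table))
```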


LZW

  • Produces only a list of dictionary entry indexes

  • Encoding

    • Start with an initial dictionary

      • For example, all 256 single-byte values (0..255)

    • From the input, find the longest string that exists in the dictionary

    • Output this string’s index in the dictionary

    • Append the next input character to that string and add the result to the dictionary

    • Continue from that character, repeating from the longest-match step


LZW - Example

  • ABBABCABBBBC

    • Initial dictionary 0=“A”, 1=“B”, 2=“C”

      Input   NextChar   Output   Add to dictionary
      A       B          0        3 = “AB”
      B       B          1        4 = “BB”
      B       A          1        5 = “BA”
      AB      C          3        6 = “ABC”
      C       A          2        7 = “CA”
      AB      B          3        8 = “ABB”
      BB      B          4        9 = “BBB”
      B       C          1        10 = “BC”
      C       -          2        -

  • Dictionary size : ?


LZW – Encoding Example

  • T=ababcbababaaaaaaa

  • Initial dictionary entries : 1=a, 2=b, 3=c

    Input   Output   NextSymbol   Add to dictionary
    a       1        b            4 = ab
    b       2        a            5 = ba
    ab      4        c            6 = abc
    c       3        b            7 = cb
    ba      5        b            8 = bab
    bab     8        a            9 = baba
    a       1        a            10 = aa
    aa      10       a            11 = aaa
    aaa     11       a            12 = aaaa
    a       1        -            -


LZW – Encoding Algorithm

w = Empty
while ( read next symbol k ) {
    if wk exists in the dictionary
        w = wk
    else {
        add wk to the dictionary
        output the code for w
        w = k
    }
}
if w is not Empty
    output the code for w
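
As a runnable counterpart to the pseudocode, here is a minimal Python sketch. The parameters `alphabet` and `first_code` are additions made for this example so that both worked examples (codes starting at 0 and at 1) can be reproduced; the sketch also assumes every input symbol appears in `alphabet`. Note the step after the loop: when the input ends, the string still held in w has not been output yet.

```python
def lzw_encode(text, alphabet, first_code=0):
    """Encode text as a list of dictionary codes."""
    # Initial dictionary: one entry per symbol of the alphabet.
    dictionary = {sym: first_code + i for i, sym in enumerate(alphabet)}
    next_code = first_code + len(alphabet)
    codes = []
    w = ""
    for k in text:
        if w + k in dictionary:
            w += k
        else:
            codes.append(dictionary[w])
            dictionary[w + k] = next_code
            next_code += 1
            w = k
    if w:
        codes.append(dictionary[w])  # flush the last pending string
    return codes


if __name__ == "__main__":
    print(lzw_encode("ABBABCABBBBC", "ABC"))
    print(lzw_encode("ababcbababaaaaaaa", "abc", first_code=1))
```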


LZW – Decoding Algorithm

read a code k
w = dictionary entry for k
output w
while ( read a code k ) {
    entry = dictionary entry for k
    output entry
    add w + entry[0] to the dictionary
    w = entry
}


LZW – Decoding

  • The previous algorithm has a special-case problem

    • It is likely to arise whenever a large file is decoded

    • It occurs when the code just read is not in the dictionary yet

    • Example : ABABABA

    • Initially : A=1, B=2

    • Encoder output : 1 2 3 5

    • When decoding, the above algorithm will not find dictionary entry 5 (ABA), because the decoder adds each entry one step later than the encoder

    • A small additional check solves the problem

      • Be sure to include it in Exercise 3
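
One common form of the additional check is sketched below; it is offered as an illustration, not as the expected exercise solution. It relies on the observation that a code can be missing only if the encoder used the entry it had just created, and that entry is always the previous string followed by its own first symbol. The parameters `alphabet` and `first_code` are hypothetical conveniences of this example.

```python
def lzw_decode(codes, alphabet, first_code=0):
    """Decode a list of LZW codes, including the not-yet-defined code case."""
    dictionary = {first_code + i: sym for i, sym in enumerate(alphabet)}
    next_code = first_code + len(alphabet)
    if not codes:
        return ""
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            # Special case: the code refers to the entry about to be added,
            # which must be w followed by the first symbol of w.
            entry = w + w[0]
        else:
            raise ValueError(f"invalid code {k}")
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)


if __name__ == "__main__":
    print(lzw_decode([1, 2, 3, 5], "AB", first_code=1))
```

With A=1 and B=2, the codes 1 2 3 5 decode to ABABABA: when 5 arrives the dictionary holds only 3=AB and 4=BA, so the entry is rebuilt as AB followed by A.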


LZW – Dictionary Length

  • Dictionary length

    • Typically : 14 bits = 16384 entries (first 256 of them are single bytes)

    • What if the dictionary is full?

      • Don’t add to the dictionary any more

      • Delete the whole dictionary (This will be used in the exercise)

      • LRU : Discard the least recently used entries

      • Monitor performance, and flush dictionary when the performance is poor.

      • Double the dictionary size
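
The "delete the whole dictionary" option can be sketched as below. This is an illustration under stated assumptions, not the exercise's specification: the dictionary starts with the 256 single bytes, `max_entries` defaults to the 14-bit size of 16384, and the flush happens at the moment the encoder would have to add an entry to a full dictionary. The decoder has to flush at the matching point, which it can detect from its own dictionary size. All function names are hypothetical.

```python
def _fresh():
    # Initial dictionary: one entry per single byte, codes 0..255.
    return {bytes([i]): i for i in range(256)}


def lzw_encode_reset(data, max_entries=2 ** 14):
    """LZW over bytes that flushes the dictionary when it is full."""
    dictionary, codes, w = _fresh(), [], b""
    for byte in data:
        k = bytes([byte])
        if w + k in dictionary:
            w += k
            continue
        codes.append(dictionary[w])
        if len(dictionary) < max_entries:
            dictionary[w + k] = len(dictionary)
        else:
            dictionary = _fresh()  # full: delete the whole dictionary
        w = k
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decode_reset(codes, max_entries=2 ** 14):
    """Inverse of lzw_encode_reset; flushes in step with the encoder."""
    entries = {code: sym for sym, code in _fresh().items()}
    out, w = [], b""
    for k in codes:
        if not w or len(entries) >= max_entries:
            # First code, or the encoder flushed before emitting this one.
            entries = {code: sym for sym, code in _fresh().items()}
            w = entries[k]
            out.append(w)
            continue
        # A missing code is the entry about to be added: w + first byte of w.
        entry = entries[k] if k in entries else w + w[:1]
        out.append(entry)
        entries[len(entries)] = w + entry[:1]
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    packed = lzw_encode_reset(b"abababab", max_entries=258)
    print(packed, lzw_decode_reset(packed, max_entries=258))
```

The deliberately tiny limit of 258 in the usage line forces a flush after only two new entries, which makes the behaviour easy to follow by hand.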

