### Algorithm Programming 1 89-210 Some Topics in Compression

Bar-Ilan University

2007-2008 (Hebrew calendar year 5768)

by Moshe Fresko

Huffman Coding

- Variable-length encoding
- Works on probabilities of symbols (characters, words, etc.)
- Build a tree
  - Take the two least frequent symbols/nodes
  - Join them under a new parent node
  - The parent node's frequency is the sum of its children's frequencies
  - Repeat until a single tree contains all the symbols
- The path from the root to a leaf gives that symbol's code
- Frequent symbols end up near the root, which gives them short codes
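The tree-building steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference solution: the function name `huffman_codes`, the use of `heapq` as the priority queue, the 0/1 labelling of left/right branches, and the 1-bit code for a single-symbol input are all assumptions made for the example.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each symbol of text to its bit-string code."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Assumed convention: a lone symbol still gets a 1-bit code.
        return {next(iter(freq)): "0"}
    # Heap items are (frequency, tie_breaker, tree); a tree is either a
    # symbol or a (left, right) pair. The tie_breaker keeps the heap from
    # ever comparing two trees directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # least frequent node
        f2, _, t2 = heapq.heappop(heap)   # second least frequent node
        heapq.heappush(heap, (f1 + f2, tie, (t1, t2)))  # parent = sum
        tie += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):       # internal node
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                             # leaf: the path is the code
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aaaabbc")
print(codes)
```

For the input `"aaaabbc"` the most frequent symbol `a` receives a 1-bit code and `b` and `c` receive 2-bit codes, so the whole text costs 10 bits.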

LZ77

- Introduced in 1977 by Abraham Lempel and Jacob Ziv
- Dictionary based
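- Works on a sliding window of the last n symbols
- Decoding is easy and fast; encoding is slower because it has to search the window for matches
- Produces a list of tuples (Pos,Len,C)
  - Pos : Distance backwards from the current position to the start of the match
  - Len : Number of symbols to copy
  - C : The next character after the match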

LZ77

- Based on strings that repeat themselves

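In these examples a repeat is written as (Pos,Len), mixed in with the literal characters.

- Input : An outcry in Spain is an outcry in vain
- Encoded : An outcry in Spa(6,3)is a(22,12)v(21,3)

A match may overlap the current position, so a run of one character compresses well.

- Input : aaaaaaaaaa
- Encoded : a(1,9)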
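To check the two encodings above, here is a small Python sketch that expands the mixed literal/(Pos,Len) notation back into text. The function name `expand` and the regular expression are assumptions for this illustration, and the sketch assumes the literal text itself never contains something that looks like `(number,number)`.

```python
import re

# A pointer is written as "(Pos,Len)" in the slide's notation.
POINTER = re.compile(r"\((\d+),(\d+)\)")

def expand(encoded):
    """Expand literals and (Pos,Len) pointers into the original text."""
    out = []
    i = 0
    while i < len(encoded):
        m = POINTER.match(encoded, i)
        if m:
            pos, length = int(m.group(1)), int(m.group(2))
            start = len(out) - pos
            # Copy one symbol at a time so that the source may overlap
            # the symbols being written, as in a(1,9).
            for j in range(length):
                out.append(out[start + j])
            i = m.end()
        else:
            out.append(encoded[i])  # literal character
            i += 1
    return "".join(out)

print(expand("An outcry in Spa(6,3)is a(22,12)v(21,3)"))
print(expand("a(1,9)"))
```

The first call prints `An outcry in Spain is an outcry in vain` and the second prints ten `a` characters. Copying symbol by symbol is what makes the overlapping pointer `(1,9)` work.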

LZ77 - Example

- Window size : 5
- ABBABCABBBBC

| Next Seq | Code |
| --- | --- |
| A | (0,0,A) |
| B | (0,0,B) |
| BA | (1,1,A) |
| BC | (3,1,C) |
| ABB | (3,2,B) |
| BBC | (2,2,C) |
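The example can be reproduced with the following Python sketch of an encoder and decoder for (Pos,Len,C) triples. The names `lz77_encode` and `lz77_decode` are made up for this illustration. Two choices are assumptions rather than something the slides state: on a tie between equally long matches the encoder keeps the one furthest back in the window, and a match always leaves at least one symbol over to serve as C.

```python
def lz77_encode(text, window=5):
    """Greedy LZ77: return a list of (pos, length, next_char) triples."""
    out = []
    i = 0
    while i < len(text):
        best_pos, best_len = 0, 0
        max_len = len(text) - i - 1        # keep one symbol for C
        # Scan the window from its oldest position to its newest.
        for start in range(max(0, i - window), i):
            length = 0
            while length < max_len and text[start + length] == text[i + length]:
                length += 1
            if length > best_len:          # strict: first longest match wins
                best_pos, best_len = i - start, length
        out.append((best_pos, best_len, text[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Rebuild the text from (pos, length, next_char) triples."""
    out = []
    for pos, length, ch in triples:
        start = len(out) - pos
        for j in range(length):            # symbol by symbol: overlap is fine
            out.append(out[start + j])
        out.append(ch)
    return "".join(out)

triples = lz77_encode("ABBABCABBBBC", window=5)
print(triples)
print(lz77_decode(triples))
```

With these tie-breaking assumptions the encoder produces the six triples listed in the table, and decoding them gives back `ABBABCABBBBC`. The encoder's inner search is the slow part; the decoder only copies.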

LZ77 - Some Variations

- LZSS - A flag bit distinguishes pointers from literal symbols.
- LZR - No limit on the pointer size.
- LZH - Compresses the pointers with Huffman coding.

LZ78

- Instead of a window onto previously seen text, a dictionary of phrases is built
- Both encoding and decoding are simple
- From the current position in the text, find the longest phrase that is in the dictionary
- Output the pair (Index,NextChar)
  - Index : The dictionary index of that phrase (0 if no phrase matches)
  - NextChar : The next character after that phrase
- Add the new phrase (the matched phrase plus NextChar) to the dictionary
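The encoding loop described above can be sketched in Python like this. The function name `lz78_encode` is hypothetical, and using `None` as the end-of-input marker (for the case where the text ends in the middle of a known phrase) is an assumption of this sketch.

```python
def lz78_encode(text):
    """Return a list of (index, next_char) pairs; index 0 is the empty phrase."""
    dictionary = {}      # phrase -> index, indexes start at 1
    out = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known phrase: no next character exists.
        out.append((dictionary[phrase], None))
    return out

print(lz78_encode("ABBABCABBBBC"))
```

For `ABBABCABBBBC` this gives `(0,A) (0,B) (2,A) (2,C) (1,B) (2,B)` followed by `(4, None)` for the trailing `BC`.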

LZ78 - Example

- ABBABCABBBBC

| Input | Output | Add to dictionary |
| --- | --- | --- |
| A | (0,A) | 1 = “A” |
| B | (0,B) | 2 = “B” |
| BA | (2,A) | 3 = “BA” |
| BC | (2,C) | 4 = “BC” |
| AB | (1,B) | 5 = “AB” |
| BB | (2,B) | 6 = “BB” |
| BC | (4,EOLN) | - |

- Dictionary size
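Decoding rebuilds the same dictionary from the pairs alone. The sketch below is a minimal Python illustration; the name `lz78_decode` is hypothetical, and `None` stands in for the EOLN marker of the example, which is an assumption of this sketch.

```python
def lz78_decode(pairs):
    """Rebuild the text from (index, next_char) pairs."""
    phrases = [""]                 # index 0 is the empty phrase
    out = []
    for index, ch in pairs:
        if ch is None:             # end marker: emit the phrase, add nothing
            out.append(phrases[index])
        else:
            phrase = phrases[index] + ch
            out.append(phrase)
            phrases.append(phrase) # same entry the encoder added
    return "".join(out)

pairs = [(0, "A"), (0, "B"), (2, "A"), (2, "C"), (1, "B"), (2, "B"), (4, None)]
print(lz78_decode(pairs))
```

Feeding in the seven pairs from the example prints `ABBABCABBBBC`. The dictionary keeps growing by one phrase per pair, which is why a real implementation has to bound its size.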

LZW

- Produces only a list of dictionary entry indexes
- Encoding
  1. Start with an initial dictionary
     - For example, all single-byte characters (0..255)
  2. From the input, find the longest string that exists in the dictionary
  3. Output this string's index in the dictionary
  4. Append the next input character to that string and add the result to the dictionary
  5. Continue from that character, going back to step (2)
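The encoding steps can be followed almost literally in Python. In this sketch the function name `lzw_encode` and the `alphabet` parameter (the symbols of the initial dictionary, numbered from 0) are assumptions for illustration, and every input symbol is assumed to be in the alphabet.

```python
def lzw_encode(text, alphabet):
    """LZW by explicit longest-match search; returns a list of indexes."""
    # Step 1: initial dictionary, one entry per symbol of the alphabet.
    dictionary = {sym: i for i, sym in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Step 2: extend the match one symbol at a time. This finds the
        # longest match because every prefix of a dictionary string is
        # itself in the dictionary.
        end = pos + 1
        while end < len(text) and text[pos:end + 1] in dictionary:
            end += 1
        # Step 3: output the index of the matched string.
        codes.append(dictionary[text[pos:end]])
        # Step 4: add match + next character, if there is a next character.
        if end < len(text):
            dictionary[text[pos:end + 1]] = len(dictionary)
        # Step 5: continue from the character that ended the match.
        pos = end
    return codes

print(lzw_encode("ABBABCABBBBC", "ABC"))
```

With the initial dictionary 0=A, 1=B, 2=C the input `ABBABCABBBBC` encodes to `[0, 1, 1, 3, 2, 3, 4, 1, 2]`.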

LZW - Example

- ABBABCABBBBC
- Initial dictionary 0=“A”, 1=“B”, 2=“C”

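| Input | NextChar | Output | Add to dictionary |
| --- | --- | --- | --- |
| A | B | 0 | 3 = “AB” |
| B | B | 1 | 4 = “BB” |
| B | A | 1 | 5 = “BA” |
| AB | C | 3 | 6 = “ABC” |
| C | A | 2 | 7 = “CA” |
| AB | B | 3 | 8 = “ABB” |
| BB | B | 4 | 9 = “BBB” |
| B | C | 1 | 10 = “BC” |
| C | - | 2 | - |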

- Dictionary size : ?

LZW – Encoding Example

- T=ababcbababaaaaaaa
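- Initial Dictionary Entries : 1=a 2=b 3=c

| Input | Output | NextSymbol | Add To Dictionary |
| --- | --- | --- | --- |
| a | 1 | b | 4 = ab |
| b | 2 | a | 5 = ba |
| ab | 4 | c | 6 = abc |
| c | 3 | b | 7 = cb |
| ba | 5 | b | 8 = bab |
| bab | 8 | a | 9 = baba |
| a | 1 | a | 10 = aa |
| aa | 10 | a | 11 = aaa |
| aaa | 11 | a | 12 = aaaa |
| a | 1 | - | - |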

LZW – Encoding Algorithm

    w = Empty
    while ( read next symbol k ) {
        if wk exists in the dictionary
            w = wk
        else {
            add wk to the dictionary
            output the code for w
            w = k
        }
    }
    if w is not Empty
        output the code for w
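The same loop in runnable Python might look like the sketch below. The name `lzw_encode_stream`, the `alphabet` argument and the `first_code` parameter (set to 1 here so the numbering matches the a=1, b=2, c=3 example) are assumptions for this illustration.

```python
def lzw_encode_stream(symbols, alphabet, first_code=1):
    """LZW in the w/k formulation: consume one symbol k at a time."""
    dictionary = {sym: first_code + i for i, sym in enumerate(alphabet)}
    next_code = first_code + len(alphabet)
    codes = []
    w = ""
    for k in symbols:
        if w + k in dictionary:
            w = w + k                      # keep growing the current string
        else:
            dictionary[w + k] = next_code  # add wk to the dictionary
            next_code += 1
            codes.append(dictionary[w])    # output the code for w
            w = k
    if w:
        codes.append(dictionary[w])        # flush the last pending string
    return codes

print(lzw_encode_stream("ababcbababaaaaaaa", "abc"))
```

For `T=ababcbababaaaaaaa` this yields `[1, 2, 4, 3, 5, 8, 1, 10, 11, 1]`, the Output column of the encoding example. The final flush after the loop is easy to forget: without it the last string is never written.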

LZW – Decoding Algorithm

    read a code k
    w = dictionary entry for k
    output w
    while ( read a code k ) {
        entry = dictionary entry for k
        output entry
        add w + entry[0] to the dictionary
        w = entry
    }

LZW – Decoding

- The previous algorithm has a problem with one special case
- It is likely to come up when decoding any large file
- It is the case where the code just read is not in the dictionary yet
- Example : ABABABA
  - Initially : A=1, B=2
  - Encoder output = 1 2 3 5
  - When decoding, the algorithm above will not find dictionary entry 5 = ABA, because the encoder used that entry immediately after creating it
- A small additional check solves the problem: when the code is not in the dictionary yet, its entry must be w + w[0]
- Be careful to handle this in Exercise 3
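A decoder with the extra check might be sketched in Python as follows. The name `lzw_decode`, the `alphabet` and `first_code` parameters, and the choice to raise `ValueError` on any other unknown code are assumptions of this sketch, not part of the slides or of the exercise statement.

```python
def lzw_decode(codes, alphabet, first_code=1):
    """Decode a list of LZW codes, handling the code-not-yet-defined case."""
    dictionary = {first_code + i: sym for i, sym in enumerate(alphabet)}
    next_code = first_code + len(alphabet)
    if not codes:
        return ""
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            # Special case: the encoder used the entry it had just added,
            # so that entry must be w followed by the first symbol of w.
            entry = w + w[0]
        else:
            raise ValueError(f"invalid code {k}")
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

print(lzw_decode([1, 2, 3, 5], "AB"))
```

With A=1, B=2 the codes `1 2 3 5` decode to `ABABABA`: code 5 arrives before the decoder has defined it, and the check reconstructs it as `AB` + `A`.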

LZW – Dictionary Length

- Dictionary length
  - Typically : 14 bits = 16384 entries (the first 256 of them are the single bytes)
- What if the dictionary is full?
  - Stop adding to the dictionary
  - Delete the whole dictionary and start over (this will be used in the exercise)
  - LRU : Discard the entries that have not been used recently
  - Monitor performance, and flush the dictionary when compression becomes poor
  - Double the dictionary size
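Two of these policies, stopping and deleting, are easy to sketch in an encoder. The Python below is only an illustration under stated assumptions: the name `lzw_encode_bounded`, the `code_bits` and `on_full` parameters, and the exact moment of the reset (when a new entry no longer fits) are choices made for the example, not the exercise's specification. A matching decoder would have to apply the same policy at the same point to stay in step.

```python
def lzw_encode_bounded(text, alphabet, code_bits=14, on_full="freeze"):
    """LZW encoder whose dictionary holds at most 2**code_bits entries."""
    max_entries = 1 << code_bits           # 14 bits -> 16384 entries
    initial = {sym: i for i, sym in enumerate(alphabet)}
    dictionary = dict(initial)
    codes = []
    w = ""
    for k in text:
        if w + k in dictionary:
            w += k
            continue
        codes.append(dictionary[w])
        if len(dictionary) < max_entries:
            dictionary[w + k] = len(dictionary)
        elif on_full == "reset":
            dictionary = dict(initial)     # delete everything, start over
        # on_full == "freeze": leave the full dictionary unchanged
        w = k
    if w:
        codes.append(dictionary[w])
    return codes

# A tiny 2-bit dictionary (4 entries) makes the difference visible.
print(lzw_encode_bounded("ABABABAB", "AB", code_bits=2, on_full="freeze"))
print(lzw_encode_bounded("ABABABAB", "AB", code_bits=2, on_full="reset"))
```

With only 4 entries, the "freeze" run prints `[0, 1, 2, 2, 2]` and the "reset" run prints `[0, 1, 2, 0, 1, 2]`. Freezing keeps using the phrases already learned, while resetting has to relearn them but can adapt when the input changes character.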
