Algorithm programming 1 89 210 some topics in compression
This presentation is the property of its rightful owner.
Sponsored Links
1 / 20

Algorithm Programming 1 89-210 Some Topics in Compression PowerPoint PPT Presentation


  • 99 Views
  • Uploaded on
  • Presentation posted in: General

Algorithm Programming 1 89-210 Some Topics in Compression. Bar-Ilan University 2006-2007 תשס"ז by Moshe Fresko. Huffman Coding. Variable-length encoding Works on probabilities of symbols (characters, words, etc.) Build a tree Get two least frequent symbols/nodes

Download Presentation

Algorithm Programming 1 89-210 Some Topics in Compression

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Algorithm programming 1 89 210 some topics in compression

Algorithm Programming 189-210Some Topics in Compression

Bar-Ilan University

2006-2007 תשס"ז

by Moshe Fresko


Huffman coding

Huffman Coding

  • Variable-length encoding

  • Works on probabilities of symbols (characters, words, etc.)

  • Build a tree

    • Get two least frequent symbols/nodes

    • Join them into a parent node

    • Parent node’s frequency is sum of child nodes’

    • Continue until the tree contains all nodes and symbols

    • The path of a leaf indicates its code

  • Frequent symbols are near the root giving them short codes


Algorithm programming 1 89 210 some topics in compression

LZ77

  • Introduced in 1977 by Abraham Lempel and Jacob Ziv

  • Dictionary based

  • Works in a window size n

  • Decoding is easy and fast (but not Encoding)

  • Produces a list of tuples (Pos,Len,C)

    • Pos : Position backwards from the current position

    • Len : Number of symbols to be taken

    • C : Next character


Algorithm programming 1 89 210 some topics in compression

LZ77

  • Based on strings that repeat themselves

    An outcry in Spain is an outcry in vain

    An outcry in Spa(6,3)is a(22,12)v(21,3)

    aaaaaaaaaa

    a(1,9)


Lz77 example

LZ77 - Example

  • Window size : 5

  • ABBABCABBBBC

    NextSeqCode

    A(0,0,A)

    B(0,0,B)

    BA(1,1,A)

    BC(3,1,C)

    ABB(3,2,B)

    BBC(2,2,C)


Lz77 some variations

LZ77 - Some Variations

  • LZSS - A flag bit for distinguishing pointers from the other items.

  • LZR - No limit on the pointer size.

  • LZH - Compress the pointers in Huffman coding.


Algorithm programming 1 89 210 some topics in compression

LZ78

  • Instead of a window to previously seen text, a dictionary of phrases will be build

  • Both encoding and decoding are simple

    • From the current position in the text, find the longest phrase that is found in the dictionary

    • Output the pair (Index,NextChar)

      • Index : The dictionary phrase of that index

      • NextChar : The next character after that phrase

    • Add to the dictionary the new phrase by appending the next character


Lz78 example

LZ78 - Example

  • ABBABCABBBBC

    InputOutputAdd to dictionary

    A(0,A)1 = “A”

    B(0,B)2 = “B”

    BA(2,A)3 = “BA”

    BC(2,C)4 = “BC”

    AB(1,B)5 = “AB”

    BB(2,B)6 = “BB”

    BC(4,EOLN)

  • Dictionary size


Algorithm programming 1 89 210 some topics in compression

LZW

  • Produces only a list of dictionary entry indexes

  • Encoding

    • Starts with initial dictionary

      • For example, possible ascii characters (0..255)

    • From the input, find the longest string that exists in the dictionary

    • Output this string’s index in the dictionary

    • Append the next character in the input to that string and add it into the dictionary

    • Continue from that character on from (2)


Lzw example

LZW - Example

  • ABBABCABBBBC

    • Initial dictionary 0=“A”, 1=“B”, 2=“C”

      InputNextCharOutputAdd to dictionary

      AB03 = “AB”

      BB14 = “BB”

      BA15 = “BA”

      ABC36 = “ABC”

      CA27 = “CA”

      ABB38 = “ABB”

      BBB49 = “BBB”

      BC110 = “BC”

      C-2-

  • Dictionary size : ?


Lzw encoding example

LZW – Encoding Example

  • T=ababcbababaaaaaaa

  • Initial Dictionary Entries :1=a2=b3=c

    InputOutput NextSymbolAdd To Dictionary

    a 1b 4 = ab

    b 2a 5 = ba

    ab 4c 6 = abc

    c 3b 7 = cb

    ba 5b 8 = bab

    bab 8a 9 = baba

    a 1a10 = aa

    aa10a11 = aaa

    aaa11a12= aaaa

    a 1- -


Lzw encoding algorithm

LZW – Encoding Algorithm

w = Empty

while ( read next symbol k ) {

if wk exists in the dictionary

w = wk

else

add wk to the dictionary;

output the code for w;

w = k;

}


Lzw decoding algorithm

LZW – Decoding Algorithm

read a code k

output dictionary entry for k

w = k

while ( read a code k ) {

entry = dictionary entry for k

output entry

add w + entry[0] to dictionary

w = entry

}


Lzw decoding

LZW – Decoding

  • There is a special case problem with the previous algorithm

    • It can be confronted on every decoding process of a big file

    • It is the case where the index number read is not in the dictionary yet

    • Example : ABABABA

    • Initially : A=1,B=2

    • Output=1 2 3 5

    • In decoding above algorithm will not find the dictionary entry ABA=5

    • An additional small check will solve the problem

      • Be careful to do it in the Exercise 3


Lzw dictionary length

LZW – Dictionary Length

  • Dictionary length

    • Typically : 14 bits = 16384 entries (first 256 of them are single bytes)

    • What if we are out of dictionary length

      • Don’t add to the dictionary any more

      • Delete the whole dictionary (This will be used in the exercise)

      • LRU : Throw those that are not used recently

      • Monitor performance, and flush dictionary when the performance is poor.

      • Double the dictionary size


Exercise 3 compression algorithms

Exercise 3 – Compression Algorithms

  • Define an interface “Compression” in “Compression.java” as:interface Compression {void compress(InputStream is, OutputStream os) throws IOException ;void decompress(InputStream is, OutputStream os) throws IOException ;void stop() ;}// Note that these definitions are as general as possible to let flexibility in the input and output media chosen.

  • Define the LZW compression algorithm in file LZW.java

  • This class must implement the above Compress interface.


Exercise 3 compression algorithms1

Exercise 3 – Compression Algorithms

  • Write an application program “main()” in “LZW.java” that will run in the following form java LZW compress/decompress InputFile OutputFileIt will run the given Algorithm for compressing/decompressing the given input file into the given output file.

  • Example:

    java LZW compress MyTextFile.txt MySmallFile.lzw

    will compress MyTextFile.txt into MySmallFile.lzw

    java LZW decompress MySmallFile.lzw MyNewTextFile.txt

    will uncompress the given file into a new file


Exercise 3 compression algorithms2

Exercise 3 – Compression Algorithms

  • You can use the following two classes to keep a dynamic list and an associative array

    • ArrayList (with interface List)

    • HashMap (with interface Map)

  • Example:

    // For keeping a dynamic list

    List myList = new ArrayList() ;

    // For keeping an associative array

    Map myMap = new HashMap() ;

  • List

    • boolean add(Object)

    • int size()

    • Object get(int)

  • Map

    • Object put(Object key,Object value)

    • boolean containsKey(Object key)

    • Object get(Object key)

  • For more information, look at the Java API specification page.


Exercise 3 compression algorithms3

Exercise 3 – Compression Algorithms

  • Algorithm Parameters:

    • LZW: The maximal initial Dictionary size is 512 entries which requires 9 bits for writing. First 256 entries are the ascii characters.

    • After reaching the end of the dictionary it will double the dictionary size. Which means from that point on the representation of an entry will be 10 bits and will be a dictionary of max size 1024.

    • This will continue until 14 bits. After reaching 214 entries, it will clean the dictionary by starting from the beginning with 9-bit represenations.

    • Please take the two numbers 9 and 14 (start Bit Count, last Bit Count) as a parameter in the constructor. The default constructor will start them with 9 and 16.

      • LZW() { this(9,14); }

      • LZW(int startNumOfBits, int endNumOfBits) { /*Initialization*/ }

  • You have to use BitInputStream/BitOutputStream from the 1st Exercise to write the bits into an output or to read from input.


Exercise 2 compression algorithms

Exercise # 2Compression Algorithms

  • Important:

    • You submit via submitex.

      • Course number: 89-210

      • Exercise: ex2

      • Two files will be delivered. Compression.java and LZW.java

    • It must compile/work under Java 1.4/1.5. You can try it on the Unix environment (on Sunshine).

    • Do not write debugging information to the console or any other place.

    • Write enough comments. Write wherever you think there is a need for the understanding of the code.

    • Write your code according to the OOP principles.

    • Everybody must do it alone.

  • Deadline:

    • 14 Dec 2006


  • Login