Dictionary Techniques

Presentation Transcript

  1. Dictionary Techniques
     • Split the input patterns into two classes: frequently occurring and infrequently occurring.
     • Keep a list, or dictionary, of the frequently occurring patterns and encode them with a reference to the dictionary.
     • Encode the remaining patterns less efficiently.

  2. Dictionary Techniques
     • The size of the dictionary must be much smaller than the number of all possible patterns.
     • Useful with sources that generate a relatively small number of patterns, such as text sources and computer commands.
     • Effective for skewed alphabets.

  3. Dictionary Techniques
     Depending on how much prior knowledge of the source is available, a dictionary technique is either static or adaptive.

  4. Static Dictionary
     • The dictionary is permanent (or allows additions, but not deletions).
     • Application-specific or data-specific.
     • Example 1: digram coding for text compression, with frequent digrams such as be, th, ie, ch, sh, ar, or, en, …
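The slide only names digram coding, so the following is a minimal sketch of one common form of it, under stated assumptions: the dictionary holds every single symbol plus a few chosen digrams, and the encoder greedily emits a digram index whenever the next two symbols form a dictionary entry. The five-letter alphabet, the three digrams, and the function names are all hypothetical and chosen only to keep the example small.

```python
def build_dictionary(alphabet, digrams):
    """Index every single symbol first, then the chosen digrams."""
    entries = list(alphabet) + list(digrams)
    return {entry: index for index, entry in enumerate(entries)}


def digram_encode(text, dictionary):
    """Greedy digram coding: prefer a two-symbol entry, else fall back to one symbol."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            codes.append(dictionary[pair])
            i += 2
        else:
            codes.append(dictionary[text[i]])
            i += 1
    return codes


def digram_decode(codes, dictionary):
    """Invert the static dictionary and concatenate the entries."""
    entries = {index: entry for entry, index in dictionary.items()}
    return "".join(entries[code] for code in codes)


# Hypothetical 8-entry dictionary: 5 symbols plus 3 digrams, so 3 bits per index.
dictionary = build_dictionary("abcdr", ["ab", "ac", "ad"])
codes = digram_encode("abracadabra", dictionary)
print(codes)
print(digram_decode(codes, dictionary))
```

With this toy dictionary, the 11 input symbols are covered by 7 indices of 3 bits each. A real text dictionary would hold far more digrams, selected from measured frequencies.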

  5. Example of static dictionary
     Let the source alphabet A consist of the 26 letters a to z plus the six punctuation marks . , ! ? : ; so that A has size 32.
     • For 4-character words, there are $32^4 = 2^{20}$ patterns.
     • Thus, fixed-length coding needs 20 bits/word.

  6. Example of static dictionary
     • Put the $256 = 2^8$ most frequently occurring patterns into a dictionary.
     • If a pattern is in the dictionary: (1-bit flag) + (8-bit index) = 9 bits.
     • Otherwise: (1-bit flag) + (20-bit code) = 21 bits.

  7. Example of static dictionary
     • $L = 9p + 21(1 - p) = 21 - 12p$ bits/word, where $p$ is the probability that a pattern is in the dictionary.
     • $L < 20$ if $p > 1/12 \approx 0.0833$.
     • The pattern distribution must be skewed (large $p$) to get high compression.
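The arithmetic of this example can be checked with a short sketch. The function names and default parameters below are illustrative assumptions; the numbers (1-bit flag, 8-bit index, 20-bit raw code) come from the slides.

```python
def average_bits_per_word(p, index_bits=8, raw_bits=20, flag_bits=1):
    """Expected code length when a word is in the dictionary with probability p."""
    hit = flag_bits + index_bits    # 9 bits: flag + dictionary index
    miss = flag_bits + raw_bits     # 21 bits: flag + fixed-length code
    return hit * p + miss * (1 - p)


def break_even_probability(index_bits=8, raw_bits=20, flag_bits=1):
    """Smallest p at which the dictionary scheme matches plain fixed-length coding."""
    hit = flag_bits + index_bits
    miss = flag_bits + raw_bits
    # Solve hit * p + miss * (1 - p) = raw_bits for p.
    return (miss - raw_bits) / (miss - hit)


for p in (0.0, 1 / 12, 0.5, 0.9):
    print(f"p = {p:.4f}: {average_bits_per_word(p):.2f} bits/word")
print("break-even p:", break_even_probability())
```

Below the break-even point the flag bit costs more than the dictionary saves, which is why the scheme only pays off for skewed sources.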

  8. Adaptive Dictionary-Based Techniques
     • Jacob Ziv and Abraham Lempel: the LZ techniques.
     • The contribution of Terry Welch: the LZW algorithm.

  9. Idea
     • Adapt to the characteristics of the source.
     • The dictionary is a portion of the previously encoded sequence.
     • Start with an empty dictionary.
     • Add entries as they are found in the input stream.

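  10. LZ77
      [Figure: a window over the sequence, split into a search buffer (previously encoded sequence) and a look-ahead buffer (next portion of the sequence); labels: search pointer, W, S]
      A search pointer is moved back through the search buffer, which contains a portion of the recently encoded sequence, to find a match for a pattern or a symbol in the look-ahead buffer.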

  11. LZ77
      • The encoder searches the search buffer for the longest matching pattern and sends Code = (Offset, Max_match_length, New_symbol), where:
      • Offset is the distance from the search pointer, positioned at the start of the matched pattern, to the look-ahead buffer.
      • Max_match_length is the number of symbols in the string found in the search buffer that are identical to those at the beginning of the look-ahead buffer.
      • New_symbol is the code of the next symbol after the matched pattern.

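  12. Length
      • If the size of the source alphabet is A, then the number of bits needed to encode the triple using fixed-length codes is $\lceil \log_2 S \rceil + \lceil \log_2 W \rceil + \lceil \log_2 A \rceil$, where S is the size of the search buffer and W is the size of the window (search buffer plus look-ahead buffer).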
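As a quick check of this formula, here is a minimal sketch that evaluates it with each logarithm rounded up to a whole number of bits. The function names are hypothetical. The sample values reuse numbers that appear elsewhere in the slides (a search buffer of 7, a look-ahead buffer of 6, a 32-symbol alphabet); treating W as 7 + 6 = 13 is an assumption about how the window is defined.

```python
def fixed_length_bits(n):
    """Bits needed to index n distinct values; equals ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def triple_bits(search_size, window_size, alphabet_size):
    """Fixed-length cost of one LZ77 triple (offset, match length, new symbol)."""
    return (fixed_length_bits(search_size)
            + fixed_length_bits(window_size)
            + fixed_length_bits(alphabet_size))


# S = 7, W = 7 + 6 = 13, A = 32
print(triple_bits(7, 13, 32))  # 3 + 4 + 5 = 12 bits per triple
```

The match-length field is sized by the window rather than by the search buffer alone, because a match that starts in the search buffer can run on into the look-ahead buffer.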

  13. Example
      • Search buffer of size 7, look-ahead buffer of size 6.
      • No match is found in the search buffer, so the encoder sends <0, 0, c(d)>.

  14. Coding: cabracadabrarrarrad

  15. DECODING cabracadabrarrarrad
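The step-by-step tables from the coding and decoding slides did not survive in this transcript, so the following is a minimal sketch rather than a reproduction of them. It assumes the first 7 symbols (cabraca) already fill the search buffer and are known to the decoder, uses the buffer sizes from the example slide, and stores the new symbol directly instead of its code. The function names and the rule of picking the nearest match on ties are assumptions.

```python
def lz77_encode(data, start, search_size, lookahead_size):
    """Encode data[start:], treating data[:start] as already known to the decoder."""
    triples = []
    pos = start
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Reserve one look-ahead symbol for the third field of the triple.
        max_length = min(lookahead_size, len(data) - pos) - 1
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples


def lz77_decode(triples, prefix=""):
    """Rebuild the sequence; prefix is the part both sides already share."""
    out = list(prefix)
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):
            # Copying one symbol at a time lets a match overlap its own output.
            out.append(out[start + k])
        out.append(symbol)
    return "".join(out)


sequence = "cabracadabrarrarrad"
triples = lz77_encode(sequence, start=7, search_size=7, lookahead_size=6)
print(triples)  # [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
print(lz77_decode(triples, prefix=sequence[:7]))
```

The last triple, (3, 5, 'd'), shows a match that starts 3 symbols back but is 5 symbols long, so it extends into the look-ahead buffer. The decoder handles this because it copies symbol by symbol.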

  16. Analysis of LZ77
      • LZ77 assumes that patterns in the input stream occur close together.
      • Any pattern that recurs over a period longer than the search buffer size will not be captured.
      • A better compression method would save frequently occurring patterns in the dictionary.
      • The size L of the look-ahead buffer is limited.
      • The size S of the search buffer is limited.

  17. Analysis of LZ77
      • Increasing L (or S) makes longer matches possible, which improves compression efficiency.
      • But searching for longer matches reduces the speed.
      • Longer buffers also need more bits for each offset and match length, so beyond some point compression efficiency drops.

  18. Improvements of LZ77
      • Encode the triples using variable-length codes (VLC), e.g. PKZIP, ZIP, LHarc, PNG, ARJ, WinZip.
      LZSS
      • Encode two fields instead of three.
      • Use a flag bit to indicate whether what follows is the codeword for a new symbol.
      • For example: 0 for single characters, 1 for (offset, length) pairs.
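The flag-bit idea can be sketched as follows. This is an illustrative assumption of how an LZSS-style encoder might be organised, not the slides' own example: tokens are Python tuples rather than packed bits, a match is emitted only when it is at least `min_match` symbols long (a hypothetical threshold), and the buffer sizes are arbitrary defaults.

```python
def lzss_encode(data, search_size=7, lookahead_size=6, min_match=2):
    """Emit (0, symbol) for a single character or (1, offset, length) for a match."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        max_length = min(lookahead_size, len(data) - pos)
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))  # flag 1: match fields
            pos += best_length
        else:
            tokens.append((0, data[pos]))                 # flag 0: literal symbol
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Replay literals and back-references in order."""
    out = []
    for token in tokens:
        if token[0] == 0:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)


tokens = lzss_encode("cabracadabrarrarrad")
print(tokens)
print(lzss_decode(tokens))
```

Compared with the LZ77 triples, no new-symbol field is sent after a match, and an unmatched symbol costs only a flag plus one symbol code instead of a full triple.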

  19. LZSS example