Dictionary Techniques

# Dictionary Techniques

## Dictionary Techniques

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
##### Presentation Transcript

1. Dictionary Techniques • Split the input into classes, frequently and infrequently occurring. • Keep a list , or DICTIONARY,of frequently occurring patterns and encode them with a reference to the dictionary. • Encode others less efficiently

2. Dictionary techniques • The size of the dictionary must be much smaller that the number of all possible patterns. • Useful with sources that generate a relatively small number of patterns, such as text sources and computer commands. • Effective for skewed alphabets

3. Dictionary techniques Depending upon how much knowledge is available, there are static and adaptive dictionary techniques.

4. Static Dictionary • The dictionary is permanent (or allowing addition, but not deletion) • Application-specific, or data specific Example-1: Digram Coding for text compression be, th, ie, ch, sh, ar, or, en,…..

5. Example of static dictionary Let the source alphabet A={a,b,….z,., ,,!,?, :, ;} of size 32 • For 4-character words, there are 324=220 patterns. • Thus, Fixed–length coding needs 20 b/word.

6. Example of static dictionary • Put 256 =28 most frequently occurring patterns into a dictionary • If a pattern is in the dictionary • (1-bit flag)+(8-bit index)=9 bits • else • (1-bit flag)+(20-bit code)= 21 bits

7. Example of static dictionary • L =9p+21(1-p)=21-12p bits/word , where p is the probability of a pattern in a dictionary • L < 20, if p>0.0833 • p is to be skewed to get high compression

8. Adaptive Dictionary-Based Techniques Jacob Ziv, Abraham Lempel LZ techniques And the contribution of TERRY WELSH LZW algorithm

9. IDEA • Adapt to the characteristics of the source. • The dictionary is a portion of the previously encoded sequence. • Start with an empty dictionary. • Add entries as they are found in the input stream.

10. LZ77 Search pointer W S Previously encoded sequence Next portion of a sequence Asearch pointer is moved back through the search buffer that contains a portion of the recently encoded to match a pattern, or a symbol in the look ahead buffer.

11. LZ77 • The encoder searches the search buffer for the longest match pattern and sends Code=(Offset, Max_match_length,New_symbol) Where, Offset is a distance from the pointer to the found pattern. New Symbol is a code of a next symbol after the match pattern. Max_match_length – is a number of symbols in the string found in the search buffer and identical with those in the beginning of lookahead buffer

12. Length • If the size of the source alphabet is A, then the number of bits needed to encode the triple using fixed-length codes is Log2 S + log2W+ log2A

13. Example: • Search buffer of size 7, look-ahead buffer of size 6 • No match is found in the search buffer, so • <0,0,c(d)>