Adaptive Dictionary:

LZ and LZW


Adaptive Dictionary

  • In 1977 and 1978, Jacob Ziv and Abraham Lempel published two papers describing compression schemes that are still widely used today

    • (1977) LZ77 or LZ1

    • (1978) LZ78 or LZ2

  • These techniques, and variations, are used in data compression

    • File Compression in UNIX (compress)

    • Image Compression (GIF – Graphics Interchange Format)

    • Compression over Modems – V.42bis


LZ77

  • This algorithm uses a portion of the previously encoded sequence as its dictionary

  • The encoder examines the input sequence through a sliding window

  • The sliding window consists of two parts

    • Search Buffer

    • Look-ahead Buffer


LZ77

  • Encoding Process

    • Move a pointer back into the search buffer until it finds a match with the first symbol in the look-ahead buffer (the symbol to be encoded)

      • Offset – distance from the pointer to the symbol to be encoded

    • The encoder then compares the symbols following the pointer with the symbols following the symbol to be encoded, to see how many consecutive symbols match

      • Length – number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the symbol to be encoded

    • The encoder stores the longest match found so far and continues back through the search buffer in order to possibly find a longer match


LZ77

  • Encoding Process (cont’d)

    • Once the search is complete, the encoder encodes the information to be sent with a triple

    • <o, l, c>

      • o – offset

      • l – length

      • c – codeword of the symbol following the match in the look-ahead buffer


LZ77

  • Encoding Process (cont’d)

    • Note: The third element (c) is placed in the triple to take care of the situation where no match is found in the search buffer (i.e. l = 0)

    • Sending a triple when we only need to encode ‘c’ may seem inefficient. However, this situation is not common because of the actual size of the search buffers (in practice they are much larger than the examples in this presentation)

    • The reason why this is done will become clear with an example


LZ77

  • Encoding Process (cont’d)

    • Let S represent the size of the search buffer

    • Let W represent the size of the entire window

    • Let A represent the size of the alphabet

    • Using fixed length codes, the triple is encoded using

      $\lceil \log_2 S \rceil + \lceil \log_2 W \rceil + \lceil \log_2 A \rceil$ bits

      • Note: $\lceil x \rceil$ is the ceiling function and $\lfloor x \rfloor$ is the floor function

        • $\lceil 3.5 \rceil = 4$ (ceiling function)

        • $\lfloor 3.5 \rfloor = 3$ (floor function)
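
As a rough check of the formula above, the bit count can be computed directly. This is only a sketch: the function name `triple_bits` is made up for illustration, and the sizes used (S = 7, W = 13, and an alphabet of 5 symbols) are assumed values, not figures prescribed by the slide.

```python
from math import ceil, log2

def triple_bits(search_size, window_size, alphabet_size):
    """Bits for one fixed-length <o, l, c> triple, following the formula above."""
    return (ceil(log2(search_size))      # offset field
            + ceil(log2(window_size))    # length field
            + ceil(log2(alphabet_size))) # symbol codeword

# Assumed, illustrative sizes: S = 7, W = 13, A = 5.
print(triple_bits(7, 13, 5))  # 3 + 4 + 3 = 10
```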


LZ77

  • Encoding Process (cont’d)

    • The second term, $\lceil \log_2 W \rceil$, may seem a bit strange. At first it may seem that it should be $\lceil \log_2 S \rceil$, since a match starts in the search buffer. However, the match may extend into the look-ahead buffer, so its length can exceed S. This will become clear in an example

    • There are 3 cases to consider in this algorithm

      • No match in the search buffer

      • There is a match within the search buffer

      • The match extends inside the look-ahead buffer

    • The following example outlines each of these cases


LZ77

  • Encoding Process (cont’d)

  • Example

    • Let W = 13 and S = 7, which implies that the look-ahead buffer (LAB) holds 6 symbols

    • Suppose the sequence to be encoded is

      … cabracadabrarrarrad …

    • The window currently holds (search buffer | look-ahead buffer)

      cabraca | dabrar

    • There is no match in the search buffer for ‘d’. Thus, we transmit the triple <0, 0, C(d)>

    • Shift the window by 1 symbol


LZ77

  • Encoding Process (cont’d)

  • Example (cont’d)

    • The window now holds

      abracad | abrarr

    • A match is found at o = 2, l = 1

    • Another match is found at o = 4, l = 1

    • Another match is found at o = 7, l = 4, which is the longest

    • Thus, we encode the triple as <7, 4, C(r)>

    • Shift the window by 5 symbols


LZ77

  • Encoding Process (cont’d)

  • Example (cont’d)

    • The window now holds

      adabrar | rarrad

    • A match is found at o = 1, l = 1

    • Another match is found at o = 3, l = 3 if we do not look further into the look-ahead buffer

    • However, if we let the match continue into the look-ahead buffer, we can extend its length to 5

    • This resolves the question regarding the second term, Log2 (W), in the number of bits needed to encode the triple
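
The search in this example can be sketched in a few lines of Python. This is a minimal, unoptimized sketch rather than a reference implementation: the function name `lz77_encode`, its parameters, and the choice to pass the already-encoded prefix through `start` are assumptions made for illustration, and symbols are emitted as plain characters instead of codewords C(x).

```python
def lz77_encode(data, start, search_size, lookahead_size):
    """Encode data[start:] as (offset, length, symbol) triples.

    data[:start] is assumed to be already encoded; it seeds the search buffer.
    """
    triples = []
    pos = start
    while pos < len(data):
        # The symbol after the match must still lie in the look-ahead buffer.
        max_len = min(lookahead_size, len(data) - pos) - 1
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The copy source may run past pos, i.e. into the look-ahead buffer.
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_len:  # keep the longest match seen so far
                best_offset, best_len = offset, length
        triples.append((best_offset, best_len, data[pos + best_len]))
        pos += best_len + 1
    return triples

# S = 7, look-ahead buffer = 6, with "cabraca" already encoded.
print(lz77_encode("cabracadabrarrarrad", start=7, search_size=7, lookahead_size=6))
# [(0, 0, 'd'), (7, 4, 'r'), (3, 5, 'd')]
```

The last triple shows a match that starts 3 symbols back but is 5 symbols long, so it runs past the end of the search buffer.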


LZ77

  • Encoding Process (cont’d)

  • Example (cont’d)

    • Thus, we encode the triple as <3, 5, C(d)>

    • If we were continuing to encode symbols, we would next shift the window by 6 symbols

  • Decoding Process

    • The decoding process is best understood by an example


LZ77

  • Decoding Process (cont’d)

  • Example

    • Assume we have already decoded the sequence cabraca and have received the triples

      • (1) <0, 0, C(d)>

      • (2) <7, 4, C(r)>

      • (3) <3, 5, C(d)>

    • Initially, at step (0), the decoded sequence is

      | cabraca


LZ77

  • Decoding Process (cont’d)

  • Example (cont’d)

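    • In each row, | marks the start of the 7-symbol search buffer, and spaces set off the copied symbols and the newly decoded symbol

      (0)                   | cabraca

      (1) <0, 0, C(d)>      c | abraca d

      (2) <7, 4, C(r)>      cabrac | ad abra r

      (3) <3, 5, C(d)>      cabracadabra | r rarra d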


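LZ77

  • Decoding Process (cont’d)

  • Example (cont’d)

    • (2) <7, 4, C(r)>: move the copy pointer back 7 symbols, copy 4 symbols (abra), then decode C(r) and append r

    • (3) <3, 5, C(d)>: move the copy pointer back 3 symbols and copy 5 symbols. After the first 3 symbols (rar) the pointer reaches symbols that were just written, so the copy continues with r and a, giving rarra. Then decode C(d) and append d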
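
The decoding steps can be sketched as follows. The function name `lz77_decode` and the use of plain characters in place of codewords are assumptions for illustration. The copy is done one symbol at a time, which is what lets a match of length 5 at offset 3 read symbols that the same copy has only just written.

```python
def lz77_decode(prefix, triples):
    """Rebuild the sequence from an already-decoded prefix and <o, l, c> triples."""
    out = list(prefix)
    for offset, length, symbol in triples:
        start = len(out) - offset
        for i in range(length):
            # Symbol-by-symbol copy, so an overlapping match (length > offset) works.
            out.append(out[start + i])
        out.append(symbol)
    return "".join(out)

print(lz77_decode("cabraca", [(0, 0, "d"), (7, 4, "r"), (3, 5, "d")]))
# cabracadabrarrarrad
```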


LZ77 - SUMMARY

  • In General

    • The algorithm is a simple adaptive scheme that requires no prior knowledge of the source and seems to require no assumptions

    • Ziv and Lempel showed that asymptotically the performance of this algorithm approaches the best that could be obtained by using a scheme that had full knowledge about the statistics of the source

    • This may be true asymptotically, however in practice there are ways to improve LZ77

    • There is a “hidden” assumption that patterns recur “close” together. We shall see that this assumption is removed in LZ78


LZ77 - SUMMARY

  • Variations

    • Efficient encoding of triples

      • With added complexity we could drop the assumption that the triples are fixed length

      • PKZip, Zip, LHarc, PNG, gzip, and ARJ all use an LZ77-based algorithm followed by a variable-length coder

    • Varying the size of the search and look-ahead buffers

      • Increasing the size of the search buffer requires more effective search strategies

      • Such strategies can be implemented more effectively if the contents of the search buffer are stored in a manner conducive to fast searches


LZ77 - SUMMARY

  • Variations

    • Eliminate the need to encode every step as a triple

      • This can be done using a flag bit

      • The flag bit removes the necessity of the triple. Each step is now encoded as either a single symbol codeword or a pair representing the match. For example

        • Flag = 1 → single symbol codeword

        • Flag = 0 → pair < o, l > giving the offset and length of the match

      • This is referred to as LZSS
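
A flag-bit encoder of this kind might look like the sketch below. It is an illustration under stated assumptions, not the LZSS specification: the name `lzss_encode`, the tuple layout of the tokens, and the `min_match` threshold (below which a literal is sent instead of a pair) are all choices made here for the example.

```python
def lzss_encode(data, start, search_size, lookahead_size, min_match=2):
    """Emit (1, symbol) for a literal or (0, offset, length) for a match.

    data[:start] is assumed to be already encoded. min_match is an assumed
    threshold: shorter matches are sent as single symbols.
    """
    tokens = []
    pos = start
    while pos < len(data):
        max_len = min(lookahead_size, len(data) - pos)
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_offset, best_len = offset, length
        if best_len >= min_match:
            tokens.append((0, best_offset, best_len))  # flag 0: match pair
            pos += best_len
        else:
            tokens.append((1, data[pos]))              # flag 1: single symbol
            pos += 1
    return tokens

print(lzss_encode("cabracadabrarrarrad", start=7, search_size=7, lookahead_size=6))
# [(1, 'd'), (0, 7, 4), (1, 'r'), (0, 3, 5), (1, 'd')]
```

Compared with the triple form, a symbol with no match now costs one flag bit plus its codeword, and a match no longer has to carry a trailing symbol.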


LZ78

  • Updates to LZ77

    • LZ78 drops the LZ77 assumption that patterns will recur close together

    • LZ77 makes use of the recent past sequence as the dictionary for encoding

      • However, this means that any pattern that recurs over a period longer than that covered by the coder window will not be captured


LZ78

  • Updates to LZ77

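    • In the worst case, the sequence is periodic with a period just longer than the search buffer. If the search buffer were one symbol longer, the sequence could be compressed well. As it stands, no symbol finds a match, so each symbol is encoded as a single symbol

    • LZSS – additional 1-bit overhead for every symbol

    • LZ77 – a full triple is encoded for every single symbol

    • Thus, the net effect is an expansion instead of a compression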
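
The effect can be seen with a small experiment. The sketch below only records the longest match length found at each step of a greedy LZ77-style search; the helper name `match_lengths`, the period-9 input, and the buffer sizes are all invented for illustration.

```python
def match_lengths(data, search_size, lookahead_size):
    """Return the longest match length found at each greedy encoding step."""
    lengths, pos = [], 0
    while pos < len(data):
        # Leave room for the symbol that follows the match in the triple.
        max_len = min(lookahead_size, len(data) - pos) - 1
        best = 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_len
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            best = max(best, length)
        lengths.append(best)
        pos += best + 1
    return lengths

periodic = "abcdefghi" * 4  # made-up input with period 9

# Search buffer one symbol too short: no match is ever found.
print(match_lengths(periodic, search_size=8, lookahead_size=8))
# Search buffer covering one full period: long matches after the first period.
print(match_lengths(periodic, search_size=9, lookahead_size=8))
```

With a search buffer of 8 every step has length 0, so 36 triples are sent for 36 symbols. With a search buffer of 9 the first period is still sent symbol by symbol, but the rest is covered by a handful of long matches.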


LZ78

  • Solution to this problem

    • LZ78 replaces the search buffer with an explicit dictionary

      • Note: care must be taken that the encoder and decoder build the dictionary identically

    • Now, the data is encoded as a double (or pair)

      • < i, c >

      • i – the index of the dictionary entry that is the longest match to the input (i = 0 when there is no match)

      • c – codeword for the character following the matched portion of the input
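
The pair encoding, and the way both sides grow the same dictionary, can be sketched as below. The names `lz78_encode` and `lz78_decode` are hypothetical, characters stand in for codewords, and the short input string is an assumed example rather than the one shown on the slides. The handling of a leftover phrase at the end of the input is also a choice made for this sketch.

```python
def lz78_encode(data):
    """Encode data as <i, c> pairs; index 0 means 'no match'."""
    dictionary = {}  # phrase -> index, indices start at 1
    pairs = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # keep extending the current match
        else:
            pairs.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a match: send its prefix plus its last symbol.
        pairs.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return pairs

def lz78_decode(pairs):
    """Rebuild the text, adding one dictionary entry per pair like the encoder."""
    entries = [""]  # entry 0 is the empty phrase
    out = []
    for index, symbol in pairs:
        entry = entries[index] + symbol
        entries.append(entry)
        out.append(entry)
    return "".join(out)

pairs = lz78_encode("wabba wabba ")
print(pairs)
# [(0, 'w'), (0, 'a'), (0, 'b'), (3, 'a'), (0, ' '), (1, 'a'), (3, 'b'), (2, ' ')]
print(lz78_decode(pairs) == "wabba wabba ")  # True
```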


LZ78

  • Example

    • Let us encode the following word

    • The character b with a slash represents a space


LZ78

  • Example (cont’d)

    • Problems

      • The dictionary grows indefinitely

      • To resolve this problem there are two options

        • Pruning

          • However, added complexity is required in order to keep track of the most frequently used dictionary elements

        • Freeze the dictionary and treat it as a static dictionary

          • This limits the performance of the algorithm


LZW

  • Variation of LZ78

    • Terry Welch proposed a method for removing the necessity of encoding the pair < i, c > and only encoding the index

    • The dictionary must be primed with the source alphabet

    • This variation is known as LZW


LZW

  • Encoding

  • Example
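
The encoding loop can be sketched as follows. Two things here are assumptions, labeled as such in the code: the input string and the initial dictionary order (space, a, b, o, w) are taken to be the classic textbook example, since the sequence itself is not present in this transcript. With those assumptions the sketch reproduces the encoder output quoted on the decoding slide. The function name `lzw_encode` is made up for illustration.

```python
def lzw_encode(data, alphabet):
    """Encode data as a list of dictionary indices (indices start at 1).

    The dictionary is primed with the source alphabet, so every symbol of
    data must appear in alphabet.
    """
    dictionary = {symbol: i for i, symbol in enumerate(alphabet, start=1)}
    output = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol  # longest match so far
        else:
            output.append(dictionary[phrase])
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = symbol   # the unmatched symbol starts the next phrase
    if phrase:
        output.append(dictionary[phrase])
    return output

# ASSUMED input and alphabet order; not given in the transcript.
text = "wabba wabba wabba wabba woo woo woo"
print(lzw_encode(text, [" ", "a", "b", "o", "w"]))
# [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
```

Unlike the LZ78 pair, only the index is sent: the symbol that broke the match is not transmitted but becomes the start of the next phrase.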


LZW

  • Decoding

  • Example

    • Encoder output sequence (prev. example) was 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4


LZW

  • LZW Problem

    • The algorithm breaks down in one particular case

    • Let A = {a, b}

    • Let the sequence to be encoded be

      ababababababab …


LZW

  • LZW Problem

    • The encoded sequence is 1 2 3 5 …

    • Decoding 1, 2 and 3 causes no trouble: the decoder outputs a, b and ab, and adds the entries 3 (ab) and 4 (ba) to its dictionary

    • But wait! The decoder does not yet have entry 5 in its dictionary

    • We do have the beginning of the 5th entry, ab…, and the entry is completed by the first symbol of the next decoded pattern, which is entry 5 itself

    • Since entry 5 begins with ‘a’, it must be aba. We can now decode the last letter ‘a’ and continue on without further trouble

    • Thus, the decoder must have an exception handler for this type of case
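
The exception can be handled with one extra branch in the decoder. The sketch below is illustrative only: the name `lzw_decode` and the dictionary layout (indices starting at 1, primed with the alphabet) are assumptions chosen to match the example with A = {a, b}.

```python
def lzw_decode(codes, alphabet):
    """Decode a list of LZW indices, rebuilding the dictionary on the fly."""
    if not codes:
        return ""
    entries = {i: symbol for i, symbol in enumerate(alphabet, start=1)}
    previous = entries[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in entries:
            current = entries[code]
        elif code == len(entries) + 1:
            # Exception case: the code refers to the entry still being built.
            # That entry is the previous pattern plus its own first symbol.
            current = previous + previous[0]
        else:
            raise ValueError(f"invalid code {code}")
        # Complete the pending entry with the first symbol of the new pattern.
        entries[len(entries) + 1] = previous + current[0]
        out.append(current)
        previous = current
    return "".join(out)

print(lzw_decode([1, 2, 3, 5], ["a", "b"]))  # abababa
```

When code 5 arrives, the dictionary holds only entries 1 to 4, so the branch builds aba from the previous pattern ab and its first symbol a.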


References

  • K. Sayood, Introduction to Data Compression, 2nd Ed., Morgan Kaufmann Publishers, 2000

