
Lecture 4: Data Compression Techniques

TSBK01 Image Coding and Data Compression. Lecture 4: Data Compression Techniques. Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency (FOI). Outline. Huffman coding Arithmetic coding Application: JBIG Universal coding LZ-coding LZ77, LZ78, LZW



  1. TSBK01 Image Coding and Data Compression. Lecture 4: Data Compression Techniques. Jörgen Ahlberg, Div. of Sensor Technology, Swedish Defence Research Agency (FOI)

  2. Outline • Huffman coding • Arithmetic coding • Application: JBIG • Universal coding • LZ-coding • LZ77, LZ78, LZW • Applications: GIF and PNG

  3. Repetition • Coding: Assigning binary codewords to (blocks of) source symbols. • Variable-length codes (VLC) and fixed-length codes. • Instantaneous codes ⊂ Uniquely decodable codes ⊂ Non-singular codes ⊂ All codes. • Tree codes are instantaneous. • A tree code exists ⇔ Kraft's Inequality holds.

  4. Creating a Code: The Data Compression Problem • Assume a source with an alphabet A and known symbol probabilities {pi}. • Goal: Choose the codeword lengths so as to minimize the bitrate, i.e., the average number of bits per symbol ∑i li·pi. • Trivial solution: li = 0 ∀i. • Restriction: We want an instantaneous code, so Kraft's Inequality ∑i 2^(–li) ≤ 1 must hold. • Solution (at least in theory): li = –log2 pi.
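As a quick numeric check of the theory above, a few lines of Python (illustrative helper names, not part of the lecture material) compute the ideal lengths, the resulting bitrate, and the Kraft sum for the four-symbol distribution used on later slides:

```python
import math

def ideal_lengths(probs):
    """Ideal (possibly non-integer) codeword lengths l_i = -log2 p_i."""
    return [-math.log2(p) for p in probs]

def average_rate(probs, lengths):
    """Average bits per symbol: sum over i of l_i * p_i."""
    return sum(l * p for l, p in zip(lengths, probs))

def kraft_sum(lengths):
    """Kraft's Inequality requires sum over i of 2^(-l_i) <= 1."""
    return sum(2.0 ** -l for l in lengths)

probs = [0.5, 0.25, 0.125, 0.125]
lengths = ideal_lengths(probs)          # dyadic probabilities give integer lengths
rate = average_rate(probs, lengths)     # equals the entropy for this source
```

Because the probabilities here are dyadic, the ideal lengths are integers and the average rate equals the source entropy exactly, 1.75 bits/symbol.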

  5. In practice… • Use some nice algorithm to find the code tree: • Huffman coding • Tunstall coding

  6. Huffman Coding • Two-step algorithm: • Iterate: • Merge the two least probable symbols. • Sort. • Assign bits. • Get the code. • Example: A = {a, b, c, d}, P = {0.5, 0.25, 0.125, 0.125} gives the codewords a → 0, b → 10, c → 110, d → 111.
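The merge/sort/assign loop can be sketched in Python roughly as follows (`huffman_code` is a hypothetical helper, not the lecture's implementation; ties between equal probabilities are broken arbitrarily, so only the codeword lengths are canonical):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two least
    probable subtrees; a 0 is prepended on one branch, a 1 on the other."""
    tiebreak = count()  # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable subtree
        p2, _, c2 = heapq.heappop(heap)   # second least probable
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

For the slide's distribution the resulting lengths are 1, 2, 3, 3 bits, matching the ideal lengths –log2 pi.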

  7. Coding of the BMS (Binary Memoryless Source) • Trick: Code blocks of symbols (extended source). • Example: p1 = ¼, p2 = ¾. • Applying the Huffman algorithm directly: 1 bit/symbol, although the entropy is only ≈ 0.81 bits/symbol. Coding blocks of two symbols brings the rate down to 27/32 ≈ 0.84 bits/symbol.
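A sketch of why blocking helps, in Python (`huffman_rate` is an illustrative helper; it uses the standard observation that each Huffman merge adds the merged weight to the expected codeword length, so the rate can be computed without building the code):

```python
import heapq
import math
from itertools import count, product

def huffman_rate(probs):
    """Average codeword length of a Huffman code for the distribution."""
    tb = count()
    heap = [(p, next(tb)) for p in probs]
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        total += p1 + p2          # every merge costs one bit per occurrence
        heapq.heappush(heap, (p1 + p2, next(tb)))
    return total

p = [0.25, 0.75]
entropy = -sum(q * math.log2(q) for q in p)        # lower bound on the rate
direct = huffman_rate(p)                           # 1 bit/symbol
pairs = [a * b for a, b in product(p, repeat=2)]   # extended source, blocks of 2
blocked = huffman_rate(pairs) / 2                  # bits per original symbol
```

Direct coding costs 1 bit/symbol, pairs cost 27/32 ≈ 0.84, and longer blocks approach the entropy ≈ 0.81, at the price of an exponentially growing code tree.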

  8. Huffman Coding: Pros and Cons + Fast implementations. + Error resilient: resynchronizes in ~ l² steps. – The code tree grows exponentially when the source is extended. – The symbol probabilities are built into the code. ⇒ Hard to use Huffman coding for extended sources / large alphabets, or when the symbol probabilities vary over time.

  9. Arithmetic Coding • Based on Shannon-Fano-Elias coding. • Basic idea: Split the interval [0,1] according to the symbol probabilities. • Example: A = {a, b, c, d}, P = {½, ¼, ⅛, ⅛}.

  10. Arithmetic Coding Example • [Figure: encoder and decoder interval subdivisions.] • Code the sequence c c a: each symbol narrows the current interval according to the symbol probabilities, here ending in the interval [0.9, 0.96]. • The decoder recovers the symbols by locating the received bits in the same subdivision of [0,1].
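The interval-splitting idea can be sketched as follows (using the A = {a, b, c, d} distribution from slide 9; `encode_interval` is an illustrative helper, and a practical coder renormalizes the interval and emits bits incrementally instead of keeping exact floats):

```python
def encode_interval(sequence, probs):
    """Narrow [0, 1) by the cumulative probability of each symbol in turn."""
    # Cumulative distribution: symbol -> (low, high) sub-interval of [0, 1).
    cdf, c = {}, 0.0
    for sym, p in probs.items():
        cdf[sym] = (c, c + p)
        c += p
    low, high = 0.0, 1.0
    for sym in sequence:
        lo, hi = cdf[sym]
        span = high - low
        low, high = low + span * lo, low + span * hi
    return low, high

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
low, high = encode_interval("cca", probs)
```

Any number inside the final interval identifies the sequence, so the coder only needs enough bits to single out that interval, about –log2(high – low) bits.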

  11. An Image Coding Application • Consider the image content in a local environment of a pixel as a state in a Markov model. • Example (binary image): the already-coded neighbours of the current pixel X, e.g. the pattern 0 1 0 / 0 1 X, form the state. • Such an environment is called a context. • A probability distribution for X can be estimated for each state. Then arithmetic coding is used. • This is the basic idea behind the JBIG algorithm for binary images and data.

  12. Flushing the Coder • The coding process is ended (restarted) and the coder flushed • after a given number of input symbols (fixed input, variable output: FIVO), or • when the interval is too small for a fixed number of output bits (variable input, fixed output: VIFO).

  13. Universal Coding • A universal coder doesn't need to know the statistics in advance. Instead, it estimates them from the data. • Forward estimation: Estimate the statistics in a first pass and transmit them to the decoder. • Backward estimation: Estimate from already transmitted (received) symbols.
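Backward estimation can be sketched as a count-based model that encoder and decoder update in lockstep (the add-one smoothing below is my assumption, one common way to keep unseen symbols at nonzero probability):

```python
def adaptive_estimates(sequence, alphabet):
    """Backward estimation: before coding each symbol, estimate the
    distribution from counts of already-seen symbols. The decoder can
    maintain identical counts from the symbols it has already decoded,
    so no statistics need to be transmitted."""
    counts = {s: 1 for s in alphabet}   # add-one (Laplace) prior
    estimates = []
    for sym in sequence:
        total = sum(counts.values())
        estimates.append({s: counts[s] / total for s in alphabet})
        counts[sym] += 1                # update AFTER coding, as the decoder will
    return estimates

est = adaptive_estimates("aab", "ab")
```

Each estimate in `est` is what an adaptive arithmetic coder would use for the corresponding symbol; the model sharpens toward the true statistics as more symbols are seen.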

  14. Universal Coding: Examples • An adaptive arithmetic coder: a statistics estimator driving an arithmetic coder. • An adaptive dictionary technique: the LZ coders [Sayood 5]. • An adaptive Huffman coder [Sayood 3.4].

  15. Ziv-Lempel Coding (ZL or LZ) • Named after J. Ziv and A. Lempel (1977). • Adaptive dictionary technique. • Store previously coded symbols in a buffer. • Search for the current sequence of symbols to code. • If found, transmit buffer offset and length.

  16. LZ77 • Search buffer (already coded symbols) and look-ahead buffer. • Output triplet <offset, length, next>. • Example: a b c a b d d a c a b c e e e f is transmitted to the decoder with triplets such as <8, 3, d>, <0, 0, e>, <1, 2, f>. • If the size of the search buffer is N and the size of the alphabet is M, we need roughly log2 N bits for the offset, log2 N bits for the length, and log2 M bits for the next symbol to code a triplet. • Used in: PKZip, Zip, Lharc, PNG, gzip, ARJ. • Variation: Use a VLC to code the triplets!
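A minimal, deliberately brute-force LZ77 encoder sketch in Python (`lz77_encode` is an illustrative name; the last symbol of the input is always sent literally so every triplet has a "next" field):

```python
def lz77_encode(data, search_size=8, lookahead_size=8):
    """Emit <offset, length, next> triplets. The offset counts backwards
    from the current position into the search buffer; 0 means no match."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - search_size)
        for off in range(1, i - start + 1):
            length = 0
            # A match may overlap into the look-ahead buffer itself.
            while (length < lookahead_size - 1
                   and i + length < len(data) - 1
                   and data[i + length - off] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        # Always transmit one literal symbol after the (possibly empty) match.
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

triplets = lz77_encode("abcabdd")
```

Real implementations replace the linear search with hash chains or suffix structures; the output format is the same.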

  17. Drawback with LZ77 • Repetitive patterns with a period longer than the search buffer size are not found. • If the search buffer size is 4, the sequence a b c d e a b c d e a b c d e a b c d e … will be expanded, not compressed.

  18. LZ78 • Store patterns in a dictionary • Transmit a tuple <dictionary index, next>

  19. LZ78 Example • Input: a b c a b a b c. • Transmitted to decoder (tuples <dictionary index, next>): <0, a> <0, b> <0, c> <1, b> <4, c>. • Decoder dictionary: 1 = a, 2 = b, 3 = c, 4 = ab, 5 = abc. • Decoded: a b c ab abc. • A strategy is needed for limiting the dictionary size!
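A sketch of the LZ78 encoder in Python (`lz78_encode` is a hypothetical helper; index 0 denotes the empty phrase, matching the slide's numbering):

```python
def lz78_encode(data):
    """Emit <dictionary index, next symbol> tuples. Each emitted tuple
    also creates a new dictionary entry: dictionary[index] + next."""
    dictionary = {"": 0}
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

tuples = lz78_encode("abcababc")
```

On the slide's input this reproduces the transmitted tuples <0,a> <0,b> <0,c> <1,b> <4,c>; the decoder rebuilds the identical dictionary from the tuples alone.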

  20. LZW • Modification to LZ78 by Terry Welch, 1984. • Applications: GIF, v42bis • Patented by UniSys Corp. • Transmit only the dictionary index. • The alphabet is stored in the dictionary in advance.

  21. LZW Example • Input sequence: a b c a b a b c a. • Encoder dictionary: preloaded with 1 = a, 2 = b, 3 = c, 4 = d; new entries 5 = ab, 6 = bc, 7 = ca, 8 = aba, 9 = abc. • Output (dictionary indices) transmitted: 1 2 3 5 5 7. • Decoded: a b c ab ab ca. • The decoder rebuilds the same dictionary from the received indices.
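A matching LZW encoder sketch (`lzw_encode` is an illustrative helper; the dictionary is preloaded with the alphabet, indices are 1-based as on the slide, and only indices are transmitted):

```python
def lzw_encode(data, alphabet):
    """Transmit dictionary indices only. After each output, the sent
    phrase plus the next symbol becomes a new dictionary entry."""
    dictionary = {s: i + 1 for i, s in enumerate(alphabet)}  # 1-based
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                          # keep extending the match
        else:
            out.append(dictionary[phrase])
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ch                           # 'next' starts the new phrase
    if phrase:
        out.append(dictionary[phrase])            # flush the final phrase
    return out

indices = lzw_encode("abcababca", "abcd")
```

On the slide's input this yields the indices 1 2 3 5 5 7 and the dictionary entries 5 = ab, 6 = bc, 7 = ca, 8 = aba, 9 = abc; the decoder mirrors the same entry-creation rule (with one subtle special case when an index refers to the entry being created).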

  22. And now for some applications:GIF & PNG

  23. GIF • CompuServe Graphics Interchange Format (1987, revised 1989). • Features: • Designed for up/downloading images to/from BBSes via PSTN. • 1-, 4-, or 8-bit colour palettes. • Interlacing for progressive decoding (four passes, starting with every 8th row). • Transparent colour for non-rectangular images. • Supports multiple images in one file ("animated GIFs").

  24. GIF: Method • Compression by LZW. • Dictionary size 2^(b+1) 8-bit symbols, where b is the number of bits in the palette. • Dictionary size is doubled when filled (max 4096 entries). • Works well on computer-generated images.

  25. GIF: Problems • Unsuitable for natural images (photos): • Maximum 256 colors ⇒ bad quality. • Repetitive patterns uncommon ⇒ bad compression. • LZW patented by UniSys Corp. • Alternative: PNG.

  26. PNG: Portable Network Graphics • Designed to replace GIF. • Some features: • Indexed or true-colour images (≤ 16 bits per plane). • Alpha channel. • Gamma information. • Error detection. • No support for multiple images in one file. • Use MNG for that. • Method: • Compression by LZ77 using a 32 KB search buffer. • The LZ77 triplets are Huffman coded. • More information: www.w3.org/TR/REC-png.html

  27. Summary • Huffman coding • Simple, easy, fast. • Complexity grows exponentially with the block length. • Statistics built into the code. • Arithmetic coding • Complexity grows linearly with the block length. • Easily adapted to varying statistics ⇒ used for coding of Markov sources. • Universal coding • Adaptive Huffman or arithmetic coder. • LZ77: Buffer with previously sent sequences, <offset, length, next>. • LZ78: Dictionary instead of buffer, <index, next>. • LZW: Modification to LZ78, <index>.

  28. Summary, cont • Where are the algorithms used? • Huffman coding: JPEG, MPEG, PNG, … • Arithmetic coding: JPEG, JBIG, MPEG-4, … • LZ77: PNG, PKZip, Zip, gzip, … • LZW: compress, GIF, v42bis, …

  29. Finally • These methods work best if the source alphabet is small and the distribution skewed, e.g. • Text • Graphics • Analog sources (images, sound) require other methods, because they have • complex dependencies, and • an accepted level of distortion.
