
Computer Science 335


nizana

Presentation Transcript


    1. Computer Science 335 Data Compression

    2. Compression .. Magic or Science? Compression only works when a MORE EFFICIENT means of encoding can be found. In many cases, special assumptions must be made about the data in order to gain compression benefits. “Compression” can lead to larger files if the data does not conform to those assumptions.

    3. Why compress? For files on a disk: save disk space. For internet access: reduce wait time. In a general queueing system: the payback can be more than linear if the system is nearing or in saturation.

    4. A typical queueing graph
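A minimal Python sketch of why the payback can be more than linear near saturation. It assumes the system behaves like an M/M/1 queue, where the mean time in the system is W = 1 / (mu - lambda); that model, the function name `mm1_response_time`, and the rates used are illustrative assumptions, not taken from the slides.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        # At or past saturation the queue grows without bound.
        raise ValueError("arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical link: 9 requests/s arrive and 10 requests/s can be served.
before = mm1_response_time(9, 10)
# If compression halves every request, the same link serves 20 requests/s.
after = mm1_response_time(9, 20)
print(f"before={before:.3f}s after={after:.3f}s speedup={before / after:.0f}x")
```

Under these assumed rates, making service only 2x faster shrinks the mean time in the system from 1 s to 1/11 s, an 11x improvement, because the system started close to saturation.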

    5. Example: ASCII characters require 7 bits, but data may not use ALL characters in the ASCII set. Consider just the digits 0..9: with only 10 values, really only 4 bits are required. There is actually a widely used code for this, which also allows for +/-: BCD (binary-coded decimal).
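A minimal Python sketch of the idea, assuming packed BCD with two 4-bit digits per byte. The function names are hypothetical, and the sign handling the slide mentions is left out of this sketch.

```python
DIGITS = "0123456789"


def to_packed_bcd(digits: str) -> bytes:
    """Pack decimal digits into BCD: each digit takes 4 bits, two per byte."""
    if not digits or any(c not in DIGITS for c in digits):
        raise ValueError("expected a non-empty string of digits 0-9")
    if len(digits) % 2:
        digits = "0" + digits  # pad with a leading zero to fill the first byte
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data: bytes) -> str:
    """Unpack BCD bytes back into a digit string (keeps any padding zero)."""
    out = []
    for b in data:
        hi, lo = b >> 4, b & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError("nibble is not a valid BCD digit")
        out.append(f"{hi}{lo}")
    return "".join(out)


print(to_packed_bcd("1234").hex())  # 1234
```

Ten ASCII digits take 10 bytes; packed this way they fit in 5.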

    6. Other Approaches

    7. Run-length encoding: preface each run with an 8-bit length byte. aaabbabbbccdddaaaa -> 18 bytes; 3a2b1a3b2c3d4a -> 14 bytes. Runs of 3 or more benefit (aaa versus 3a); runs of 2 give no gain or loss (aa versus 2a); single characters lose (a versus 1a).
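A minimal Python sketch of the scheme above, assuming each run is stored as a (length byte, value byte) pair and runs longer than 255 are split, since one byte cannot hold a larger count. The function names are illustrative.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run as a (length, value) byte pair; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand each (length, value) pair back into a run."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)


data = b"aaabbabbbccdddaaaa"
encoded = rle_encode(data)
print(len(data), len(encoded))  # 18 14
```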

    8. Facsimile compression (an example of an application of run-length encoding): the page is decomposed into black/white pixels, which contain lots of long runs of black and white pixels. Rather than encoding each pixel, encode the runs of pixels.
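A minimal Python sketch of turning one scan line into runs, assuming 0 = white and 1 = black. Because the colors simply alternate, only the lengths need to be stored if every line is assumed to start with a white run (possibly of length 0); Group 3 fax coding (ITU-T T.4) uses the same convention. The function name is hypothetical.

```python
def row_to_runs(row):
    """Convert a row of pixels (0 = white, 1 = black) into alternating run lengths.

    The first entry is always a white run, which is 0 if the row starts black.
    """
    runs = []
    colour = 0  # assume every row starts in white
    length = 0
    for px in row:
        if px == colour:
            length += 1
        else:
            runs.append(length)  # close the current run
            colour = px
            length = 1
    runs.append(length)
    return runs


print(row_to_runs([0, 0, 0, 1, 1, 0]))  # [3, 2, 1]
```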

    9. Differential encoding: for values between 1000 and 1050, storing 1050 directly requires 11 bits, but a difference plus +/- requires only 7 bits: 6 bits for the magnitude (up to 64 values) and 1 additional bit for the direction (+/-). Differential encoding can lead to problems because each value is relative to the previous value. It is like following directions: one wrong turn and every step after it is wrong.
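A minimal Python sketch of differential encoding, assuming the first value is stored in full and every later value as its difference from the previous one. The sample values are hypothetical; the last lines show how a single corrupted difference shifts every value decoded after it.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value's difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the values with a running sum of the differences."""
    return list(accumulate(deltas))


readings = [1000, 1012, 1050, 1031]      # hypothetical values in 1000..1050
deltas = delta_encode(readings)          # [1000, 12, 38, -19]: each |diff| <= 50 fits in 6 bits + sign
corrupted = deltas[:]
corrupted[1] += 1                        # one wrong "turn"...
print(delta_decode(corrupted))           # ...and every later value is off by one
```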

    10. Frequency-based encoding. Huffman encoding: codes are not the same length for all values; frequently occurring symbols get short codes and infrequently occurring symbols get longer ones. Arithmetic coding (you are not responsible for this): interpret a string as a real number. There are infinitely many values between 0 and 1, so divide that region up based on symbol frequency: if A occurs 12% of the time and B 5%, then A is 0 to 0.12 and B is 0.12 to 0.17. The method is limited by the fact that a computer has limited precision.
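A minimal Python sketch of the interval-narrowing step of arithmetic coding, using the slide's A (12%) and B (5%) and assuming every other symbol is lumped into one hypothetical "other" range. It uses exact `Fraction` arithmetic to sidestep the precision limit; practical coders instead use fixed-width integers with renormalization.

```python
from fractions import Fraction

# Hypothetical model: A and B from the slide, everything else lumped together.
RANGES = {
    "A": (Fraction(0), Fraction(12, 100)),
    "B": (Fraction(12, 100), Fraction(17, 100)),
    "other": (Fraction(17, 100), Fraction(1)),
}


def narrow(message):
    """Return the [low, high) sub-interval of [0, 1) that identifies `message`."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = RANGES[sym]
        # Zoom into this symbol's share of the current interval.
        low, high = low + width * s_low, low + width * s_high
    return low, high


low, high = narrow("AB")
print(float(low), float(high))  # any number in [0.0144, 0.0204) encodes "AB"
```

The interval width equals the product of the symbol probabilities (0.12 x 0.05 = 0.006), which is why long messages need more and more precision.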

    11. Huffman (more details)

    12. Huffman encoding: you must know the distribution of the symbols. Codes typically have DIFFERENT lengths, unlike most schemes you have seen (ASCII, etc.). The most frequently occurring characters have the shortest codes; the least frequent have the longest. The solution is minimal but not unique.

    13. Assume following data

    14. Let's peek at the answer

    15. Build the solution tree: choose the two smallest at a time and group them
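A minimal Python sketch of "choose the two smallest and group", assuming symbol frequencies are given as a dict. The frequencies below are hypothetical (the slides' data table is not reproduced here). A different tie-breaking order can yield different codes with the same total cost, which is why the solution is minimal but not unique.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code table from a {symbol: frequency} mapping."""
    tiebreak = count()  # unique counter so heapq never has to compare dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a lone symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Take the two least frequent subtrees...
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # ...and group them under a new node: left branch adds 0, right adds 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


print(huffman_codes({"a": 8, "b": 4, "c": 2, "d": 2}))
```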

    16. And the binary encoding..

    17. Compute expected length
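The expected length is the sum, over all symbols, of each symbol's probability times the length of its code. A minimal Python sketch, assuming a hypothetical four-symbol distribution and code table (not the slides' data):

```python
def expected_length(probs, codes):
    """Average code length in bits per symbol: sum of p(symbol) * len(code)."""
    return sum(p * len(codes[sym]) for sym, p in probs.items())


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # hypothetical distribution
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}   # hypothetical Huffman codes
print(expected_length(probs, codes))  # 1.75
```

A fixed-length code for four symbols would need 2 bits each, so this variable-length code saves 0.25 bits per symbol on average.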

    18. Is it hard to interpret a message?
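Decoding is straightforward because no Huffman code is a prefix of another: read bits until they match a code, emit that symbol, and start over. A minimal Python sketch, assuming a hypothetical code table:

```python
def huffman_decode(bits, codes):
    """Decode a string of '0'/'1' characters using a prefix-free code table."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix property: the first match is the only possible one
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)


CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical table
print(huffman_decode("010110111", CODES))  # abcd
```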

    19. Observations on Huffman: the method creates a shorter code, but it assumes knowledge of the symbol distribution. Different symbols have different code lengths. Knowing the distribution ahead of time is not always possible! Another version of Huffman coding (adaptive Huffman coding) can solve that problem.

    20. Revisiting facsimiles: Huffman says one can minimize the average code length by assigning different-length codes to symbols. Fax transmissions can use this principle to give short codes to long runs of white/black pixels, combining run-length encoding with Huffman coding. See Table 5.7 in the text.

    21. Table 5.7

    22. Multimedia compression

    23. Image compression: an image can be represented as RGB, typically 8 bits for each color, or as luminance (brightness, 8 bits) and chrominance (color, 16 bits). Human perception of color reacts significantly to brightness (light) in addition to color. These are really two ways to represent the same thing.
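One common conversion from RGB to a luminance/chrominance form is the YCbCr transform used in JFIF (JPEG) files. The Python sketch below assumes those full-range 8-bit coefficients: Y carries brightness in 8 bits and Cb/Cr carry color in 8 bits each, matching the 8 + 16 bits on the slide. The function name is illustrative.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Convert 8-bit RGB to 8-bit Y (luminance), Cb and Cr (chrominance)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v: float) -> int:
        # Round to the nearest integer and keep it in the 0..255 byte range.
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
```

Any shade of gray maps to Cb = Cr = 128, so all of its information sits in the luminance channel.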

    24. JPEG

    25. JPEG algorithm

    26. MPEG uses differential encoding to compare successive frames of a motion picture. There are three kinds of frames: I -> a complete JPEG image; P -> an incremental change to I (where blocks move), ½ the size of I; B -> uses a different interpolation technique, ¼ the size of I. Typical sequence -> I B B P B B I …

    27. MP3: music/audio compression that uses psychoacoustic principles. Some sounds can’t be heard because they are drowned out by other, louder sounds (frequencies). Divide the sound into smaller subbands and eliminate the sounds you can’t hear anyway because others are too loud. There are three layers with varying compression: Layer 1: 4:1 (192 kbps); Layer 2: 8:1 (128 kbps); Layer 3: 12:1 (64 kbps).
