M5 research group, University of Central Florida PowerPoint Presentation
Presentation Transcript

  1. StarNT: Dictionary-based Fast Transform
     Weifeng Sun (wsun@cs.ucf.edu)
     School of Electrical Engineering and Computer Science, University of Central Florida
     M5 research group, University of Central Florida, 25 April 2003

  2. Table of Contents
     • Preprocessing/Postprocessing Model
     • Star Transform
     • StarNT Transform
     • StarZip
     • Domain-Specific Text Compression Tool
     • Review

  3. Current Text Compression Model
     • First-order Entropy Coder
       • Huffman (word, canonical)
       • Arithmetic: arbitrary precision
     • Statistical Models
       • PPM (BWT): prediction by context
       • DMC
     • Dictionary Models
       • LZ family: good compression, fast
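The slide names word-based and canonical Huffman coding without showing how either works. Here is a minimal Python sketch of both, assuming words are the symbols. It first derives Huffman code lengths from word frequencies, then assigns canonical codewords: sort by (length, symbol) and count upward, left-shifting whenever the length grows. Under this scheme, a decoder needs only the code lengths, not the whole tree. The function names are illustrative, not from the talk.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over `freqs`."""
    if len(freqs) == 1:
        # A lone symbol still needs a 1-bit code.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, tie_breaker, {symbol: depth in this subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def canonical_codes(lengths):
    """Assign canonical codewords from code lengths alone."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, n in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= n - prev_len          # append zeros when the length grows
        codes[sym] = format(code, f"0{n}b")
        code += 1                      # next codeword of the same length
        prev_len = n
    return codes

if __name__ == "__main__":
    words = "the cat and the dog and the bird".split()
    print(canonical_codes(huffman_code_lengths(Counter(words))))
```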

  4. Preprocessing/Postprocessing Model
     Text File → Preprocessor → Compression Algorithm → Compressed File → Decompression Algorithm → Postprocessor → Text File
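The model above wraps an unchanged backend compressor between a preprocessor and a postprocessor. Below is a minimal Python sketch of that wiring, using the standard `bz2` module as the backend (its default `compresslevel` is 9, comparable to `bzip2 -9`). The helper names and the string-reversal transform are hypothetical placeholders, picked only because reversal is trivially invertible. A real preprocessor would be a dictionary transform such as Star, LIPT, or StarNT.

```python
import bz2
from typing import Callable

Transform = Callable[[str], str]

def compress_with_preprocessor(text: str, preprocess: Transform,
                               compress: Callable[[bytes], bytes] = bz2.compress) -> bytes:
    """Text File -> Preprocessor -> Compression Algorithm -> Compressed File."""
    return compress(preprocess(text).encode("utf-8"))

def decompress_with_postprocessor(blob: bytes, postprocess: Transform,
                                  decompress: Callable[[bytes], bytes] = bz2.decompress) -> str:
    """Compressed File -> Decompression Algorithm -> Postprocessor -> Text File."""
    return postprocess(decompress(blob).decode("utf-8"))

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog " * 20
    # Placeholder transform pair (hypothetical): postprocess must invert preprocess.
    blob = compress_with_preprocessor(text, preprocess=lambda s: s[::-1])
    restored = decompress_with_postprocessor(blob, postprocess=lambda s: s[::-1])
    assert restored == text
```

The only contract is that the postprocessor must exactly invert the preprocessor. The backend never needs to know a transform was applied.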

  5. Goal of the Preprocessor
     • Accelerate the backend compression algorithm
       • The shorter the input, the faster
     • Backend-compressor oriented
       • More “delicious” input
       • Preserve some original context
       • Provide some “artificial” context
     • Universal
       • Text transform

  6. StarNT: Transform Paradigm
     Text File → Transform Encoding → Transformed File → Compression Algorithm → Compressed File → Decompression Algorithm → Transformed File → Transform Decoding → Text File
     Both Transform Encoding and Transform Decoding use the same Dictionary.

  7. Table of Contents (repeated from slide 2)

  8. Example: Star Encoding
     Transform dictionary:
       a → *, is → **, to → *a, the → ***, long → ****,
       this → ***a, test → ***b, method → ******, example → *******, demonstrate → ***********
     Input text: This is a long example to demonstrate the “substitution” method.
     Transformed text (^ marks a capitalized first letter): ***a^ ** * **** ******* *a *********** *** “substitution” ******.
     Compressed output: 100111001100000101011010011100
     Lots of compression gain!
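The example above can be reproduced with a small Python sketch, under stated assumptions. The dictionary list is ordered by descending frequency. The first word of each length becomes all `*`, and later words of that length swap the last `*` for a letter (a-z, then A-Z). A `^` after a codeword flags a capitalized first letter. Words not in the dictionary pass through unchanged, as "substitution" does on the slide. Both the letter-assignment rule and the function names are a reading of this one example, not the published Star encoding. The sketch also does not escape literal `*` characters in the input.

```python
import re
import string

# Letters that distinguish words of the same length (assumed order: a-z, then A-Z).
LETTERS = string.ascii_lowercase + string.ascii_uppercase
# Split text into alternating runs of letters and non-letters.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")

def build_star_dictionary(words):
    """Assign each word a codeword of the same length.

    `words` is assumed to be ordered by descending frequency. The first word
    of each length becomes all '*'; the k-th later word of that length
    replaces the final '*' with the k-th letter (at most 52 per length here).
    """
    codes = {}
    count_by_length = {}
    for word in words:
        n = len(word)
        k = count_by_length.get(n, 0)
        codes[word] = "*" * n if k == 0 else "*" * (n - 1) + LETTERS[k - 1]
        count_by_length[n] = k + 1
    return codes

def star_encode(text, codes):
    """Replace dictionary words by codewords; '^' marks a capitalized first letter."""
    out = []
    for tok in TOKEN.findall(text):
        low = tok.lower()
        if low in codes and tok == low:
            out.append(codes[low])
        elif low in codes and tok == low.capitalize():
            out.append(codes[low] + "^")
        else:
            out.append(tok)  # unknown words, other casings, spaces, punctuation
    return "".join(out)

if __name__ == "__main__":
    dictionary = ["a", "is", "to", "the", "long", "this", "test",
                  "method", "example", "demonstrate"]
    codes = build_star_dictionary(dictionary)
    print(star_encode("This is a long example to demonstrate the “substitution” method.", codes))
```

With the dictionary order shown, this prints the slide's transformed line. Because every codeword keeps its word's length, the backend sees long runs of `*`, which is where the gain comes from.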

  9. Example: LIPT Transform
     Transform dictionary:
       a → *a, is → *bq, to → *be, the → *cd, long → *dfa,
       this → *dr, test → *dB, method → *fb, example → *gY, demonstrate → *key
     Input text: This is a long example to demonstrate the “substitution” method.
     Transformed text: *dr^ *bq *a *dfa *gY *be *key *cd “substitution” *fb.
     Compressed output: 1001110011000001010110
     MORE gain!
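In the LIPT example above, every codeword starts with `*` followed by a letter that encodes the word's length: "a" (length 1) gets `a`, "method" (6) gets `f`, "demonstrate" (11) gets `k`. The remaining letters pick out the word within its length class. The Python sketch below builds codewords that way, with one hypothetical detail: the offset within a length class is written as zero to three letters (offset 0 gets no letters), which may differ from LIPT's actual offset rule. The slide's codewords come from a much larger dictionary, so only the `*` plus length-letter prefix should be expected to match.

```python
import string
from collections import defaultdict

LETTERS = string.ascii_lowercase + string.ascii_uppercase  # 52 symbols

def length_letter(n):
    """Length 1 -> 'a', ..., 26 -> 'z', 27 -> 'A', ... (lengths up to 52 only)."""
    return LETTERS[n - 1]

def offset_letters(k):
    """Hypothetical offset encoding: 0 -> '', then 1-3 letters from LETTERS."""
    if k == 0:
        return ""
    k -= 1
    for width in (1, 2, 3):
        block = 52 ** width
        if k < block:
            digits = []
            for _ in range(width):
                k, r = divmod(k, 52)
                digits.append(LETTERS[r])
            return "".join(reversed(digits))
        k -= block
    raise ValueError("offset too large for three letters")

def build_lipt_dictionary(words):
    """`words` is assumed sorted by descending frequency within the dictionary."""
    seen = defaultdict(int)  # how many words of each length are already coded
    codes = {}
    for w in words:
        n = len(w)
        codes[w] = "*" + length_letter(n) + offset_letters(seen[n])
        seen[n] += 1
    return codes

if __name__ == "__main__":
    print(build_lipt_dictionary(["a", "is", "to", "the", "method", "demonstrate"]))
```

Since the length letter sits right after `*`, words of equal length share a codeword prefix. That may be some of the "artificial" context a backend such as bzip2 or PPMD can exploit.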

  10. Table of Contents (repeated from slide 2)

  11. StarNT Transform
      • Fast Transform Encoding/Decoding
        • Ternary search tree
      • Fast Backend Compression/Decompression
        • Shorter transform output
      • Higher Compression Ratio
        • More efficient transform
      • StarZip: Multi-corpus Compression Tool

  12. Example: Ternary Search Tree
      • Hash table
      • Binary tree
      • Digital search tries
      • Ternary search trees
      Searching for a string of length k in a balanced ternary search tree with n strings requires at most O(log n + k) character comparisons.
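For readers who haven't met the structure, below is a minimal Python sketch of a ternary search tree that maps words to values, such as transform codewords. Each node holds one character and three children: `lo` (smaller character), `eq` (next character of the same word), and `hi` (larger character). Only one character is compared per step. The O(log n + k) bound holds only when the tree is reasonably balanced. Inserting words in sorted order degenerates it, so random or median-first insertion is the usual workaround. The class names are illustrative and do not come from StarNT's code.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "value")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.value = None  # set only where a stored word ends

class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key, value):
        if not key:
            raise ValueError("empty keys are not supported")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
        return node

    def get(self, key):
        """Return the value stored for `key`, or None if it is absent."""
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1  # matched: move to the next character
            else:
                return node.value
        return None

if __name__ == "__main__":
    tree = TernarySearchTree()
    for word, code in [("the", "a"), ("of", "b"), ("and", "c")]:
        tree.insert(word, code)
    print(tree.get("of"), tree.get("th"))
```

Unlike a hash table, a miss such as "th" (a prefix of "the") is usually detected after only a few character comparisons, without hashing the whole word first.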

  13. StarNT: Efficient Transform
      • Maintain some original context and provide new “artificial” context
      • Preserve word frequency information
      • Use word length information
      • Index encoding
        • The codeword denotes the index of the word in the dictionary
        • Very fast (“lightning”) transform decoding
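The slide says only that a codeword denotes the word's index in the dictionary. The Python sketch below illustrates why that makes decoding cheap. It assumes, hypothetically, that indices 0 to 51 map to one letter, the next 52² to two letters, and the next 52³ to three letters, all drawn from a-z and A-Z. If the dictionary is ordered by frequency, common words then get the shortest codewords. Decoding becomes arithmetic plus an array lookup, with no search. The alphabet, the widths, and the function names are all assumptions for illustration.

```python
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase  # assumed 52-letter alphabet
BASE = len(ALPHABET)
INDEX_OF = {c: i for i, c in enumerate(ALPHABET)}

def index_to_codeword(i):
    """Map a dictionary index to a 1-3 letter codeword (hypothetical scheme)."""
    for width in (1, 2, 3):
        block = BASE ** width
        if i < block:
            letters = []
            for _ in range(width):
                i, r = divmod(i, BASE)
                letters.append(ALPHABET[r])
            return "".join(reversed(letters))
        i -= block  # skip past all codewords of this width
    raise ValueError("index needs more than three letters")

def codeword_to_index(code):
    """Inverse mapping: pure arithmetic, no dictionary search."""
    offset = sum(BASE ** w for w in range(1, len(code)))  # shorter codewords first
    value = 0
    for c in code:
        value = value * BASE + INDEX_OF[c]
    return offset + value

if __name__ == "__main__":
    dictionary = ["the", "of", "and", "to"]  # hypothetical, frequency-ordered
    code = index_to_codeword(dictionary.index("and"))
    print(code, dictionary[codeword_to_index(code)])
```

Under these assumptions, 52 + 52² + 52³ = 143,364 words fit in at most three letters. The encoder still needs a word-to-index lookup (the ternary search tree's job), while the decoder only needs the word list.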

  14. StarNT: Fast Backend Compression/Decompression
      • Shorter intermediate file produced by the transform
      • The meaning of the symbol ‘*’ has changed!

  15. StarNT: Compression Performance
        Bzip2 -9 + StarNT       11.2%
        Gzip -9 + StarNT        16.4%
        PPMD (k=5) + StarNT     10.2%
      • StarNT is better than LIPT.
      • bzip2 + StarNT is better than PPMD in both time complexity and compression performance.

  16. StarNT: Timing Performance, Compared with LIPT
      • Encoding: 76.3%
      • Decoding: 84.9%

  17. StarNT: Timing Performance, Compared with the Backend Compressor
                    Bzip2 -9    Gzip -9          PPMD (k=5)
        Encoding    28.1%       50.4%            21.2%
        Decoding    18.6%       Some increase    Negligible

  18. Table of Contents (repeated from slide 2)

  19. StarZip: Domain-Specific Dictionary
      • Five corpora were used (from ibiblio.com)

  20. StarZip: Preliminary Results, Compression Performance
        Bzip2 -9 + StarZip      13%
        Gzip -9 + StarZip       19%
        PPMD (k=5) + StarZip    10%

  21. Table of Contents (repeated from slide 2)

  22. Review: Philosophy of Preprocessing/Postprocessing
      • Transfom th txt into som intermdiate form whic can b compresed with betr eficency.
      • Xploit th natral redndancy of the laguage in makng this tranformaton.