
gFPC: A Self-Tuning Compression Algorithm



Presentation Transcript


  1. gFPC: A Self-Tuning Compression Algorithm
     Martin Burtscher (1) and Paruj Ratanaworabhan (2)
     (1) The University of Texas at Austin
     (2) Kasetsart University

  2. Introduction
     • Many compression algorithms are parameterizable
     • Some parameters allow straightforward trade-offs
       • E.g., compression ratio vs. speed
       • Controlled via command line
     • Other parameters provide no obvious trade-off
       • Best value is input dependent and changes dynamically
       • E.g., hash function in a predictor
       • Typically hardcoded

  3. Contribution
     • Self-tuning approach to optimize parameters
       • Automatic, on-line, and genetic-algorithm-based
       • Slower compression but higher compression ratio
     • gFPC algorithm for IEEE 754 double-precision data
       • Compresses linear streams of FP values
       • Lossless single-pass algorithm
       • Repeatedly self-tunes 4 hash-table parameters

  4. FPC Algorithm [DCC’07]
     • Make two predictions
     • Select closer value
     • XOR with true value
     • Count leading zero bytes
     • Encode value
     • Update predictors
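The per-value steps listed on this slide can be sketched in C as follows. This is a minimal illustration, not the FPC source: the names `bits_of`, `leading_zero_bytes`, `residual_t`, and `encode_value` are hypothetical, and "closer" is assumed here to mean the prediction whose XOR with the true value has more leading zero bytes. The final bit-level encoding and the predictor update are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit IEEE 754 bit pattern. */
static uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Count how many of the most significant bytes of x are zero (0..8). */
static int leading_zero_bytes(uint64_t x) {
    int n = 0;
    while (n < 8 && (x >> 56) == 0) {
        x <<= 8;
        n++;
    }
    return n;
}

/* Hypothetical record of what would be encoded for one value. */
typedef struct {
    int use_second;     /* 0: first prediction chosen, 1: second */
    int zero_bytes;     /* leading zero bytes in the residual */
    uint64_t residual;  /* true value XOR chosen prediction */
} residual_t;

/* Assumption: the "closer" prediction is the one that leaves
   more leading zero bytes after the XOR. */
static residual_t encode_value(uint64_t true_value,
                               uint64_t pred1, uint64_t pred2) {
    uint64_t r1 = true_value ^ pred1;
    uint64_t r2 = true_value ^ pred2;
    residual_t out;
    out.use_second = leading_zero_bytes(r2) > leading_zero_bytes(r1);
    out.residual = out.use_second ? r2 : r1;
    out.zero_bytes = leading_zero_bytes(out.residual);
    assert(out.zero_bytes >= 0 && out.zero_bytes <= 8);
    return out;
}
```

A perfect prediction gives a residual of zero, so only the selector and the zero-byte count would need to be stored for that value.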

  5. Hash Function Parameters
     • Two predictors
       • FCM predicts values, DFCM predicts differences

         fcm_prediction = fcm[fcm_hash];   // prediction: read hash table entry
         fcm[fcm_hash] = true_value;       // update: write hash table entry
         fcm_hash = ((fcm_hash << lshift) ^ (true_value >> rshift)) & (table_size - 1);

     • Two parameters each
       • lshift for aging
       • rshift for eliminating random bits
     • 802,816 possibilities with 256 kB table_size
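A self-contained C sketch of the two predictors, built around the hash update shown on this slide. The struct and function names (`fcm_t`, `dfcm_t`, `fcm_init`, and so on) are hypothetical, and the DFCM part rests on an assumption: that "differences" means the integer difference between consecutive 64-bit bit patterns, fed through the same kind of hash table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *table;   /* hash table of previously seen values */
    uint64_t mask;     /* table_size - 1; table_size is a power of two */
    uint64_t hash;     /* running hash of the recent history */
    unsigned lshift;   /* aging: how quickly old history leaves the hash */
    unsigned rshift;   /* discards low-order (random) bits of each value */
} fcm_t;

/* Returns 0 on success, -1 if the table cannot be allocated. */
static int fcm_init(fcm_t *p, unsigned table_bits,
                    unsigned lshift, unsigned rshift) {
    assert(table_bits > 0 && table_bits < 32 && lshift < 64 && rshift < 64);
    p->table = calloc((size_t)1 << table_bits, sizeof *p->table);
    if (!p->table) return -1;
    p->mask = ((uint64_t)1 << table_bits) - 1;
    p->hash = 0;
    p->lshift = lshift;
    p->rshift = rshift;
    return 0;
}

/* Prediction: read the hash table entry for the current history. */
static uint64_t fcm_predict(const fcm_t *p) { return p->table[p->hash]; }

/* Update: write the entry, then fold the true value into the hash. */
static void fcm_update(fcm_t *p, uint64_t true_value) {
    p->table[p->hash] = true_value;
    p->hash = ((p->hash << p->lshift) ^ (true_value >> p->rshift)) & p->mask;
}

/* Assumed DFCM: the same table, but storing differences between values. */
typedef struct { fcm_t diffs; uint64_t last; } dfcm_t;

static int dfcm_init(dfcm_t *p, unsigned table_bits,
                     unsigned lshift, unsigned rshift) {
    p->last = 0;
    return fcm_init(&p->diffs, table_bits, lshift, rshift);
}

static uint64_t dfcm_predict(const dfcm_t *p) {
    return p->last + fcm_predict(&p->diffs);
}

static void dfcm_update(dfcm_t *p, uint64_t true_value) {
    fcm_update(&p->diffs, true_value - p->last);
    p->last = true_value;
}
```

In this sketch a smaller `lshift` keeps more past values in the hash, while a larger `rshift` makes values that differ only in their low-order bits map to the same history. Which combination predicts best depends on the input, which is the motivation for tuning the four values at run time.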

  6. Genetic Self-Tuning
     • Compress blocks with several sets of parameters
       • Start with FPC and otherwise random sets
     • Create new sets for next data block
       • Keep best set of parameters
       • Evolve remaining sets
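The tuning loop on this slide might look roughly like the C sketch below. Everything in it is an illustrative assumption rather than the gFPC implementation: the population size `POP`, the table size `TABLE_BITS`, the cost function `block_cost` (residual bytes left by a single FCM-style predictor), and the evolution step, which here only mutates copies of the best set instead of performing whatever recombination gFPC actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define POP 4          /* population size (assumed for this sketch) */
#define TABLE_BITS 10  /* small hash table to keep the sketch cheap */

typedef struct { unsigned lshift, rshift; } params_t;

/* Hypothetical fitness: number of residual bytes an FCM-style predictor
   with these parameters leaves for the block (lower is better). */
static size_t block_cost(const uint64_t *block, size_t n, params_t p) {
    static uint64_t table[1u << TABLE_BITS];
    uint64_t hash = 0, mask = (1u << TABLE_BITS) - 1;
    size_t cost = 0;
    assert(p.lshift < 64 && p.rshift < 64);
    for (size_t i = 0; i < (1u << TABLE_BITS); i++) table[i] = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t r = block[i] ^ table[hash];
        while (r) { cost++; r >>= 8; }  /* bytes below the leading zeros */
        table[hash] = block[i];
        hash = ((hash << p.lshift) ^ (block[i] >> p.rshift)) & mask;
    }
    return cost;
}

/* One tuning step: score every set on the block, keep the best in slot 0,
   and replace the others with mutated copies (mutation only, an assumed
   simplification). Returns the best cost found. */
static size_t tune_step(params_t pop[POP], const uint64_t *block, size_t n) {
    size_t best = 0, best_cost = block_cost(block, n, pop[0]);
    for (size_t i = 1; i < POP; i++) {
        size_t c = block_cost(block, n, pop[i]);
        if (c < best_cost) { best_cost = c; best = i; }
    }
    pop[0] = pop[best];
    for (size_t i = 1; i < POP; i++) {
        pop[i] = pop[0];
        if (rand() & 1)
            pop[i].lshift = (unsigned)rand() % TABLE_BITS + 1;
        else
            pop[i].rshift = (unsigned)rand() % 64;
    }
    return best_cost;
}
```

Because the best set always survives in slot 0, the best cost on an unchanged block can never get worse from one step to the next. In a real compressor the chosen parameters must also be reproducible by the decompressor, for example by recording them per block; that bookkeeping is not shown here.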

  7. Related Work
     • Genetic algorithms (GAs) for evolving programs
       • Program output approximates original data
     • GAs for evolving compressor parameters off-line
       • Rate distortion
       • Vector quantization
       • Fractal codes
       • Dictionary n-grams
       • Best compressor for each block
     • We use on-line GA: faster, adapts dynamically

  8. Evaluation Method
     • System
       • Sun Fire X2270 Server, Ubuntu Linux 8.06
       • 2.93 GHz 64-bit Intel Xeon 5570 (Nehalem) processor
     • Datasets
       • Linear streams of real-world data (18 – 277 MB)
       • 4 observations: error, info, spitzer, temp
       • 4 simulations: brain, comet, control, plasma
       • 5 MPI messages: bt, lu, sp, sppm, sweep3d

  9. Population Size
     • Affects
       • Compression speed
       • Compression ratio
     • Result
       • Population size of 4 performs within 0.5% of maximum
       • (Population size = 1 → FPC)

  10. Block Size
     • Affects
       • Reconfiguration frequency
       • Compression ratio
     • Result
       • 512 kB blocks work well
       • Medium sizes are best
       • Warm-up versus adaptivity trade-off

  11. Compression Ratio Comparison
     • FPCsize and FPCall
       • Use off-line GA and LS to find best parameters for each size (and input)
     • Results
       • FPC is 5% worse
       • FPCsize: no input adaptivity
       • FPCall (mostly) better
       • gFPC is retroactive (but can adapt on-the-fly)
       • gFPC is 317 times faster

  12. Self-Tuning Benefit
     • Rarely worse, mostly better (up to 72%)
       • Relative to FPC, which was tuned for these inputs
       • Benefit is likely higher on other inputs

  13. Throughput on Xeon System
     • Compression is slower with larger population size
     • Small compression overhead due to self-tuning
     • Decompression is faster due to better compression

  14. Summary
     • Self-tuning approach
       • Based on on-line genetic algorithm
       • Repeatedly tunes 4 hash-table parameters in gFPC
       • Applicable to other compressors
     • Results
       • Higher compression ratio, lower compression speed
       • gFPC compresses at 1 Gb/s, decompresses at 7 Gb/s
     • C source code of gFPC is freely available
       http://users.ices.utexas.edu/~burtscher/research/gFPC/
