
Sampling-based Program Locality Approximation

Yutao Zhong, Wentao Chang. Department of Computer Science, George Mason University. June 8th, 2008.


Presentation Transcript


  1. Sampling-based Program Locality Approximation • Yutao Zhong, Wentao Chang • Department of Computer Science, George Mason University • June 8th, 2008

  2. Outline • Background information • Motivation • Our sampling approach • Experimental results

  3. Reuse distance and reuse signature • Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element • Reuse signature: a histogram of reuse distances, showing how the reuses are distributed over different distance lengths [Figure: example access trace "a b c a a c b" with a marked starting point and ending point whose reuse distance is 2]
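The two definitions on this slide can be made concrete with a minimal Python sketch. It is a direct, unoptimized reading of the definitions, not the authors' implementation; the function names `reuse_distances` and `reuse_signature` are made up for illustration, and the input is the example trace from the slide.

```python
from collections import Counter

def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first access."""
    last_pos = {}      # element -> index of its most recent access
    distances = []
    for i, x in enumerate(trace):
        if x in last_pos:
            # Count the distinct elements strictly between the two uses of x.
            distances.append(len(set(trace[last_pos[x] + 1:i])))
        else:
            distances.append(None)
        last_pos[x] = i
    return distances

def reuse_signature(trace):
    """Histogram: reuse distance -> number of reuses with that distance."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = list("abcaacb")
print(reuse_distances(trace))                  # [None, None, None, 2, 0, 1, 2]
print(sorted(reuse_signature(trace).items()))  # [(0, 1), (1, 1), (2, 2)]
```

Slicing the trace for every reuse makes this quadratic in the worst case, which is acceptable only for small illustrative traces.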

  4. Reuse signature applications • Relationship to cache behavior: • Capacity miss ⇐ reuse distance ≥ cache size • Reducing reuse distance ⇒ improved cache effectiveness • Current applications: • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07] • Reorganize data [Zhong+04] • Provide caching hints [Beyls & D’Hollander 02] • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
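The capacity-miss rule on this slide can be turned into a small miss-rate estimate. The sketch below assumes a fully associative LRU cache whose size is counted in the same data elements as the reuse distances; `predicted_miss_rate` and its parameters are hypothetical names, and the cited prediction papers use more elaborate models than this.

```python
def predicted_miss_rate(signature, cold_accesses, cache_size):
    """Estimate the miss rate of a fully associative LRU cache.

    signature:     dict mapping reuse distance -> number of reuses
    cold_accesses: number of first-time accesses (always misses)
    cache_size:    capacity in data elements (assumed unit)
    """
    total = sum(signature.values()) + cold_accesses
    if total == 0:
        return 0.0
    # A reuse misses when at least cache_size distinct elements intervene.
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    return (capacity_misses + cold_accesses) / total

# Signature of the trace "a b c a a c b": 3 first accesses and 4 reuses.
signature = {0: 1, 1: 1, 2: 2}
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=2))  # 5 of 7 accesses miss
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=3))  # 3 of 7 accesses miss
```

Simulating an LRU cache by hand on the same trace gives the same miss counts, which is why a shorter reuse distance translates directly into fewer capacity misses.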

  5. Reuse distance measurement • Data structures: access time table, access trace, distance histogram • For each access: get the accessed memory address; search and update the access time table to find the address's last access; search, count, and update the access trace to obtain the distance; record the distance in the histogram • Storing the trace requires large space, and counting memory accesses takes a long time • Enormous effort for memory-intensive programs [Figure: example trace with a marked starting point and ending point]
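The measurement loop on this slide (look up the last access time, count the distinct elements since then, record the distance) can be sketched as follows. This is an illustration under stated assumptions: it keeps the access trace as a Fenwick (binary indexed) tree over access times, which is a swapped-in structure chosen for brevity and not the splay tree the talk refers to later. The names `Fenwick` and `measure` are hypothetical.

```python
class Fenwick:
    """Binary indexed tree over 1-based access times."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of the marks at times 1..i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def measure(trace):
    last_time = {}               # access time table: address -> last access time
    marks = Fenwick(len(trace))  # mark at time t is 1 iff t is some address's last access
    histogram = {}               # distance histogram
    for t, addr in enumerate(trace, start=1):
        p = last_time.get(addr)
        if p is not None:
            # Marks strictly between p and t are the distinct elements in between.
            dist = marks.prefix(t - 1) - marks.prefix(p)
            histogram[dist] = histogram.get(dist, 0) + 1
            marks.add(p, -1)     # time p is no longer the last access of addr
        marks.add(t, 1)
        last_time[addr] = t
    return histogram

print(measure(list("abcaacb")))  # {2: 2, 0: 1, 1: 1}
```

Each access costs a logarithmic number of tree steps, but the structures still grow with the trace and the data size, which is the overhead that motivates sampling.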

  6. Motivation • Sampling is generally effective at reducing the overhead of program behavior profiling • We aim to balance efficiency and accuracy • Sample only 1% of memory accesses • Improve measurement speed by 7.5 times on average • Achieve over 99% accuracy

  7. Sampling algorithms • Use the common structure of bursty tracing [Hirzel & Chilimbi 01] • Sampling rate r = |IS| / (|IS| + |IH|), where IS is a sampling interval and IH is a hibernating interval • Naïve sampling • Turn off profiling during hibernating intervals • No guarantee of accuracy
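A minimal sketch of naïve sampling, assuming fixed-length intervals with each period starting in a hibernating interval (an assumed layout), shows why accuracy is not guaranteed: accesses made while profiling is off leave no record, so the intervening elements they touch are never counted. The function names are hypothetical.

```python
def sampling_rate(sample_len, hibernate_len):
    """r = |IS| / (|IS| + |IH|) for fixed interval lengths."""
    return sample_len / (sample_len + hibernate_len)

def naive_sampled_signature(trace, sample_len, hibernate_len):
    period = sample_len + hibernate_len
    last_seen = {}   # element -> index in the sampled sub-trace
    sampled = []     # only the accesses observed during sampling intervals
    histogram = {}
    for t, x in enumerate(trace):
        if t % period < hibernate_len:
            continue  # hibernating: profiling is off, the access is not recorded
        if x in last_seen:
            d = len(set(sampled[last_seen[x] + 1:]))
            histogram[d] = histogram.get(d, 0) + 1
        last_seen[x] = len(sampled)
        sampled.append(x)
    return histogram

print(sampling_rate(1, 99))                                  # 0.01
print(naive_sampled_signature(list("abcdabcd"), 2, 2))       # {1: 2}
```

In the trace "a b c d a b c d" every reuse has an actual distance of 3, yet the two reuses that naïve sampling observes are both reported with distance 1.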

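  8. Naïve sampling: inaccurate measurement [Figure: memory access trace ". . c a b c a c a b c a c a b c d a . . ." split into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked and the inaccurately measured distances 1 and 3 highlighted]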

  9. Biased sampling • Probability of being sampled is not uniform • Ignore any datum that has been referenced within the current hibernating period • Measured distance is always larger than or equal to the actual distance

  10. Biased sampling [Figure: memory access trace ". . c a b c a f a b c a c a b f d a . . ." split into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked]

  11. History-preserved representative sampling • Add a tag to each address in the access trace • Mark references made within a sampling period as sampled in the tag • A reuse is sampled only when its starting point is marked as sampled
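The tagging rule can be sketched as below. This is one reading of the slide, not the authors' code: the full access history is kept in both kinds of interval, and a reuse is recorded whenever the previous access to the same address was tagged as sampled, whichever interval the reuse itself falls in. Fixed interval lengths, a period that starts with hibernation, and the name `representative_signature` are all assumptions.

```python
def representative_signature(trace, sample_len, hibernate_len):
    period = sample_len + hibernate_len
    last_time = {}   # address -> time of its most recent access (always maintained)
    tag = {}         # address -> True if that most recent access was sampled
    histogram = {}
    for t, addr in enumerate(trace):
        sampled_now = t % period >= hibernate_len
        if tag.get(addr):
            # The starting point was sampled, so record the exact distance,
            # counted over the preserved history (hibernating accesses included).
            d = len(set(trace[last_time[addr] + 1:t]))
            histogram[d] = histogram.get(d, 0) + 1
        last_time[addr] = t
        tag[addr] = sampled_now
    return histogram

print(representative_signature(list("abcdabcd"), 2, 2))  # {3: 2}
```

On the trace "a b c d a b c d" this records only the two reuses whose starting points were sampled, and both get the actual distance 3 because the accesses made during hibernation are still part of the history.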

  12. History-preserved representative sampling [Figure: memory access trace ". . c a b c a f a b c a c a b f d a . . ." split into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked]

  13. Further improvements • Simplifying maintenance in hibernating intervals • Reference trace implementation: splay tree [Ding & Zhong 03] • In a sampling period, full tree maintenance • In a hibernating period, instead of adding a new leaf node for each access, we construct a single node for the whole period with a counter of the number of distinct accesses • Fast sample tag marking and checking • To save space, we fix the lengths of the sampling and hibernating periods, which avoids the additional tag
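The last point can be illustrated with a short sketch: once both interval lengths are fixed, whether an access was sampled follows from its access time alone, so no per-address tag has to be stored. The layout (each period starts with hibernation) and the name `was_sampled` are assumptions made for illustration.

```python
def was_sampled(last_access_time, sample_len, hibernate_len):
    """Recover the sample tag from the last access time alone."""
    period = sample_len + hibernate_len
    return last_access_time % period >= hibernate_len

# Sampling 1 access out of every 4: times 3 and 7 fall in sampling intervals.
print([was_sampled(t, sample_len=1, hibernate_len=3) for t in range(8)])
# [False, False, False, True, False, False, False, True]
```

Since the last access time is already kept for distance counting, the check costs one modulo operation and no extra space.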

  14. Experiments • Benchmarks from SPEC 2006, Olden, Chaos: • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP • Instrumentation tool: Valgrind 3.2.3 • Sampling rate: 1% • Each benchmark is run with 3 to 6 different inputs • Each run is repeated three times per input

  15. Experiments cont’d • Comparison of accuracy and efficiency against: • Ding and Zhong’s approximation method [Ding & Zhong 03] • Time distance measurement [Shen+07] • Implementation of four algorithms: • Naïve sampling, biased sampling, basic and optimized representative sampling

  16. Accuracy

  17. Efficiency • Sampling even outperforms the lower bound: time distance measurement • Generally, the speedup is smaller when the input size is small

  18. Efficiency • Speedup of basic representative sampling: around 4 to 5 times in most cases • Speedup of optimized representative sampling: • around 7 to 10 times in most cases, up to 33 times • geometric mean is 7.5 • Sampling rate effect (TSP)

  19. Related work • Reuse signature collection • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07] • Selective monitoring • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05] • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07] • Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

  20. Future work • Dynamically adjust the sampling and hibernating lengths • Store references in a temporary buffer and then process them in batches • Combine time sampling with data sampling

  21. Thank you! • Questions?
