Sampling-based Program Locality Approximation

Yutao Zhong, Wentao Chang, Department of Computer Science, George Mason University, June 8th, 2008

Outline • Background information • Motivation • Our sampling approach • Experimental results

Reuse distance and reuse signature • Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element • Reuse signature: a histogram of reuse distances, showing how the distances are distributed over different lengths • [Figure: access trace a b c a a c b, with one reuse marked from its starting point to its ending point and labeled with reuse distance 2]
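The two definitions above can be illustrated with a minimal sketch. It applies the definition directly (a set scan between consecutive uses), so it is meant for small traces only; the function names `reuse_distances` and `reuse_signature` are hypothetical and not taken from the slides.

```python
from collections import Counter

def reuse_distances(trace):
    """One entry per access: the reuse distance, or None for a first access."""
    last_pos = {}      # element -> index of its most recent access
    distances = []
    for i, elem in enumerate(trace):
        if elem in last_pos:
            # Distinct elements strictly between the two consecutive uses.
            between = set(trace[last_pos[elem] + 1:i])
            distances.append(len(between))
        else:
            distances.append(None)
        last_pos[elem] = i
    return distances

def reuse_signature(trace):
    """Histogram mapping reuse distance -> number of reuses at that distance."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = list("abcaacb")
print(reuse_distances(trace))   # [None, None, None, 2, 0, 1, 2]
print(sorted(reuse_signature(trace).items()))   # [(0, 1), (1, 1), (2, 2)]
```

On the slide's example trace, the second access to `a` has distance 2 because `b` and `c` are accessed in between.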

Reuse signature application • Relationship to cache behavior: • Capacity miss ⇐ reuse distance ≥ cache size • Reduce reuse distance ⇒ improve cache effectiveness • Current applications: • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07] • Reorganize data [Zhong+04] • Provide caching hints [Beyls & D’Hollander 02] • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
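The relationship between reuse distance and capacity misses can be sketched as a miss-rate estimate read off a reuse signature. This is an illustration under stated assumptions, not the prediction method of the cited papers: it assumes a fully associative LRU cache whose size is counted in data elements, and the name `predicted_miss_rate` and its parameters are hypothetical.

```python
def predicted_miss_rate(signature, cold_accesses, cache_size):
    """Estimated miss rate for a fully associative LRU cache (assumption).

    signature     -- dict: reuse distance -> number of reuses
    cold_accesses -- number of first-time accesses (always misses)
    cache_size    -- capacity in data elements
    """
    total = cold_accesses + sum(signature.values())
    if total == 0:
        return 0.0
    # A reuse misses when at least cache_size distinct elements intervene.
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    return (cold_accesses + capacity_misses) / total

# Signature of the trace a b c a a c b: three cold accesses, four reuses.
signature = {0: 1, 1: 1, 2: 2}
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=2))
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=3))
```

With a cache of two elements, the two reuses at distance 2 become capacity misses; with three elements, only the cold accesses miss.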

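Reuse distance measurement • Get the accessed memory address • Search the access time table for the last access to that address, then update it • Search the access trace, count the distinct elements accessed since the last access, then update the trace • Record the distance in the distance histogram • Storing the trace and counting memory accesses require large space and a long counting time • Enormous effort for memory-intensive programs • [Figure: data structures (access time table, access trace, distance histogram) applied to the trace a c a b b a, with a reuse of distance 1 marked from its starting point to its ending point]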
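The measurement steps above (look up the last access time, count distinct elements since then, record the distance) can be sketched as follows. As an assumption made for brevity, the access trace here is a Fenwick (binary indexed) tree over access times rather than the splay tree the slides cite; the names `Fenwick` and `measure` are hypothetical.

```python
from collections import Counter

class Fenwick:
    """Binary indexed tree over 1-based access times."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def measure(trace):
    """Return the distance histogram of a trace in O(n log n) time."""
    last_time = {}                 # access time table: address -> last time
    marks = Fenwick(len(trace))    # 1 at each time that is some address's latest access
    histogram = Counter()          # distance histogram
    for t, addr in enumerate(trace, start=1):
        prev = last_time.get(addr)
        if prev is not None:
            # Marks strictly between prev and t are the distinct addresses in between.
            distance = marks.prefix_sum(t - 1) - marks.prefix_sum(prev)
            histogram[distance] += 1
            marks.add(prev, -1)    # prev is no longer the latest access of addr
        marks.add(t, 1)
        last_time[addr] = t
    return histogram

print(sorted(measure(list("abcaacb")).items()))   # [(0, 1), (1, 1), (2, 2)]
```

Even with a logarithmic search, every access pays for a table lookup and a tree update, which is the overhead that sampling targets.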

Motivation • Sampling is generally effective at reducing the overhead of program behavior profiling • We aim to balance efficiency and accuracy • Sample only 1% of memory accesses • Improve measurement speed by 7.5 times on average • Achieve over 99% accuracy

Sampling algorithms • Use the common structure of bursty tracing [Hirzel & Chilimbi 01] • Sampling rate r = |I_S| / (|I_S| + |I_H|), where I_S is a sampling interval and I_H is a hibernating interval • Naïve sampling • Turn off profiling during hibernating intervals • No guarantee of accuracy
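A small sketch of the interval structure and the sampling-rate formula. It assumes fixed interval lengths measured in memory accesses and that each period begins with the hibernating interval; both choices, the function names, and the interval lengths used in the example are illustrative assumptions.

```python
def sampling_rate(sample_len, hibernate_len):
    """r = |I_S| / (|I_S| + |I_H|), with lengths counted in accesses."""
    return sample_len / (sample_len + hibernate_len)

def in_sampling_interval(t, sample_len, hibernate_len):
    """True if access number t (0-based) falls inside a sampling interval.

    Assumes every period is a hibernating interval followed by a sampling one.
    """
    return t % (sample_len + hibernate_len) >= hibernate_len

# Hypothetical lengths that give the 1% rate used in the experiments.
print(sampling_rate(1_000, 99_000))
sampled = sum(in_sampling_interval(t, 1_000, 99_000) for t in range(100_000))
print(sampled)   # accesses profiled in one full period
```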

Inaccurate measurement with naïve sampling • [Figure: memory access trace . . c a b c a c a b c a c a b c d a . . divided into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked and distance labels 1 and 3]
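Why naïve sampling loses accuracy can be shown with a short sketch: accesses in hibernating intervals never reach the profiler, so elements touched there are missing from the measured distance. The trace, the interval lengths, and the function names below are made up for illustration, and each period is assumed to start with the hibernating interval.

```python
def reuse_distances(trace):
    """Reuse distance of every reuse in the trace, in access order."""
    last_pos, out = {}, []
    for i, elem in enumerate(trace):
        if elem in last_pos:
            out.append(len(set(trace[last_pos[elem] + 1:i])))
        last_pos[elem] = i
    return out

def naive_sample(trace, sample_len, hibernate_len):
    """Keep only the accesses seen while profiling is turned on."""
    period = sample_len + hibernate_len
    return [x for t, x in enumerate(trace) if t % period >= hibernate_len]

trace = list("xyabca")                 # b and c fall in a hibernating interval
seen = naive_sample(trace, sample_len=1, hibernate_len=2)
print(seen)                            # ['a', 'a']
print(reuse_distances(seen))           # [0]  measured by naive sampling
print(reuse_distances(trace))          # [2]  actual distance of the reuse of a
```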

Biased sampling • Ignore any datum that has been referenced within the current hibernating period • Measured distance is always larger than or equal to the actual distance • Probability of being sampled is not uniform

Biased sampling • [Figure: memory access trace . . c a b c a f a b c a c a b f d a . . divided into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked]

History-preserved representative sampling • Add a tag for each address in the access trace • Mark references made within a sampling period as sampled in the tag • A reuse is sampled only when its starting point is marked as sampled
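One possible reading of these three rules, as a sketch: the history is kept for every access, each address carries a tag saying whether its latest access fell in a sampling period, and a reuse is recorded only if its starting point carries that tag. Distances are computed with a plain set scan for clarity instead of the splay tree the slides use; the function name, the trace, and the hibernate-first period layout are assumptions.

```python
from collections import Counter

def representative_sampling(trace, sample_len, hibernate_len):
    """Histogram of reuses whose starting point lies in a sampling interval."""
    period = sample_len + hibernate_len
    last = {}                  # address -> (time of last access, sampled tag)
    histogram = Counter()
    for t, addr in enumerate(trace):
        sampled_now = t % period >= hibernate_len
        if addr in last:
            prev, tag = last[addr]
            if tag:
                # History is preserved, so hibernating accesses still count.
                histogram[len(set(trace[prev + 1:t]))] += 1
        last[addr] = (t, sampled_now)   # mark or clear the tag
    return histogram

trace = list("xyabcaaba")
print(sorted(representative_sampling(trace, 1, 2).items()))   # [(0, 1), (2, 1)]
```

In this trace the first reuse of `a` is recorded with its true distance 2, since `b` and `c` are still counted even though they were accessed while hibernating; the reuse of `b` is skipped because its starting point was not sampled.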

History-preserved representative sampling • [Figure: memory access trace . . c a b c a f a b c a c a b f d a . . divided into alternating hibernating (IH) and sampling (IS) intervals, with reuses ① to ⑤ marked]

Further improvements • Simplify maintenance in hibernating intervals • Reference trace implementation: splay tree [Ding & Zhong 03] • In a sampling period, maintain the full tree • In a hibernating period, instead of adding a new leaf node for each access, construct a single node per hibernating period with a counter of the number of distinct accesses • Fast sample tag marking and checking • To save space, fix the lengths of the sampling and hibernating periods, which avoids storing an additional tag

Experiments • Benchmarks from SPEC 2006, Olden, Chaos: • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP • Instrumentation tool: Valgrind 3.2.3 • Sampling rate: 1% • Each benchmark is run with 3 to 6 different inputs • Each input is run three times

Experiments cont’d • Comparison of accuracy and efficiency • Ding and Zhong’s approximation method [Ding & Zhong 03] • Time distance measurement [Shen+07] • Implementation of four algorithms: • Naïve sampling, biased sampling, basic and optimized representative sampling

Accuracy

Efficiency • Sampling outperforms even the lower bound: time distance measurement • In general, the speedup is smaller when the input size is small

Efficiency • Speedup of basic representative sampling: around 4 to 5 times in most cases • Speedup of optimized representative sampling: • around 7 to 10 times in most cases, up to 33 times • geometric mean is 7.5 • [Figure: sampling rate effect (TSP)]

Related work • Reuse signature collection • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07] • Selective monitoring • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05] • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07] • Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

Future work • Dynamically adjust sampling and hibernating lengths • Store references in a temporary buffer and then process them in batches • Combine time sampling with data sampling

Thank you! • Questions?
